Quick Answer
HADR_AR_CRITICAL_SECTION_ENTRY occurs when Always On availability group DDL operations or WSFC commands compete for exclusive access to the local replica's runtime state. This synchronization bottleneck typically indicates heavy concurrent management activity against availability groups: failovers, configuration changes, or spikes in health monitoring.
Root Cause Analysis
This wait type emerges from SQL Server's internal synchronization mechanism protecting the availability group runtime state stored in memory. When operations like ALTER AVAILABILITY GROUP, failover commands, or WSFC health checks execute, they must acquire exclusive access to the availability replica's critical section before modifying or reading runtime metadata.
The critical section protects structures containing replica roles, connection states, synchronization health, and failover eligibility flags. When multiple threads attempt simultaneous access, every request behind the current holder accumulates time on this specific wait type. SQL Server coordinates this access through a lightweight synchronization primitive rather than the traditional lock manager, so these waits do not appear as conventional lock blocking.
In SQL Server 2016, Microsoft improved the granularity of these critical sections, reducing wait times for read-only operations like health checks. SQL Server 2017 introduced better separation between DDL operations and monitoring queries. SQL Server 2019 added enhanced diagnostics for availability group state transitions, making these waits more observable but not necessarily more frequent.
The Always On state machine processes these requests sequentially within each replica. Heavy automation tools polling availability group health, frequent manual failovers, or applications repeatedly checking replica states amplify this contention. WSFC cluster validation operations also trigger these waits during cluster health assessments.
AutoDBA checks Always On configuration, monitoring frequency optimization, and availability group health check patterns across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.
Diagnostic Queries
-- Sessions currently waiting on HADR_AR_CRITICAL_SECTION_ENTRY
SELECT
    r.session_id,
    r.wait_type,
    r.wait_time,
    r.last_wait_type,
    s.login_name,
    s.program_name,
    t.text AS current_sql
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY'
ORDER BY r.wait_time DESC;
-- Historical wait statistics for HADR critical section waits
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY'
    AND waiting_tasks_count > 0;
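These counters are cumulative since the last restart (or since the stats were cleared), so a large total may be entirely historical. To confirm the waits are accumulating right now, snapshot the counter twice and diff; this sketch uses an arbitrary 30-second sampling window:

```sql
-- Snapshot, wait, and diff to measure the current accumulation rate
DECLARE @before TABLE (waiting_tasks_count BIGINT, wait_time_ms BIGINT);

INSERT INTO @before
SELECT waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY';

WAITFOR DELAY '00:00:30'; -- sampling window; adjust as needed

SELECT
    ws.waiting_tasks_count - b.waiting_tasks_count AS new_waits,
    ws.wait_time_ms - b.wait_time_ms AS new_wait_time_ms
FROM sys.dm_os_wait_stats ws
CROSS JOIN @before b
WHERE ws.wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY';
```

If both deltas are zero over several samples, the cumulative figure reflects a past incident rather than an ongoing problem.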
-- Availability group replica states and connection health
SELECT
    ag.name AS availability_group_name,
    ar.replica_server_name,
    ar.endpoint_url,
    ars.role_desc,
    ars.operational_state_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
INNER JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;
-- Active Always On health check sessions
SELECT
    s.session_id,
    s.login_name,
    s.program_name,
    s.client_interface_name,
    r.command,
    r.status,
    r.wait_type,
    r.cpu_time,
    r.total_elapsed_time
FROM sys.dm_exec_sessions s
LEFT JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
WHERE s.program_name LIKE '%Always On%'
    OR s.program_name LIKE '%WSFC%'
    OR r.command LIKE '%HADR%'
ORDER BY s.last_request_start_time DESC;
-- Cluster and availability group event history from error log
EXEC xp_readerrorlog 0, 1, N'Always On', N'availability group';
Fix Scripts
Reduce monitoring frequency from applications and automation tools
-- Review and optimize health check queries to reduce frequency
-- Replace frequent polling with event-driven notifications where possible
-- Example: Instead of checking AG state every 5 seconds, use 30-60 second intervals
-- Identify frequent callers
SELECT
    s.program_name,
    COUNT(*) AS connection_count,
    AVG(s.cpu_time) AS avg_cpu_time
FROM sys.dm_exec_sessions s
WHERE s.program_name IS NOT NULL
GROUP BY s.program_name
HAVING COUNT(*) > 10
ORDER BY connection_count DESC;
Test connection pooling and query frequency in development before implementing. Overly aggressive monitoring creates more problems than it solves.
Optimize Always On configuration for reduced contention
-- Review and adjust session timeout values if appropriate
-- Higher values reduce health check frequency but increase failover detection time
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'PrimaryReplica'
WITH (SESSION_TIMEOUT = 15); -- Default is 10 seconds, consider 15-20 for busy systems
-- Ensure proper backup preferences to avoid unnecessary operations on secondary replicas
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryReplica'
WITH (BACKUP_PRIORITY = 50); -- Adjust based on your backup strategy
Test failover detection times after increasing session timeout. Document changes for operations teams.
Implement proper error handling for Always On operations
-- Add retry logic with exponential backoff for availability group operations
-- Example template for DDL operations
DECLARE @retry_count INT = 0;
DECLARE @max_retries INT = 3;
DECLARE @wait_time VARCHAR(12) = '00:00:01'; -- Start with 1 second
DECLARE @wait_seconds INT = 1;

WHILE @retry_count < @max_retries
BEGIN
    BEGIN TRY
        -- Your Always On DDL operation here
        ALTER AVAILABILITY GROUP [YourAGName] FAILOVER;
        BREAK; -- Success, exit loop
    END TRY
    BEGIN CATCH
        -- Transient AG connection errors; verify these numbers against
        -- sys.messages for your SQL Server version before relying on them
        IF ERROR_NUMBER() IN (35250, 35251, 35252)
        BEGIN
            SET @retry_count = @retry_count + 1;
            IF @retry_count < @max_retries
            BEGIN
                WAITFOR DELAY @wait_time;
                SET @wait_seconds = @wait_seconds * 2; -- Exponential backoff
                SET @wait_time = CONVERT(VARCHAR(12), DATEADD(SECOND, @wait_seconds, '00:00:00'), 108);
            END
            ELSE
            BEGIN
                THROW; -- Re-throw if max retries exceeded
            END
        END
        ELSE
        BEGIN
            THROW; -- Re-throw non-transient errors immediately
        END
    END CATCH
END;
Always test failover procedures in development. Implement proper logging for retry attempts.
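The logging recommendation above can be sketched as a small audit table written from the CATCH block of the retry loop. The table and column names here are illustrative, not a standard schema:

```sql
-- Hypothetical audit table for availability group operation retries
CREATE TABLE dbo.AGOperationRetryLog (
    log_id INT IDENTITY(1, 1) PRIMARY KEY,
    logged_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    operation NVARCHAR(200) NOT NULL,
    attempt INT NOT NULL,
    error_number INT NULL,
    error_message NVARCHAR(4000) NULL
);

-- Inside the CATCH block, before the WAITFOR DELAY:
-- INSERT INTO dbo.AGOperationRetryLog (operation, attempt, error_number, error_message)
-- VALUES (N'ALTER AVAILABILITY GROUP ... FAILOVER', @retry_count, ERROR_NUMBER(), ERROR_MESSAGE());
```

A persistent log like this lets you correlate retry bursts with the wait-statistics spikes captured by the diagnostic queries.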
AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.
Prevention
Configure monitoring tools to query availability group health less frequently, using 30-60 second intervals instead of 5-10 seconds. Implement event-driven monitoring using Extended Events or SQL Server Agent alerts rather than continuous polling.
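As one event-driven alternative to polling, a SQL Server Agent alert can fire when an availability group role change is logged. This sketch assumes error 1480 (the role-change message) is being written to the error log, which it is by default:

```sql
-- Fire an Agent alert on AG role change instead of polling replica state
EXEC msdb.dbo.sp_add_alert
    @name = N'AG Role Change',
    @message_id = 1480,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1;
-- Pair with msdb.dbo.sp_add_notification to notify the on-call operator.
```

Because the alert fires only when a failover actually occurs, it generates zero load on the availability group critical sections between events.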
Consolidate Always On management operations to reduce concurrent DDL activity. Schedule maintenance windows for configuration changes and avoid overlapping automated operations like backup jobs with manual failovers.
Use dedicated service accounts for Always On operations with appropriate permissions. Avoid shared application accounts that create connection pooling contention during health checks.
Configure appropriate session timeout values based on network latency and system load. Environments with consistent sub-second response times can use lower timeouts (10-15 seconds), while geographically distributed replicas may need 20-30 seconds.
Implement proper connection pooling in applications accessing availability group listeners. Applications opening multiple concurrent connections for health checks create unnecessary critical section contention.
Monitor Extended Events for availability group state changes to identify patterns in management activity. The AlwaysOn_health session provides detailed visibility into operations causing these waits.
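A minimal sketch for reading recent events from the AlwaysOn_health session, assuming it is started and writing to its default .xel files in the SQL Server log directory:

```sql
-- Read recent AlwaysOn_health events from the session's event files
SELECT
    CAST(event_data AS XML).value('(event/@name)[1]', 'nvarchar(200)') AS event_name,
    CAST(event_data AS XML).value('(event/@timestamp)[1]', 'datetime2') AS event_time_utc
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL)
ORDER BY event_time_utc DESC;
```

Clusters of state-change events here often line up with spikes in HADR_AR_CRITICAL_SECTION_ENTRY waits, pointing to the management activity responsible.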
Need hands-on help?
Dealing with persistent HADR_AR_CRITICAL_SECTION_ENTRY issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.