Quick Answer
HADR_AR_CRITICAL_SECTION_ENTRY occurs when Always On availability group DDL operations or WSFC commands compete for exclusive access to the local replica's runtime state. This synchronization bottleneck typically indicates heavy concurrent management activity against availability groups: failovers, configuration changes, or spikes in health monitoring.
Root Cause Analysis
This wait type emerges from SQL Server's internal synchronization mechanism protecting the availability group runtime state stored in memory. When operations like ALTER AVAILABILITY GROUP, failover commands, or WSFC health checks execute, they must acquire exclusive access to the availability replica's critical section before modifying or reading runtime metadata.
The critical section protects structures containing replica roles, connection states, synchronization health, and failover eligibility flags. When multiple threads attempt simultaneous access, every request behind the current holder accumulates time on this specific wait type. SQL Server coordinates this access through a lightweight synchronization primitive rather than the traditional lock manager, so these waits do not appear as conventional lock blocking.
In SQL Server 2016, Microsoft improved the granularity of these critical sections, reducing wait times for read-only operations like health checks. SQL Server 2017 introduced better separation between DDL operations and monitoring queries. SQL Server 2019 added enhanced diagnostics for availability group state transitions, making these waits more observable but not necessarily more frequent.
The Always On state machine processes these requests sequentially within each replica. Heavy automation tools polling availability group health, frequent manual failovers, or applications repeatedly checking replica states amplify this contention. WSFC cluster validation operations also trigger these waits during cluster health assessments.
AutoDBA checks Always On configuration, monitoring frequency optimization, and availability group health check patterns across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.
Diagnostic Queries
-- Sessions currently waiting on HADR_AR_CRITICAL_SECTION_ENTRY
SELECT
    r.session_id,
    r.wait_type,
    r.wait_time,
    r.last_wait_type,
    s.login_name,
    s.program_name,
    t.text AS current_sql
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY'
ORDER BY r.wait_time DESC;
-- Historical wait statistics for HADR critical section waits
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY'
    AND waiting_tasks_count > 0;
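These counters are cumulative since the last restart (or since the stats were cleared), so a large total may be entirely historical. To confirm the waits are accumulating right now, snapshot the counter twice and diff; this sketch uses an arbitrary 30-second sampling window:

```sql
-- Snapshot, wait, and diff to measure the current accumulation rate
DECLARE @before TABLE (waiting_tasks_count BIGINT, wait_time_ms BIGINT);

INSERT INTO @before
SELECT waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY';

WAITFOR DELAY '00:00:30'; -- sampling window; adjust as needed

SELECT
    ws.waiting_tasks_count - b.waiting_tasks_count AS new_waits,
    ws.wait_time_ms - b.wait_time_ms AS new_wait_time_ms
FROM sys.dm_os_wait_stats ws
CROSS JOIN @before b
WHERE ws.wait_type = 'HADR_AR_CRITICAL_SECTION_ENTRY';
```

If both deltas are zero over several samples, the cumulative figure reflects a past incident rather than an ongoing problem.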
-- Availability group replica states and connection health
SELECT
    ag.name AS availability_group_name,
    ar.replica_server_name,
    ar.endpoint_url,
    ars.role_desc,
    ars.operational_state_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
INNER JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;
-- Active Always On health check sessions
SELECT
    s.session_id,
    s.login_name,
    s.program_name,
    s.client_interface_name,
    r.command,
    r.status,
    r.wait_type,
    r.cpu_time,
    r.total_elapsed_time
FROM sys.dm_exec_sessions s
LEFT JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
WHERE s.program_name LIKE '%Always On%'
    OR s.program_name LIKE '%WSFC%'
    OR r.command LIKE '%HADR%'
ORDER BY s.last_request_start_time DESC;
-- Cluster and availability group event history from error log
EXEC xp_readerrorlog 0, 1, N'Always On', N'availability group';
Fix Scripts
Reduce monitoring frequency from applications and automation tools
-- Review and optimize health check queries to reduce frequency
-- Replace frequent polling with event-driven notifications where possible
-- Example: Instead of checking AG state every 5 seconds, use 30-60 second intervals
-- Identify frequent callers
SELECT
    s.program_name,
    COUNT(*) AS connection_count,
    AVG(s.cpu_time) AS avg_cpu_time
FROM sys.dm_exec_sessions s
WHERE s.program_name IS NOT NULL
GROUP BY s.program_name
HAVING COUNT(*) > 10
ORDER BY connection_count DESC;
Test connection pooling and query frequency in development before implementing. Overly aggressive monitoring creates more problems than it solves.
Optimize Always On configuration for reduced contention
-- Review and adjust session timeout values if appropriate
-- Higher values reduce health check frequency but increase failover detection time
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'PrimaryReplica'
WITH (SESSION_TIMEOUT = 15); -- Default is 10 seconds, consider 15-20 for busy systems
-- Ensure proper backup preferences to avoid unnecessary operations on secondary replicas
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryReplica'
WITH (BACKUP_PRIORITY = 50); -- Adjust based on your backup strategy
Test failover detection times after increasing session timeout. Document changes for operations teams.
Implement proper error handling for Always On operations
-- Add retry logic with exponential backoff for availability group operations
-- Example template for DDL operations
DECLARE @retry_count INT = 0;
DECLARE @max_retries INT = 3;
DECLARE @wait_time VARCHAR(12) = '00:00:01'; -- Start with 1 second
DECLARE @wait_seconds INT = 1;

WHILE @retry_count < @max_retries
BEGIN
    BEGIN TRY
        -- Your Always On DDL operation here
        ALTER AVAILABILITY GROUP [YourAGName] FAILOVER;
        BREAK; -- Success, exit loop
    END TRY
    BEGIN CATCH
        -- Transient AG connection errors; verify these numbers against
        -- sys.messages for your SQL Server version before relying on them
        IF ERROR_NUMBER() IN (35250, 35251, 35252)
        BEGIN
            SET @retry_count = @retry_count + 1;
            IF @retry_count < @max_retries
            BEGIN
                WAITFOR DELAY @wait_time;
                SET @wait_seconds = @wait_seconds * 2; -- Exponential backoff
                SET @wait_time = CONVERT(VARCHAR(12), DATEADD(SECOND, @wait_seconds, '00:00:00'), 108);
            END
            ELSE
            BEGIN
                THROW; -- Re-throw if max retries exceeded
            END
        END
        ELSE
        BEGIN
            THROW; -- Re-throw non-transient errors immediately
        END
    END CATCH
END;
Always test failover procedures in development. Implement proper logging for retry attempts.
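The logging recommendation above can be sketched as a small audit table written from the CATCH block of the retry loop. The table and column names here are illustrative, not a standard schema:

```sql
-- Hypothetical audit table for availability group operation retries
CREATE TABLE dbo.AGOperationRetryLog (
    log_id INT IDENTITY(1, 1) PRIMARY KEY,
    logged_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    operation NVARCHAR(200) NOT NULL,
    attempt INT NOT NULL,
    error_number INT NULL,
    error_message NVARCHAR(4000) NULL
);

-- Inside the CATCH block, before the WAITFOR DELAY:
-- INSERT INTO dbo.AGOperationRetryLog (operation, attempt, error_number, error_message)
-- VALUES (N'ALTER AVAILABILITY GROUP ... FAILOVER', @retry_count, ERROR_NUMBER(), ERROR_MESSAGE());
```

A persistent log like this lets you correlate retry bursts with the wait-statistics spikes captured by the diagnostic queries.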
AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.
Prevention
Configure monitoring tools to query availability group health less frequently, using 30-60 second intervals instead of 5-10 seconds. Implement event-driven monitoring using Extended Events or SQL Server Agent alerts rather than continuous polling.
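As one event-driven alternative to polling, a SQL Server Agent alert can fire when an availability group role change is logged. This sketch assumes error 1480 (the role-change message) is being written to the error log, which it is by default:

```sql
-- Fire an Agent alert on AG role change instead of polling replica state
EXEC msdb.dbo.sp_add_alert
    @name = N'AG Role Change',
    @message_id = 1480,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1;
-- Pair with msdb.dbo.sp_add_notification to notify the on-call operator.
```

Because the alert fires only when a failover actually occurs, it generates zero load on the availability group critical sections between events.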
Consolidate Always On management operations to reduce concurrent DDL activity. Schedule maintenance windows for configuration changes and avoid overlapping automated operations like backup jobs with manual failovers.
Use dedicated service accounts for Always On operations with appropriate permissions. Avoid shared application accounts that create connection pooling contention during health checks.
Configure appropriate session timeout values based on network latency and system load. Environments with consistent sub-second response times can use lower timeouts (10-15 seconds), while geographically distributed replicas may need 20-30 seconds.
Implement proper connection pooling in applications accessing availability group listeners. Applications opening multiple concurrent connections for health checks create unnecessary critical section contention.
Monitor Extended Events for availability group state changes to identify patterns in management activity. The AlwaysOn_health session provides detailed visibility into operations causing these waits.
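A minimal sketch for reading recent events from the AlwaysOn_health session, assuming it is started and writing to its default .xel files in the SQL Server log directory:

```sql
-- Read recent AlwaysOn_health events from the session's event files
SELECT
    CAST(event_data AS XML).value('(event/@name)[1]', 'nvarchar(200)') AS event_name,
    CAST(event_data AS XML).value('(event/@timestamp)[1]', 'datetime2') AS event_time_utc
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL)
ORDER BY event_time_utc DESC;
```

Clusters of state-change events here often line up with spikes in HADR_AR_CRITICAL_SECTION_ENTRY waits, pointing to the management activity responsible.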
Need hands-on help?
Dealing with persistent HADR_AR_CRITICAL_SECTION_ENTRY issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.