Quick Answer

HADR_DBSTATECHANGE_SYNC occurs when SQL Server serializes updates to Always On availability group replica state information. This wait indicates contention during database state transitions like failover, suspend/resume operations, or role changes. Brief spikes during planned operations are normal, but sustained waits suggest configuration or infrastructure issues.

Root Cause Analysis

This wait type emerges from the Always On state machine's internal synchronization mechanisms. When replica databases transition between states (ONLINE, RECOVERING, RESTORING, SUSPECT), SQL Server must update the internal replica state metadata atomically. The HADR worker threads acquire locks on the database replica state structures to prevent race conditions during these critical transitions.

The wait manifests when multiple threads attempt to modify replica state simultaneously. Common scenarios include automatic failover events where the new primary must update all secondary replica states, manual failover operations coordinating state changes across replicas, and database suspend/resume operations that modify synchronization states.

SQL Server 2016 introduced optimizations to reduce contention duration by batching certain state updates. SQL Server 2019 further improved the state transition logic by implementing lock-free algorithms for read-heavy state queries. SQL Server 2022 added enhanced diagnostics in sys.dm_hadr_availability_replica_states that provide more granular state transition tracking.

The underlying mechanism involves the HADR Transport Manager coordinating with the Database Engine's replica state controller. When state changes occur, the system must update both local metadata and coordinate with remote replicas, creating serialization points that generate these waits.

Diagnostic Queries

-- Check current AG replica states and lag
SELECT 
    ar.replica_server_name,
    adc.database_name,
    ars.role_desc,
    ars.operational_state_desc,
    ars.synchronization_state_desc,
    ars.last_commit_time,
    DATEDIFF(ms, ars.last_commit_time, GETDATE()) AS lag_ms
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_databases_cluster adc ON ar.group_id = adc.group_id
ORDER BY ars.last_commit_time DESC;

-- Identify HADR_DBSTATECHANGE_SYNC wait statistics
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_DBSTATECHANGE_SYNC'
    AND waiting_tasks_count > 0;

-- Check for active state transitions and blocking
SELECT 
    s.session_id,
    s.status,
    s.command,
    r.wait_type,
    r.wait_time,
    r.wait_resource,
    t.text
FROM sys.dm_exec_sessions s
JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_DBSTATECHANGE_SYNC'
    OR s.program_name LIKE '%Always On%';

-- Monitor AG health and identify problematic databases
SELECT 
    ag.name AS availability_group,
    drs.database_id,
    drs.group_id,
    drs.replica_id,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.last_hardened_time,
    drs.redo_queue_size,
    drs.redo_rate
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
WHERE drs.synchronization_health_desc != 'HEALTHY'
    OR drs.redo_queue_size > 1000000;

-- Check for frequent state changes in error log
EXEC xp_readerrorlog 0, 1, 'Always On', 'state change';

Fix Scripts

Resolve stuck replica state transitions

-- Force state refresh for problematic replica
-- WARNING: Test in development first, may cause brief connectivity interruption
USE master;
GO

DECLARE @replica_name NVARCHAR(128) = 'YourReplicaServerName';

-- Suspend and resume database to reset state
ALTER AVAILABILITY GROUP [YourAGName] 
MODIFY REPLICA ON @replica_name 
WITH (SESSION_TIMEOUT = 30); -- Temporarily reduce timeout

-- Wait 10 seconds then reset to normal timeout
WAITFOR DELAY '00:00:10';

ALTER AVAILABILITY GROUP [YourAGName] 
MODIFY REPLICA ON @replica_name 
WITH (SESSION_TIMEOUT = 10); -- Reset to normal

Clear wait statistics for monitoring reset

-- Reset wait stats to establish new baseline
-- Run during maintenance window for accurate trending
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

-- Verify reset
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_DBSTATECHANGE_SYNC';

Optimize AG configuration for state change performance

-- Reduce session timeout for faster state detection
-- WARNING: Too low values may cause false failovers
USE master;
GO

ALTER AVAILABILITY GROUP [YourAGName] 
MODIFY REPLICA ON 'PrimaryReplica'
WITH (SESSION_TIMEOUT = 10, -- Seconds
      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      FAILOVER_MODE = AUTOMATIC);

ALTER AVAILABILITY GROUP [YourAGName] 
MODIFY REPLICA ON 'SecondaryReplica'
WITH (SESSION_TIMEOUT = 10,
      AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
      FAILOVER_MODE = MANUAL);

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_DBSTATECHANGE_SYNC Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states