Quick Answer
HADR_DBSTATECHANGE_SYNC occurs when SQL Server serializes updates to Always On availability group replica state information. This wait indicates contention during database state transitions like failover, suspend/resume operations, or role changes. Brief spikes during planned operations are normal, but sustained waits suggest configuration or infrastructure issues.
Root Cause Analysis
This wait type emerges from the Always On state machine's internal synchronization mechanisms. When replica databases transition between states (ONLINE, RECOVERING, RESTORING, SUSPECT), SQL Server must update the internal replica state metadata atomically. The HADR worker threads acquire locks on the database replica state structures to prevent race conditions during these critical transitions.
The wait manifests when multiple threads attempt to modify replica state simultaneously. Common scenarios include automatic failover events where the new primary must update all secondary replica states, manual failover operations coordinating state changes across replicas, and database suspend/resume operations that modify synchronization states.
SQL Server 2016 introduced optimizations to reduce contention duration by batching certain state updates. SQL Server 2019 further improved the state transition logic by implementing lock-free algorithms for read-heavy state queries. SQL Server 2022 added enhanced diagnostics in sys.dm_hadr_availability_replica_states that provide more granular state transition tracking.
The underlying mechanism involves the HADR Transport Manager coordinating with the Database Engine's replica state controller. When state changes occur, the system must update both local metadata and coordinate with remote replicas, creating serialization points that generate these waits.
AutoDBA checks Always On availability group health monitoring, replica state tracking, and synchronization performance metrics across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.
Diagnostic Queries
-- Check current AG replica states and lag
SELECT
ar.replica_server_name,
adc.database_name,
ars.role_desc,
ars.operational_state_desc,
ars.synchronization_state_desc,
ars.last_commit_time,
DATEDIFF(ms, ars.last_commit_time, GETDATE()) AS lag_ms
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_databases_cluster adc ON ar.group_id = adc.group_id
ORDER BY ars.last_commit_time DESC;
-- Identify HADR_DBSTATECHANGE_SYNC wait statistics
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms,
max_wait_time_ms,
signal_wait_time_ms,
wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_DBSTATECHANGE_SYNC'
AND waiting_tasks_count > 0;
-- Check for active state transitions and blocking
SELECT
s.session_id,
s.status,
s.command,
r.wait_type,
r.wait_time,
r.wait_resource,
t.text
FROM sys.dm_exec_sessions s
JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_DBSTATECHANGE_SYNC'
OR s.program_name LIKE '%Always On%';
-- Monitor AG health and identify problematic databases
SELECT
ag.name AS availability_group,
drs.database_id,
drs.group_id,
drs.replica_id,
drs.synchronization_state_desc,
drs.synchronization_health_desc,
drs.last_hardened_time,
drs.redo_queue_size,
drs.redo_rate
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
WHERE drs.synchronization_health_desc != 'HEALTHY'
OR drs.redo_queue_size > 1000000;
-- Check for frequent state changes in error log
EXEC xp_readerrorlog 0, 1, 'Always On', 'state change';
Fix Scripts
Resolve stuck replica state transitions
-- Force state refresh for problematic replica
-- WARNING: Test in development first, may cause brief connectivity interruption
USE master;
GO
DECLARE @replica_name NVARCHAR(128) = 'YourReplicaServerName';
-- Suspend and resume database to reset state
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON @replica_name
WITH (SESSION_TIMEOUT = 30); -- Temporarily reduce timeout
-- Wait 10 seconds then reset to normal timeout
WAITFOR DELAY '00:00:10';
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON @replica_name
WITH (SESSION_TIMEOUT = 10); -- Reset to normal
Clear wait statistics for monitoring reset
-- Reset wait stats to establish new baseline
-- Run during maintenance window for accurate trending
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
-- Verify reset
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_DBSTATECHANGE_SYNC';
Optimize AG configuration for state change performance
-- Reduce session timeout for faster state detection
-- WARNING: Too low values may cause false failovers
USE master;
GO
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'PrimaryReplica'
WITH (SESSION_TIMEOUT = 10, -- Seconds
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = AUTOMATIC);
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryReplica'
WITH (SESSION_TIMEOUT = 10,
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL);
AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.
Prevention
Configure appropriate session timeout values based on network latency and infrastructure reliability. Values too high delay state change detection; values too low trigger false positive failovers. Start with 10 seconds and adjust based on monitoring.
Implement comprehensive monitoring of replica states using sys.dm_hadr_availability_replica_states. Alert on synchronization_state_desc changes and synchronization_health_desc degradation before they impact state transitions.
Ensure adequate network bandwidth and low latency between replicas. State synchronization performance directly correlates with network quality. Consider dedicated network paths for Always On traffic in high-transaction environments.
Schedule maintenance operations during low-activity periods to minimize state change conflicts. Coordinate failover testing and replica modifications to avoid simultaneous state transitions across multiple availability groups.
Regularly review Always On error logs for state transition patterns. Frequent state changes indicate underlying infrastructure issues requiring investigation. Implement automated log analysis to detect recurring state change problems.
Configure proper backup strategies to avoid backup-related state transitions during peak hours. Full backups on secondary replicas can trigger temporary state changes that contribute to wait accumulation.
Need hands-on help?
Dealing with persistent hadr_dbstatechange_sync issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.