Quick Answer
HADR_DATABASE_VERSIONING_STATE occurs when SQL Server changes the internal versioning state of an Always On availability group secondary database. This wait happens during state transitions like role changes, synchronization events, or database recovery operations and typically lasts milliseconds with minimal performance impact.
Root Cause Analysis
This wait type manifests during transitions in the internal versioning state machine that governs Always On secondary databases. The versioning state tracks the database's position in the Always On replication pipeline and determines whether the secondary can accept read operations.
SQL Server maintains several internal structures for each availability database including the availability database state, redo thread state, and log scan information. When these states change, the HADR_DATABASE_VERSIONING_STATE wait protects the critical section where SQL Server updates these structures atomically.
Common triggers include primary to secondary role switches, database synchronization state changes (synchronizing to synchronized), redo thread restarts, and automatic page repairs. The wait ensures consistency between the physical database state and the Always On metadata structures.
In SQL Server 2016 and later, Microsoft optimized the state transition logic to reduce lock contention. SQL Server 2019 introduced additional parallelization in redo operations that can increase the frequency of these waits but also reduces their individual duration. SQL Server 2022 further refined the state machine with improved snapshot isolation handling for readable secondaries.
The wait occurs on both the primary and secondary replicas but is most common on secondaries during log replay operations. Extended occurrences indicate either heavy transaction log activity overwhelming redo threads or storage latency preventing timely state updates.
AutoDBA checks Always On availability group health, redo queue monitoring, and replica synchronization performance across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.
Diagnostic Queries
-- Check current availability database states and redo queue
SELECT
ag.name AS availability_group_name,
DB_NAME(drs.database_id) AS database_name,
drs.database_state_desc,
drs.synchronization_state_desc,
drs.log_send_queue_size,
drs.redo_queue_size,
drs.redo_rate
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
WHERE drs.is_local = 1;
-- Identify wait statistics for HADR versioning waits
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms,
max_wait_time_ms,
signal_wait_time_ms,
wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_DATABASE_VERSIONING_STATE'
AND waiting_tasks_count > 0;
-- Check for blocked HADR operations
SELECT
s.session_id,
r.command,
r.status,
r.wait_type,
r.last_wait_type,
r.wait_resource,
db_name(r.database_id) AS database_name,
s.login_name,
r.start_time
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
WHERE r.wait_type = 'HADR_DATABASE_VERSIONING_STATE'
OR r.last_wait_type = 'HADR_DATABASE_VERSIONING_STATE';
-- Monitor redo thread performance
SELECT
ag.name AS availability_group_name,
DB_NAME(drs.database_id) AS database_name,
drs.redo_queue_size,
drs.redo_rate,
drs.low_water_mark_for_ghosts,
CASE
WHEN drs.redo_rate > 0 THEN drs.redo_queue_size / drs.redo_rate
ELSE NULL
END AS estimated_redo_completion_seconds
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
WHERE drs.is_local = 1
AND drs.redo_queue_size > 0;
-- Check for recent availability group state changes
SELECT TOP 20
object_name,
counter_name,
cntr_value,
cntr_type
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Availability Replica%'
AND counter_name IN ('Database State Changes/sec', 'Recovery Queue', 'Log Send Queue')
ORDER BY cntr_value DESC;
Fix Scripts
Restart stalled redo threads Restarts redo operations when versioning state changes are blocked by stalled redo threads.
-- Suspend and resume database to restart redo threads
-- WARNING: This briefly interrupts replication for the specified database
DECLARE @database_name NVARCHAR(128) = 'YourDatabaseName';
DECLARE @availability_group NVARCHAR(128) = 'YourAGName';
-- Suspend data movement
ALTER AVAILABILITY GROUP [YourAGName]
SUSPEND DATA_MOVEMENT FOR DATABASE [YourDatabaseName];
-- Wait 5 seconds for suspension to complete
WAITFOR DELAY '00:00:05';
-- Resume data movement
ALTER AVAILABILITY GROUP [YourAGName]
RESUME DATA_MOVEMENT FOR DATABASE [YourDatabaseName];
Force database state synchronization Forces a complete state refresh when versioning information becomes inconsistent.
-- Remove and re-add database to availability group
-- WARNING: This causes full data synchronization - test thoroughly
-- Only use when database shows persistent state inconsistencies
-- Step 1: Remove database from availability group (run on primary)
ALTER AVAILABILITY GROUP [YourAGName]
REMOVE DATABASE [YourDatabaseName];
-- Step 2: Drop database from secondary (run on each secondary)
DROP DATABASE [YourDatabaseName];
-- Step 3: Restore database on secondary with NORECOVERY
RESTORE DATABASE [YourDatabaseName] FROM DISK = 'YourBackupPath'
WITH NORECOVERY, REPLACE;
-- Step 4: Add database back to availability group (run on primary)
ALTER AVAILABILITY GROUP [YourAGName]
ADD DATABASE [YourDatabaseName];
-- Step 5: Join database to availability group (run on each secondary)
ALTER DATABASE [YourDatabaseName]
SET HADR AVAILABILITY GROUP = [YourAGName];
Optimize redo thread performance Adjusts trace flags and configuration to reduce versioning state contention.
-- Enable enhanced redo thread parallelism (SQL Server 2019+)
-- These trace flags improve redo performance and reduce state transition waits
DBCC TRACEON(3459, -1); -- Enable parallel redo
-- Verify trace flags are active
DBCC TRACESTATUS(-1);
AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.
Prevention
Configure appropriate transaction log file sizing with fixed growth increments rather than percentage growth to prevent redo queue buildup during heavy write activity. Set initial log file sizes to accommodate typical workload patterns plus 50% buffer capacity.
Monitor redo queue size consistently and establish alerts when queues exceed 100MB or when redo rates drop below normal baseline values. Redo queues exceeding 1GB indicate underlying storage or network issues requiring immediate attention.
Implement dedicated storage for transaction log files on secondary replicas with adequate IOPS capacity. Shared storage configurations often create resource contention leading to versioning state delays during peak activity periods.
Configure network compression for availability group endpoints when replicas span geographical locations. Compression reduces network latency impact on state synchronization while maintaining acceptable CPU overhead for most workloads.
Avoid frequent availability group configuration changes during business hours. State transitions multiply when adding or removing databases, changing synchronization modes, or modifying replica configurations during active workloads.
Establish baseline performance metrics for redo rates and queue sizes per database. Sudden deviations from established patterns indicate developing issues before they manifest as extended versioning state waits.
Need hands-on help?
Dealing with persistent hadr_database_versioning_state issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.