Quick Answer

HADR_DB_COMMAND waits occur when SQL Server is waiting for responses to conversational messages in Always On Availability Groups, specifically for database-level commands like synchronization, state changes, and metadata operations. These waits typically indicate communication delays between primary and secondary replicas during database operations rather than performance problems.

Root Cause Analysis

HADR_DB_COMMAND waits manifest when the primary replica sends conversational messages to secondary replicas through the Always On transport layer and must wait for acknowledgments. The Database Engine uses a request-response messaging pattern for critical database operations including log block shipping confirmations, database state transitions (online/offline/recovering), metadata synchronization, and failover coordination.

The wait occurs in the Always On worker threads that manage replica communication. When a primary replica executes operations requiring replica coordination, it sends messages through the availability group transport using TCP endpoints. The sending thread enters HADR_DB_COMMAND wait state until receiving the expected response or timing out.

Common triggers include ALTER DATABASE statements on availability databases, automatic seeding operations, database state changes during failover, and log truncation coordination. The wait duration correlates directly with network latency between replicas and the processing time on secondary replicas.

SQL Server 2016 introduced automatic seeding which increased HADR_DB_COMMAND waits during initial synchronization. SQL Server 2017 added clusterless availability groups affecting message routing patterns. SQL Server 2019 improved message batching reducing wait frequency for log shipping operations. SQL Server 2022 enhanced the transport protocol with better compression and multiplexing capabilities.

The internal message queue depth and worker thread availability directly impact wait duration. Secondary replica resource contention, particularly I/O bottlenecks during redo operations, extends response times causing longer waits on the primary.

Diagnostic Queries

-- Current HADR_DB_COMMAND waits and their context
SELECT 
    session_id,
    wait_type,
    wait_time_ms,
    blocking_session_id,
    wait_resource,
    command,
    status
FROM sys.dm_exec_requests 
WHERE wait_type = 'HADR_DB_COMMAND'
ORDER BY wait_time_ms DESC;

-- Availability group replica synchronization states
SELECT 
    ag.name AS availability_group,
    r.replica_server_name,
    rs.synchronization_state_desc,
    rs.last_commit_time,
    rs.log_send_queue_size,
    rs.redo_queue_size,
    rs.redo_rate
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas r ON ag.group_id = r.group_id
INNER JOIN sys.dm_hadr_database_replica_states rs ON r.replica_id = rs.replica_id
ORDER BY rs.log_send_queue_size DESC;

-- Check Always On replica connectivity and error states
SELECT 
    ar.replica_server_name,
    ars.connected_state_desc,
    ars.last_connect_error_description,
    ars.last_connect_error_number,
    ars.last_connect_error_timestamp
FROM sys.dm_hadr_availability_replica_states ars
INNER JOIN sys.availability_replicas ar ON ars.replica_id = ar.replica_id
WHERE ars.connected_state_desc != 'CONNECTED'
   OR ars.last_connect_error_number IS NOT NULL;

-- Historical wait statistics for HADR operations
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) as avg_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type LIKE 'HADR%'
ORDER BY wait_time_ms DESC;

-- Active Always On worker threads and their current operations
SELECT 
    t.session_id,
    t.task_state,
    t.context_switches_count,
    s.host_name,
    s.program_name,
    r.command,
    r.wait_type,
    r.wait_time_ms
FROM sys.dm_os_tasks t
INNER JOIN sys.dm_exec_sessions s ON t.session_id = s.session_id
LEFT JOIN sys.dm_exec_requests r ON t.session_id = r.session_id
WHERE s.program_name LIKE '%Always On%'
OR r.wait_type LIKE 'HADR%';

Fix Scripts

Increase Always On Transport Timeout Values Extends message timeout periods to accommodate slower networks or overloaded secondary replicas.

-- Increase session timeout for availability group endpoints
-- Test thoroughly in development first
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServerName'
WITH (SESSION_TIMEOUT = 30); -- Default is 10 seconds

-- Apply to all replicas in the availability group
DECLARE @sql NVARCHAR(MAX) = '';
SELECT @sql = @sql + 'ALTER AVAILABILITY GROUP [' + ag.name + '] 
MODIFY REPLICA ON ''' + r.replica_server_name + ''' 
WITH (SESSION_TIMEOUT = 30);' + CHAR(13)
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas r ON ag.group_id = r.group_id;
PRINT @sql;
-- Execute after review: EXEC sp_executesql @sql;

Caution: Higher timeouts mask underlying network or performance issues. Monitor replica lag after implementation.

Optimize Log Transport Performance Reduces message frequency by adjusting log block size and compression settings.

-- Enable compression on availability group transport
-- Requires SQL Server 2016 SP1 or later
ALTER AVAILABILITY GROUP [YourAGName]  
MODIFY REPLICA ON 'SecondaryServerName'
WITH (SEEDING_MODE = MANUAL, -- Prevents automatic seeding overhead
      BACKUP_PRIORITY = 50,
      SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY),
      PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL));

-- Check current compression status
SELECT 
    replica_server_name,
    endpoint_url,
    is_compression_enabled
FROM sys.availability_replicas r
INNER JOIN sys.database_mirroring_endpoints e ON r.endpoint_url LIKE '%' + CAST(e.port AS VARCHAR(10)) + '%';

Impact: Reduces network bandwidth usage but increases CPU overhead for compression operations.

Force Log Truncation on Overloaded Secondary Prevents log growth when secondary replicas fall behind in log processing.

-- Identify databases with large unsent log queues
SELECT 
    database_name,
    log_send_queue_size,
    redo_queue_size,
    recovery_lsn,
    last_commit_time
FROM sys.dm_hadr_database_replica_states
WHERE is_primary_replica = 1
AND log_send_queue_size > 10000; -- 10MB threshold

-- Emergency log truncation (use with extreme caution)
-- This will break the AG synchronization temporarily
-- BACKUP LOG [YourDatabase] TO DISK = 'NUL:';
-- Then re-establish AG after addressing secondary performance

Warning: Breaking log chain requires full database re-seeding to secondary replicas. Use only in emergencies.

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_DB_COMMAND Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states