Quick Answer

HADR_DB_OP_COMPLETION_SYNC waits occur when Always On Availability Groups wait for acknowledgment of conversational messages between replicas, including database state changes, failover operations, and metadata synchronization. High waits indicate network latency, replica overload, or blocking on the secondary replica. This is typically concerning when wait times exceed 100ms consistently.

Root Cause Analysis

This wait type manifests when the primary replica sends conversational messages through the availability group transport layer and waits for explicit acknowledgment from secondary replicas. The SQL Server Always On architecture uses a dedicated conversation infrastructure built on Service Broker messaging for reliable communication between replicas.

When database operations require cross-replica coordination, the primary replica's worker threads enter HADR_DB_OP_COMPLETION_SYNC waits while the message dispatcher waits for responses. Common operations triggering these waits include database state transitions (online/offline), checkpoint operations on synchronous replicas, metadata changes propagated through the system databases, and failover coordinator messages.

The wait occurs in the SQLOS scheduler when worker threads yield CPU while waiting for network I/O completion from the availability group endpoint. SQL Server 2016 introduced improvements to the message batching mechanism, reducing wait frequency for small operations. SQL Server 2019 enhanced the message acknowledgment protocol with better timeout handling and retry logic.

Secondary replica performance directly impacts these waits. If the secondary experiences blocking, resource contention, or slow storage, acknowledgment messages delay, causing prolonged waits on the primary. Network latency between replicas compounds the issue, especially in geographically distributed deployments.

The Always On lease mechanism also uses this wait type. When the Windows Server Failover Cluster lease renewal requires database-level coordination, worker threads wait for lease confirmation messages between replicas.

Diagnostic Queries

-- Current HADR_DB_OP_COMPLETION_SYNC waits by session
SELECT 
    s.session_id,
    s.status,
    r.wait_type,
    r.wait_time,
    r.wait_resource,
    s.program_name,
    t.text as current_sql
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_DB_OP_COMPLETION_SYNC';

-- Historical wait statistics for HADR_DB_OP_COMPLETION_SYNC
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) as avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_DB_OP_COMPLETION_SYNC';

-- Always On replica states and lag metrics
SELECT 
    ar.replica_server_name,
    db.database_name,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.last_commit_time,
    drs.log_send_queue_size,
    drs.redo_queue_size,
    drs.redo_rate
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_databases_cluster db ON drs.database_id = db.database_id
INNER JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;

-- Network endpoint performance for Always On
SELECT 
    endpoint_name,
    protocol_desc,
    type_desc,
    state_desc,
    is_encryption_enabled,
    connection_auth_desc
FROM sys.endpoints
WHERE type_desc = 'DATABASE_MIRRORING';

-- Always On extended events for message tracking (creates session)
CREATE EVENT SESSION [HADR_Message_Tracking] ON SERVER 
ADD EVENT sqlserver.hadr_transport_receive_message,
ADD EVENT sqlserver.hadr_transport_send_message
ADD TARGET package0.ring_buffer(SET max_memory=4096)
WITH (STARTUP_STATE=OFF);

Fix Scripts

Increase endpoint timeout values for slow networks

-- Increase session timeout for Always On endpoints
-- Test in dev first - affects failover detection time
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'PrimaryServer'
WITH (SESSION_TIMEOUT = 15); -- Default is 10 seconds

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (SESSION_TIMEOUT = 15);
-- Impact: Reduces timeout-related waits but increases failover detection time

Optimize secondary replica for faster acknowledgments

-- Enable read-only routing list optimization
-- Reduces conversational message overhead
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (READ_ONLY_ROUTING_URL = 'TCP://SecondaryServer:1433');

-- Set appropriate backup priority to reduce backup-related messages
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (BACKUP_PRIORITY = 10); -- Lower than primary
-- Impact: Reduces cross-replica coordination for backup operations

Configure asynchronous commit for non-critical databases

-- Change to asynchronous commit mode for databases that can tolerate data loss
-- WARNING: This allows potential data loss during failover
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
-- Impact: Eliminates synchronous acknowledgment waits entirely

Enable DTC support for SQL Server 2019+

-- SQL Server 2019+ only - enables distributed transaction support per database
ALTER AVAILABILITY GROUP [YourAGName]
SET (DTC_SUPPORT = PER_DB);
-- Impact: Enables distributed transactions across AG databases

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_DB_OP_COMPLETION_SYNC Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states