Quick Answer
HADR_DB_OP_COMPLETION_SYNC waits occur when Always On Availability Groups wait for acknowledgment of conversational messages between replicas, including database state changes, failover operations, and metadata synchronization. High waits indicate network latency, replica overload, or blocking on the secondary replica. This is typically concerning when wait times exceed 100ms consistently.
Root Cause Analysis
This wait type manifests when the primary replica sends conversational messages through the availability group transport layer and waits for explicit acknowledgment from secondary replicas. The SQL Server Always On architecture uses a dedicated conversation infrastructure built on Service Broker messaging for reliable communication between replicas.
When database operations require cross-replica coordination, the primary replica's worker threads enter HADR_DB_OP_COMPLETION_SYNC waits while the message dispatcher waits for responses. Common operations triggering these waits include database state transitions (online/offline), checkpoint operations on synchronous replicas, metadata changes propagated through the system databases, and failover coordinator messages.
The wait occurs in the SQLOS scheduler when worker threads yield CPU while waiting for network I/O completion from the availability group endpoint. SQL Server 2016 introduced improvements to the message batching mechanism, reducing wait frequency for small operations. SQL Server 2019 enhanced the message acknowledgment protocol with better timeout handling and retry logic.
Secondary replica performance directly impacts these waits. If the secondary experiences blocking, resource contention, or slow storage, acknowledgment messages delay, causing prolonged waits on the primary. Network latency between replicas compounds the issue, especially in geographically distributed deployments.
The Always On lease mechanism also uses this wait type. When the Windows Server Failover Cluster lease renewal requires database-level coordination, worker threads wait for lease confirmation messages between replicas.
AutoDBA checks Always On Availability Group configuration, replica health monitoring, and network latency analysis across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.
Diagnostic Queries
-- Current HADR_DB_OP_COMPLETION_SYNC waits by session
SELECT
s.session_id,
s.status,
r.wait_type,
r.wait_time,
r.wait_resource,
s.program_name,
t.text as current_sql
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_DB_OP_COMPLETION_SYNC';
-- Historical wait statistics for HADR_DB_OP_COMPLETION_SYNC
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms,
max_wait_time_ms,
signal_wait_time_ms,
wait_time_ms / NULLIF(waiting_tasks_count, 0) as avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_DB_OP_COMPLETION_SYNC';
-- Always On replica states and lag metrics
SELECT
ar.replica_server_name,
db.database_name,
drs.synchronization_state_desc,
drs.synchronization_health_desc,
drs.last_commit_time,
drs.log_send_queue_size,
drs.redo_queue_size,
drs.redo_rate
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_databases_cluster db ON drs.database_id = db.database_id
INNER JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id;
-- Network endpoint performance for Always On
SELECT
endpoint_name,
protocol_desc,
type_desc,
state_desc,
is_encryption_enabled,
connection_auth_desc
FROM sys.endpoints
WHERE type_desc = 'DATABASE_MIRRORING';
-- Always On extended events for message tracking (creates session)
CREATE EVENT SESSION [HADR_Message_Tracking] ON SERVER
ADD EVENT sqlserver.hadr_transport_receive_message,
ADD EVENT sqlserver.hadr_transport_send_message
ADD TARGET package0.ring_buffer(SET max_memory=4096)
WITH (STARTUP_STATE=OFF);
Fix Scripts
Increase endpoint timeout values for slow networks
-- Increase session timeout for Always On endpoints
-- Test in dev first - affects failover detection time
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'PrimaryServer'
WITH (SESSION_TIMEOUT = 15); -- Default is 10 seconds
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (SESSION_TIMEOUT = 15);
-- Impact: Reduces timeout-related waits but increases failover detection time
Optimize secondary replica for faster acknowledgments
-- Enable read-only routing list optimization
-- Reduces conversational message overhead
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (READ_ONLY_ROUTING_URL = 'TCP://SecondaryServer:1433');
-- Set appropriate backup priority to reduce backup-related messages
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (BACKUP_PRIORITY = 10); -- Lower than primary
-- Impact: Reduces cross-replica coordination for backup operations
Configure asynchronous commit for non-critical databases
-- Change to asynchronous commit mode for databases that can tolerate data loss
-- WARNING: This allows potential data loss during failover
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServer'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
-- Impact: Eliminates synchronous acknowledgment waits entirely
Enable DTC support for SQL Server 2019+
-- SQL Server 2019+ only - enables distributed transaction support per database
ALTER AVAILABILITY GROUP [YourAGName]
SET (DTC_SUPPORT = PER_DB);
-- Impact: Enables distributed transactions across AG databases
AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.
Prevention
Configure Always On endpoints with appropriate timeout values based on network latency measurements. For WANs exceeding 50ms round-trip time, increase SESSION_TIMEOUT to 20-30 seconds. Monitor network bandwidth utilization between replica sites and ensure dedicated network paths for Always On traffic.
Implement secondary replica maintenance windows during low-activity periods. Blocking processes on secondary replicas directly cause HADR_DB_OP_COMPLETION_SYNC waits on the primary. Use readable secondary replicas judiciously, as read workloads can delay acknowledgment messages.
Deploy geographically distributed replicas with asynchronous commit mode unless business requirements mandate synchronous replication. Reserve synchronous commit for local replicas with sub-10ms network latency.
Monitor redo queue size and redo rate on secondary replicas. Consistently high redo queues indicate storage performance issues causing delayed acknowledgments. Ensure secondary replica storage matches or exceeds primary replica performance characteristics.
Configure Always On extended events to capture message patterns and identify specific operations causing excessive conversational traffic. Baseline normal message volumes during application development to identify abnormal patterns in production.
Use dedicated network interfaces for Always On endpoints when possible. Shared network paths with user traffic can introduce variable latency affecting message acknowledgment timing.
Need hands-on help?
Dealing with persistent hadr_db_op_completion_sync issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.