Quick Answer

HADR_BACKUP_QUEUE occurs when the Always On primary replica's backup background thread waits for secondary replicas to acknowledge bulk operation log releases. This happens during large operations like bulk inserts, index rebuilds, or backups when the primary holds BulkOp log records and needs secondary confirmation before releasing locks. Generally benign but can indicate network latency or secondary replica performance issues.

Root Cause Analysis

The Always On availability group architecture requires coordination between primary and secondary replicas for bulk operations that generate large amounts of transaction log. When bulk operations occur (BULK_INSERT, SELECT INTO, index operations with SORT_IN_TEMPDB, or certain backup operations), SQL Server generates BulkOp log records that contain metadata about the bulk operation rather than individual row changes.

The primary replica's backup background thread manages the release of these BulkOp log records. This thread must wait for confirmation from all synchronous secondary replicas (and asynchronous replicas within their timeout window) before releasing the bulk operation locks. The thread enters HADR_BACKUP_QUEUE state during this coordination phase.

SQL Server 2016 introduced improvements to bulk operation handling in Always On, reducing the frequency of these waits through better log record batching. SQL Server 2019 further optimized the backup thread's communication protocol with secondary replicas, particularly for scenarios involving multiple concurrent bulk operations.

The wait specifically occurs in the backup worker thread (not user threads performing the bulk operations). This thread runs continuously on the primary replica, processing a queue of bulk operation completion notifications. When the queue is empty or waiting for secondary acknowledgment, the thread enters this wait state.

Network latency, secondary replica disk I/O performance, or redo thread bottlenecks on secondary replicas directly impact the duration of these waits. The primary cannot release bulk operation resources until all configured replicas respond within their respective timeout thresholds.

Diagnostic Queries

-- Check current HADR_BACKUP_QUEUE waits and duration
SELECT 
    session_id,
    wait_type,
    wait_time_ms,
    blocking_session_id,
    wait_resource,
    command
FROM sys.dm_exec_requests 
WHERE wait_type = 'HADR_BACKUP_QUEUE'
AND session_id > 50;

-- Analyze recent bulk operations and their impact on AG replicas
SELECT 
    ar.replica_server_name,
    ar.availability_mode_desc,
    ars.synchronization_health_desc,
    ars.last_sent_time,
    ars.last_received_time,
    ars.last_hardened_time,
    DATEDIFF(ms, ars.last_sent_time, ars.last_received_time) AS send_receive_latency_ms,
    DATEDIFF(ms, ars.last_received_time, ars.last_hardened_time) AS harden_latency_ms
FROM sys.availability_replicas ar
INNER JOIN sys.dm_hadr_availability_replica_states ars
    ON ar.replica_id = ars.replica_id
WHERE ar.replica_server_name != @@SERVERNAME
ORDER BY send_receive_latency_ms DESC;

-- Identify bulk operations currently running or recently completed
SELECT 
    s.session_id,
    r.command,
    r.percent_complete,
    r.estimated_completion_time,
    r.database_id,
    DB_NAME(r.database_id) AS database_name,
    s.program_name,
    r.wait_type,
    r.wait_time_ms,
    t.text
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.command IN ('BULK INSERT', 'SELECT INTO', 'DbccFilesCompact', 'BACKUP DATABASE', 'BACKUP LOG')
   OR t.text LIKE '%BULK%INSERT%' 
   OR t.text LIKE '%SELECT%INTO%'
   OR r.wait_type LIKE 'HADR%';

-- Check AG redo queue size and performance on primary
SELECT 
    ar.replica_server_name,
    drs.database_id,
    DB_NAME(drs.database_id) AS database_name,
    drs.redo_queue_size,
    drs.redo_rate,
    drs.log_send_queue_size,
    drs.log_send_rate,
    CASE 
        WHEN drs.redo_rate > 0 THEN drs.redo_queue_size / drs.redo_rate 
        ELSE -1 
    END AS redo_queue_seconds_behind
FROM sys.availability_replicas ar
INNER JOIN sys.dm_hadr_database_replica_states drs
    ON ar.replica_id = drs.replica_id
WHERE ar.replica_server_name != @@SERVERNAME
  AND drs.redo_queue_size > 0
ORDER BY redo_queue_seconds_behind DESC;

-- Historical wait statistics for HADR_BACKUP_QUEUE
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms,
    CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER() AS DECIMAL(5,2)) AS pct_of_waits
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_BACKUP_QUEUE'
  AND waiting_tasks_count > 0;

Fix Scripts

Force secondary replica synchronization catch-up

-- Check and potentially force log synchronization on lagging secondaries
-- WARNING: Run only during maintenance windows, can impact performance
DECLARE @replica_name NVARCHAR(256) = 'YOUR_SECONDARY_REPLICA_NAME';

-- First verify the replica is actually behind
SELECT 
    replica_server_name,
    redo_queue_size,
    log_send_queue_size
FROM sys.availability_replicas ar
INNER JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
WHERE replica_server_name = @replica_name;

-- If significantly behind, consider temporarily changing to async mode
-- ALTER AVAILABILITY GROUP [YourAGName] 
-- MODIFY REPLICA ON @replica_name
-- WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

Optimize bulk operations for Always On environments

-- Optimize backup and bulk operation performance in AG
-- Note: AG databases MUST remain in FULL recovery model

-- Enable backup compression to reduce data transfer to secondary replicas
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;

-- Use backup striping to parallelize backup I/O (example with 4 files)
/*
BACKUP DATABASE [YourDatabaseName]
TO DISK = 'D:\Backups\YourDB_1.bak',
   DISK = 'D:\Backups\YourDB_2.bak',
   DISK = 'D:\Backups\YourDB_3.bak',
   DISK = 'D:\Backups\YourDB_4.bak'
WITH COMPRESSION, FORMAT, INIT;
*/

-- For large bulk operations, consider batching
-- Example: Break large operations into smaller chunks
/*
DECLARE @BatchSize INT = 10000;
DECLARE @RowsProcessed INT = 0;

WHILE EXISTS (SELECT 1 FROM SourceTable WHERE ProcessFlag = 0)
BEGIN
    INSERT INTO TargetTable (Col1, Col2, Col3)
    SELECT TOP (@BatchSize) Col1, Col2, Col3
    FROM SourceTable 
    WHERE ProcessFlag = 0;
    
    UPDATE TOP (@BatchSize) SourceTable 
    SET ProcessFlag = 1 
    WHERE ProcessFlag = 0;
    
    SET @RowsProcessed = @RowsProcessed + @@ROWCOUNT;
    
    -- Allow log truncation opportunity
    CHECKPOINT;
    
    -- Brief pause to allow AG catch-up
    WAITFOR DELAY '00:00:01';
END
*/

Adjust Always On timeout settings

-- Increase session timeout for slow secondary replicas
-- Execute on primary replica
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryReplicaServerName'
WITH (SESSION_TIMEOUT = 20); -- Default is 10 seconds

-- For asynchronous replicas, verify timeout settings
SELECT 
    replica_server_name,
    availability_mode_desc,
    failover_mode_desc,
    session_timeout,
    primary_role_allow_connections_desc,
    secondary_role_allow_connections_desc
FROM sys.availability_replicas
WHERE replica_server_name != @@SERVERNAME;

Emergency: Suspend problematic replica temporarily

-- EMERGENCY USE ONLY: Temporarily suspend a replica that's causing excessive waits
-- This allows bulk operations to complete but breaks HA protection
USE master;
GO

-- Suspend data movement (run on primary)
ALTER DATABASE [YourDatabaseName] 
SET HADR SUSPEND;

-- Resume after bulk operations complete and replica catches up
-- ALTER DATABASE [YourDatabaseName] 
-- SET HADR RESUME;

-- Monitor replica status
SELECT 
    database_name,
    synchronization_state_desc,
    suspend_reason_desc
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.databases d ON drs.database_id = d.database_id
WHERE is_primary_replica = 0;

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_BACKUP_QUEUE Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states