mediumAlways On

HADR_WORK_POOL Wait Type Explained

Diagnose and resolve SQL Server HADR_WORK_POOL waits caused by Always On Availability Group thread pool contention. Includes diagnostic queries and fix scripts.

Quick Answer

HADR_WORK_POOL waits occur when Always On Availability Group background tasks compete for worker threads in the availability group thread pool. This wait type indicates thread pool exhaustion or contention within the AG background processing infrastructure, typically caused by high synchronization overhead or insufficient worker thread allocation.

Root Cause Analysis

SQL Server's Always On implementation uses a dedicated worker thread pool to handle availability group background operations including log block movement, health detection, failover coordination, and replica state management. The HADR_WORK_POOL wait occurs when these background tasks cannot acquire a worker thread from this specialized pool.

The availability group work pool operates separately from the standard SQL Server worker thread pool. Each availability group replica maintains its own set of background tasks that must be processed sequentially through this pool. When the pool becomes saturated, incoming tasks queue up and generate HADR_WORK_POOL waits.

Common scenarios triggering this wait include: heavy transaction log generation causing log transport thread saturation, network latency between replicas creating backlogs in synchronization threads, multiple availability groups on a single instance competing for the same thread pool resources, and intensive health detection operations during network instability.

In SQL Server 2016 and later versions, Microsoft improved the thread pool management for availability groups, but the fundamental architecture remains unchanged. SQL Server 2019 introduced better telemetry around AG thread utilization, while SQL Server 2022 enhanced the load balancing algorithms within the thread pool to reduce contention.

The wait typically correlates with other HADR-related waits like HADR_SYNC_COMMIT or HADR_LOGCAPTURE_WAIT, indicating broader availability group performance issues rather than isolated thread pool problems.

AutoDBA checks Always On Availability Group health monitoring, replica synchronization states, and thread pool utilization metrics across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.

Diagnostic Queries

-- Check current HADR_WORK_POOL wait statistics
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_WORK_POOL'
    AND waiting_tasks_count > 0;
-- Identify current HADR background tasks and their states
SELECT 
    session_id,
    request_id,
    wait_type,
    wait_time,
    blocking_session_id,
    command,
    database_id,
    DB_NAME(database_id) AS database_name
FROM sys.dm_exec_requests
WHERE command LIKE '%HADR%' 
    OR wait_type LIKE 'HADR%';
-- Analyze availability group replica states and synchronization health
SELECT 
    ag.name AS ag_name,
    ar.replica_server_name,
    adc.database_name,
    ars.role_desc,
    ars.synchronization_state_desc,
    ars.synchronization_health_desc,
    ar.availability_mode_desc,
    ar.failover_mode_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_database_replica_states drs ON ars.replica_id = drs.replica_id
LEFT JOIN sys.availability_databases_cluster adc ON drs.database_id = adc.database_id;
-- Monitor log send queue sizes and rates across replicas
SELECT 
    ag.name AS ag_name,
    drs.database_id,
    DB_NAME(drs.database_id) AS database_name,
    drs.log_send_queue_size,
    drs.log_send_rate,
    drs.redo_queue_size,
    drs.redo_rate,
    drs.synchronization_state_desc,
    ar.replica_server_name
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
WHERE drs.is_local = 0
ORDER BY drs.log_send_queue_size DESC;
-- Check for availability group thread pool pressure indicators
SELECT 
    scheduler_id,
    is_online,
    current_tasks_count,
    runnable_tasks_count,
    current_workers_count,
    active_workers_count,
    work_queue_count,
    pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255; -- Exclude hidden schedulers

Fix Scripts

Reduce AG synchronization overhead by optimizing commit mode This script changes synchronous replicas to asynchronous where data loss tolerance allows, reducing thread pool pressure from synchronous commit waits.

-- WARNING: Test in development first. This changes data protection level.
-- Only run during maintenance windows with business approval.

-- Change replica to asynchronous commit (reduces synchronization overhead)
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON 'SecondaryServerName'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

-- Monitor the change impact
SELECT 
    ar.replica_server_name,
    ar.availability_mode_desc,
    ars.synchronization_health_desc
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
WHERE ar.replica_server_name = 'SecondaryServerName';

Increase max worker threads if system has available CPU resources This script increases the worker thread pool size when the system has sufficient CPU cores to handle additional threads.

-- Check current max worker threads setting
SELECT name, value, value_in_use 
FROM sys.configurations 
WHERE name = 'max worker threads';

-- Increase max worker threads (requires restart)
-- Calculate: 256 + ((logical_cpus - 4) * 8) for 64-bit systems
-- Only increase if CPU utilization is consistently below 80%
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max worker threads', 1024; -- Adjust based on your CPU count
RECONFIGURE;

-- Note: Requires SQL Server service restart to take effect
-- Monitor worker thread usage after restart with sys.dm_os_schedulers

Optimize tempdb configuration to reduce AG log generation TempDB contention can cause excessive transaction log activity, increasing AG synchronization workload.

-- Add additional tempdb data files to reduce contention
-- This reduces transaction log pressure on user databases
USE master;
GO

-- Check current tempdb file configuration
SELECT 
    name,
    physical_name,
    size * 8 / 1024 AS size_mb,
    growth,
    is_percent_growth
FROM sys.master_files 
WHERE database_id = DB_ID('tempdb') 
    AND type = 0; -- Data files only

-- Add tempdb data files (one per CPU core up to 8 files)
-- Adjust file paths and sizes based on your environment
ALTER DATABASE tempdb
ADD FILE (
    NAME = 'tempdev3',
    FILENAME = 'C:\TempDB\tempdev3.mdf',
    SIZE = 1024MB,
    FILEGROWTH = 256MB
);
-- Repeat for additional files as needed

Configure AG dashboard alerts for proactive monitoring This script creates custom alerts to detect HADR_WORK_POOL wait accumulation before it impacts users.

-- Create custom performance counters alert for AG thread pool pressure
USE msdb;
GO

-- Create alert for high HADR work pool waits
EXEC msdb.dbo.sp_add_alert
    @name = 'HADR Work Pool High Waits',
    @message_id = NULL,
    @severity = NULL,
    @notification_message = 'HADR_WORK_POOL waits indicate AG thread pool contention',
    @category_name = 'Availability Group',
    @performance_condition = 'SQLServer:Wait Statistics|Average wait time (ms)|HADR_WORK_POOL|>|1000',
    @include_event_description_in = 1,
    @job_id = NULL;

-- Add notification (replace with your operator)
EXEC msdb.dbo.sp_add_notification
    @alert_name = 'HADR Work Pool High Waits',
    @operator_name = 'DBA_Team',
    @notification_method = 1; -- Email

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

Prevention

Configure availability groups with asynchronous commit mode for reporting replicas and geographically distant replicas to minimize synchronization thread pressure. Reserve synchronous commit only for local high-availability replicas where zero data loss is mandatory.

Size the max worker threads setting appropriately for your server's CPU core count and AG workload. Monitor sys.dm_os_schedulers regularly to ensure work_queue_count remains near zero during peak operations.

Implement database-level optimizations that reduce transaction log generation: enable trace flag 610 for minimal logging operations, optimize ETL processes to use bulk operations, and configure tempdb with multiple data files to reduce log contention.

Establish network quality monitoring between AG replicas. Network latency spikes cause log transport threads to back up, creating cascading thread pool exhaustion. Configure connection timeouts and retry logic appropriately for your network infrastructure.

Use SQL Server 2019 or later versions for improved AG thread pool management and enhanced monitoring capabilities through extended events. The sys.dm_hadr_* DMVs provide better visibility into thread utilization patterns in newer versions.

Schedule regular maintenance to clear wait statistics and establish baseline performance metrics. HADR_WORK_POOL waits often indicate broader AG performance degradation that requires systematic analysis rather than isolated fixes.

Need hands-on help?

Dealing with persistent hadr_work_pool issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.

Related Pages