mediumAlways On

HADR_TRANSPORT_DBRLIST Wait Type Explained

Diagnose and resolve SQL Server HADR_TRANSPORT_DBRLIST wait type caused by Always On replica list spinlock contention. Includes queries, fixes, and prevention strategies.

Quick Answer

HADR_TRANSPORT_DBRLIST occurs when multiple sessions compete for access to Always On's internal database replica list using a spinlock. This wait indicates contention accessing transport layer metadata for availability group replicas, typically during heavy Always On synchronization activity or frequent availability group state changes.

Root Cause Analysis

SQL Server maintains an internal database replica list that tracks all replicas participating in availability groups within the transport layer. Access to this list is protected by a spinlock to ensure thread safety when multiple operations attempt to read or modify replica metadata simultaneously.

The wait surfaces when sessions queue for this spinlock during operations such as failover processing, replica state transitions, synchronization status updates, or when the Always On health monitor queries replica information. The transport layer manages data movement between primary and secondary replicas, and this list contains critical routing and status information.

In SQL Server 2012 through 2014, this spinlock was more aggressive due to less optimized replica list management. SQL Server 2016 introduced improvements to reduce contention by implementing more granular locking and better caching of replica metadata. SQL Server 2019 further optimized this by reducing the frequency of replica list updates during steady-state operations.

The spinlock becomes highly contended when availability groups contain many databases, have frequent failover activity, or when monitoring queries repeatedly access sys.dm_hadr_* DMVs. Unlike traditional locks that can escalate or timeout, spinlocks use CPU spinning, making this wait type particularly impactful on CPU-bound systems.

SQL Server 2022 introduced distributed availability groups improvements that reduced some transport layer contention, but the fundamental spinlock mechanism remains unchanged. The wait duration directly correlates with the number of concurrent threads attempting replica list access and the complexity of the availability group topology.

AutoDBA checks Always On replica configuration, monitoring query frequency, and availability group topology optimization across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.

Diagnostic Queries

-- Current HADR transport waits and their frequency
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_TRANSPORT_DBRLIST'
    AND waiting_tasks_count > 0;
-- Active sessions currently waiting on HADR transport
SELECT 
    s.session_id,
    s.login_name,
    r.command,
    r.wait_type,
    r.wait_time,
    r.last_wait_type,
    t.text AS current_sql
FROM sys.dm_exec_sessions s
INNER JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.wait_type = 'HADR_TRANSPORT_DBRLIST'
    OR r.last_wait_type = 'HADR_TRANSPORT_DBRLIST';
-- Availability group topology and replica count
SELECT 
    ag.name AS availability_group,
    COUNT(ar.replica_id) AS replica_count,
    COUNT(adc.database_id) AS database_count,
    ar.primary_role_allow_connections_desc,
    ar.secondary_role_allow_connections_desc
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
LEFT JOIN sys.availability_databases_cluster adc ON ag.group_id = adc.group_id
GROUP BY ag.name, ar.primary_role_allow_connections_desc, ar.secondary_role_allow_connections_desc;
-- Recent Always On health events that might correlate with waits
SELECT TOP 20
    DB_NAME(drs.database_id) AS database_name,
    d.target_recovery_time_in_seconds,
    drs.is_primary_replica,
    drs.synchronization_state_desc,
    drs.log_send_queue_size,
    drs.redo_queue_size,
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.databases d ON drs.database_id = d.database_id
WHERE drs.last_commit_time > DATEADD(MINUTE, -30, GETDATE())
ORDER BY drs.last_commit_time DESC;
-- Spinlock stats to identify transport layer contention patterns
SELECT 
    name,
    collisions,
    spins,
    spins_per_collision,
    sleep_time,
    backoffs
FROM sys.dm_os_spinlock_stats
WHERE name LIKE '%HADR%' 
    OR name LIKE '%TRANSPORT%'
ORDER BY collisions DESC;

Fix Scripts

Reduce DMV Query Frequency from Monitoring Tools

-- Identify frequent Always On DMV queries that increase contention
-- Run this to find monitoring queries hitting replica metadata too frequently
SELECT 
    qt.text,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_time,
    qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.text LIKE '%sys.dm_hadr%'
    AND qs.execution_count > 100
    AND qs.last_execution_time > DATEADD(HOUR, -1, GETDATE())
ORDER BY qs.execution_count DESC;

-- Action: Configure monitoring tools to query sys.dm_hadr_* views less frequently
-- Recommended interval: 30-60 seconds instead of 5-10 seconds

Optimize Availability Group Configuration

-- Review and potentially consolidate databases across fewer availability groups
-- This reduces the replica list size and contention
SELECT 
    ag.name AS availability_group_name,
    COUNT(adc.database_name) AS database_count,
    STRING_AGG(adc.database_name, ', ') AS databases
FROM sys.availability_groups ag
INNER JOIN sys.availability_databases_cluster adc ON ag.group_id = adc.group_id
GROUP BY ag.name
ORDER BY database_count DESC;

-- Consider consolidating small databases into fewer availability groups
-- Each AG maintains its own transport structures
-- Test consolidation in dev environment first

Enable Query Store for Always On Workload Analysis

-- Enable Query Store to identify problematic Always On related queries
-- This helps optimize queries that frequently access replica metadata
ALTER DATABASE [YourDatabaseName] SET QUERY_STORE = ON 
(
    OPERATION_MODE = READ_WRITE,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
    DATA_FLUSH_INTERVAL_SECONDS = 900,
    INTERVAL_LENGTH_MINUTES = 60,
    MAX_STORAGE_SIZE_MB = 1000,
    QUERY_CAPTURE_MODE = AUTO,
    SIZE_BASED_CLEANUP_MODE = AUTO
);

-- Query to find Always On related queries with high execution frequency
-- Run after Query Store has collected data for 24+ hours
SELECT TOP 10
    qsq.query_id,
    qst.query_sql_text,
    qrs.count_executions,
    qrs.avg_cpu_time,
    qrs.avg_duration
FROM sys.query_store_query qsq
INNER JOIN sys.query_store_query_text qst ON qsq.query_text_id = qst.query_text_id
INNER JOIN sys.query_store_runtime_stats qrs ON qsq.query_id = qrs.query_id
WHERE qst.query_sql_text LIKE '%dm_hadr%'
ORDER BY qrs.count_executions DESC;

Implement Replica Health Monitoring Optimization

-- Create a more efficient Always On monitoring view that reduces spinlock contention
-- This consolidates multiple DMV queries into a single cached result
CREATE OR ALTER VIEW dbo.vw_AlwaysOn_Health_Optimized
AS
SELECT 
    ag.name AS availability_group,
    ar.replica_server_name,
    drs.database_name,
    drs.synchronization_state_desc,
    drs.synchronization_health_desc,
    drs.log_send_queue_size,
    drs.redo_queue_size,
    drs.last_commit_time
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
INNER JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 1;

-- Use this view in monitoring queries instead of joining DMVs directly
-- Cache results for 30-60 seconds in monitoring applications
-- Expected impact: 40-60% reduction in transport spinlock contention

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

Prevention

Configure Always On monitoring tools to query sys.dm_hadr_* DMVs at 30-60 second intervals rather than every 5-10 seconds. High-frequency polling creates unnecessary spinlock contention on the replica list structure.

Consolidate small databases into fewer availability groups when possible. Each availability group maintains separate transport layer structures, and reducing the number of groups decreases overall replica list complexity and contention points.

Implement application-level caching for Always On health queries. Rather than executing direct DMV queries repeatedly, cache results in temporary tables or application memory for 30-60 seconds, especially during peak activity periods.

Monitor CPU utilization on Always On replicas closely. Since HADR_TRANSPORT_DBRLIST uses spinlocks that consume CPU cycles while waiting, high CPU systems experience longer wait times and increased contention. Consider CPU upgrades or workload redistribution if CPU utilization consistently exceeds 80%.

Avoid frequent manual failovers during maintenance windows. Each failover triggers extensive replica list updates and state transitions. Schedule failovers during low-activity periods and batch multiple maintenance operations to minimize state change frequency.

Use SQL Server 2019 or later for Always On deployments when possible. The transport layer optimizations introduced in SQL Server 2016 and enhanced in 2019 significantly reduce replica list contention compared to SQL Server 2012-2014 implementations.

Need hands-on help?

Dealing with persistent hadr_transport_dbrlist issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.

Related Pages