Quick Answer

HADR_TRANSPORT_SESSION waits occur when Always On Availability Groups experience delays accessing or modifying the underlying TCP transport layer that handles data movement between replicas. This typically indicates network bottlenecks, SSL certificate processing overhead, or transport session state changes during failovers.

Root Cause Analysis

The HADR transport session manages the TCP connections between primary and secondary replicas in Always On Availability Groups. When SQL Server needs to send log records, heartbeats, or availability group metadata between replicas, it must acquire locks on the transport session object to ensure thread safety during network operations.

These waits manifest when multiple threads contend for access to the same transport session, particularly during high transaction log generation periods or when the network connection experiences latency. The Database Engine's Always On Worker threads compete for exclusive access to modify transport session state, such as updating send/receive buffers, processing SSL handshakes, or handling connection timeouts.

SQL Server 2016 introduced more granular transport session locking to reduce contention, but the fundamental bottleneck remains network throughput and SSL processing overhead. In SQL Server 2019 and later, Microsoft optimized the transport layer to batch smaller log blocks more efficiently, reducing the frequency of transport session state changes.

The wait specifically occurs in the Always On Log Capture thread when it needs to update the transport session's last sent LSN, and in the Always On Redo threads when acknowledging received log blocks. Certificate-based encryption adds computational overhead that increases these wait times, especially with larger Always On configurations spanning multiple data centers.

Diagnostic Queries

-- Check current HADR transport session waits and their duration
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_TRANSPORT_SESSION'
    AND waiting_tasks_count > 0;

-- Identify which AG replicas are experiencing transport delays
SELECT 
    ar.replica_server_name,
    ars.role_desc,
    ars.connected_state_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description,
    adc.database_name,
    drs.log_send_queue_size,
    drs.log_send_rate,
    drs.redo_queue_size,
    drs.redo_rate
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.dm_hadr_availability_replica_cluster_states arcs ON ar.replica_id = arcs.replica_id
LEFT JOIN sys.availability_databases_cluster adc ON ar.group_id = adc.group_id
LEFT JOIN sys.dm_hadr_database_replica_states drs ON adc.database_id = drs.database_id 
    AND ar.replica_id = drs.replica_id
WHERE ars.connected_state_desc != 'CONNECTED'
    OR drs.log_send_queue_size > 10240; -- More than 10MB queue

-- Monitor network performance between AG replicas
SELECT 
    ar.replica_server_name,
    drs.database_name,
    drs.log_send_queue_size / 1024.0 AS log_send_queue_mb,
    drs.log_send_rate / 1024.0 AS log_send_rate_mb_sec,
    CASE 
        WHEN drs.log_send_rate = 0 THEN NULL
        ELSE drs.log_send_queue_size / drs.log_send_rate 
    END AS estimated_recovery_time_sec
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_local = 0; -- Remote replicas only

-- Check SSL certificate and encryption overhead impact
SELECT 
    ag.name AS availability_group_name,
    ar.replica_server_name,
    ar.endpoint_url,
    e.encryption_algorithm_desc,
    ar.availability_mode_desc,
    ar.failover_mode_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.database_mirroring_endpoints e ON ar.replica_server_name = @@SERVERNAME;

-- Analyze Always On worker thread contention patterns
SELECT 
    r.session_id,
    r.request_id,
    r.wait_type,
    r.wait_time,
    r.blocking_session_id,
    t.task_state,
    t.context_switches_count,
    s.program_name
FROM sys.dm_exec_requests r
JOIN sys.dm_os_tasks t ON r.session_id = t.session_id
JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
WHERE r.wait_type = 'HADR_TRANSPORT_SESSION'
    OR s.program_name LIKE 'Always On%';

Fix Scripts

Increase network buffer sizes to reduce transport session contention

-- WARNING: Test in development first. Requires AG restart.
-- Increases the network send/receive buffer sizes for the AG endpoint
ALTER ENDPOINT [Hadr_endpoint] 
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (
    ROLE = ALL, 
    AUTHENTICATION = CERTIFICATE [AGCert],
    ENCRYPTION = REQUIRED ALGORITHM AES
);

-- Monitor improvement with this query after restart
SELECT wait_time_ms, waiting_tasks_count 
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_TRANSPORT_SESSION';

Reduce SSL encryption overhead by optimizing cipher selection

-- Create optimized endpoint configuration for better SSL performance
-- Test thoroughly as this affects all AG traffic security
ALTER ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE [AGCert],
    ENCRYPTION = REQUIRED ALGORITHM AES -- More efficient than RC4
);

-- Verify endpoint is using optimal encryption
SELECT name, encryption_algorithm_desc 
FROM sys.database_mirroring_endpoints;

Implement AG replica load balancing for read workloads

-- Configure readable secondary replicas to reduce primary transport load
-- This redistributes connection load and reduces transport session pressure
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryServer'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Enable read-only routing to automatically distribute read traffic
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'PrimaryServer'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('SecondaryServer')));

Optimize transaction log shipping batch sizes

-- Reduce frequency of transport session access by batching log records
-- Requires SQL Server 2016 or later for optimal effectiveness
EXEC sp_configure 'max degree of parallelism', 4; -- Reduce parallel operations load
RECONFIGURE;

-- Monitor the impact on log send queue efficiency
SELECT 
    database_name,
    log_send_queue_size,
    log_send_rate,
    redo_queue_size 
FROM sys.dm_hadr_database_replica_states 
WHERE is_local = 0;

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_TRANSPORT_SESSION Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_TRANSPORT_DBRLIST

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states