mediumAlways On

HADR_TRANSPORT_SESSION Wait Type Explained

Fix SQL Server HADR_TRANSPORT_SESSION waits in Always On Availability Groups. Diagnose network bottlenecks and SSL overhead causing transport delays between replicas.

Quick Answer

HADR_TRANSPORT_SESSION waits occur when Always On Availability Groups experience delays accessing or modifying the underlying TCP transport layer that handles data movement between replicas. This typically indicates network bottlenecks, SSL certificate processing overhead, or transport session state changes during failovers.

Root Cause Analysis

The HADR transport session manages the TCP connections between primary and secondary replicas in Always On Availability Groups. When SQL Server needs to send log records, heartbeats, or availability group metadata between replicas, it must acquire locks on the transport session object to ensure thread safety during network operations.

These waits manifest when multiple threads contend for access to the same transport session, particularly during high transaction log generation periods or when the network connection experiences latency. The Database Engine's Always On Worker threads compete for exclusive access to modify transport session state, such as updating send/receive buffers, processing SSL handshakes, or handling connection timeouts.

SQL Server 2016 introduced more granular transport session locking to reduce contention, but the fundamental bottleneck remains network throughput and SSL processing overhead. In SQL Server 2019 and later, Microsoft optimized the transport layer to batch smaller log blocks more efficiently, reducing the frequency of transport session state changes.

The wait specifically occurs in the Always On Log Capture thread when it needs to update the transport session's last sent LSN, and in the Always On Redo threads when acknowledging received log blocks. Certificate-based encryption adds computational overhead that increases these wait times, especially with larger Always On configurations spanning multiple data centers.

AutoDBA checks Always On Availability Group health, network performance between replicas, and transport session configuration across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.

Diagnostic Queries

-- Check current HADR transport session waits and their duration
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_TRANSPORT_SESSION'
    AND waiting_tasks_count > 0;
-- Identify which AG replicas are experiencing transport delays
SELECT 
    ar.replica_server_name,
    ars.role_desc,
    ars.connected_state_desc,
    ars.last_connect_error_number,
    ars.last_connect_error_description,
    adc.database_name,
    drs.log_send_queue_size,
    drs.log_send_rate,
    drs.redo_queue_size,
    drs.redo_rate
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.dm_hadr_availability_replica_cluster_states arcs ON ar.replica_id = arcs.replica_id
LEFT JOIN sys.availability_databases_cluster adc ON ar.group_id = adc.group_id
LEFT JOIN sys.dm_hadr_database_replica_states drs ON adc.database_id = drs.database_id 
    AND ar.replica_id = drs.replica_id
WHERE ars.connected_state_desc != 'CONNECTED'
    OR drs.log_send_queue_size > 10240; -- More than 10MB queue
-- Monitor network performance between AG replicas
SELECT 
    ar.replica_server_name,
    drs.database_name,
    drs.log_send_queue_size / 1024.0 AS log_send_queue_mb,
    drs.log_send_rate / 1024.0 AS log_send_rate_mb_sec,
    CASE 
        WHEN drs.log_send_rate = 0 THEN NULL
        ELSE drs.log_send_queue_size / drs.log_send_rate 
    END AS estimated_recovery_time_sec
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_local = 0; -- Remote replicas only
-- Check SSL certificate and encryption overhead impact
SELECT 
    ag.name AS availability_group_name,
    ar.replica_server_name,
    ar.endpoint_url,
    e.encryption_algorithm_desc,
    ar.availability_mode_desc,
    ar.failover_mode_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.database_mirroring_endpoints e ON ar.replica_server_name = @@SERVERNAME;
-- Analyze Always On worker thread contention patterns
SELECT 
    r.session_id,
    r.request_id,
    r.wait_type,
    r.wait_time,
    r.blocking_session_id,
    t.task_state,
    t.context_switches_count,
    s.program_name
FROM sys.dm_exec_requests r
JOIN sys.dm_os_tasks t ON r.session_id = t.session_id
JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
WHERE r.wait_type = 'HADR_TRANSPORT_SESSION'
    OR s.program_name LIKE 'Always On%';

Fix Scripts

Increase network buffer sizes to reduce transport session contention

-- WARNING: Test in development first. Requires AG restart.
-- Increases the network send/receive buffer sizes for the AG endpoint
ALTER ENDPOINT [Hadr_endpoint] 
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (
    ROLE = ALL, 
    AUTHENTICATION = CERTIFICATE [AGCert],
    ENCRYPTION = REQUIRED ALGORITHM AES
);

-- Monitor improvement with this query after restart
SELECT wait_time_ms, waiting_tasks_count 
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_TRANSPORT_SESSION';

Reduce SSL encryption overhead by optimizing cipher selection

-- Create optimized endpoint configuration for better SSL performance
-- Test thoroughly as this affects all AG traffic security
ALTER ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE [AGCert],
    ENCRYPTION = REQUIRED ALGORITHM AES -- More efficient than RC4
);

-- Verify endpoint is using optimal encryption
SELECT name, encryption_algorithm_desc 
FROM sys.database_mirroring_endpoints;

Implement AG replica load balancing for read workloads

-- Configure readable secondary replicas to reduce primary transport load
-- This redistributes connection load and reduces transport session pressure
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryServer'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Enable read-only routing to automatically distribute read traffic
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'PrimaryServer'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ('SecondaryServer')));

Optimize transaction log shipping batch sizes

-- Reduce frequency of transport session access by batching log records
-- Requires SQL Server 2016 or later for optimal effectiveness
EXEC sp_configure 'max degree of parallelism', 4; -- Reduce parallel operations load
RECONFIGURE;

-- Monitor the impact on log send queue efficiency
SELECT 
    database_name,
    log_send_queue_size,
    log_send_rate,
    redo_queue_size 
FROM sys.dm_hadr_database_replica_states 
WHERE is_local = 0;

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

Prevention

Configure Always On endpoints with dedicated network adapters to eliminate NIC contention that causes transport session delays. Use 10Gbps or faster connections between data centers with dedicated bandwidth allocation for Always On traffic.

Monitor network latency between replicas continuously, maintaining sub-5ms latency for synchronous replicas. Implement network QoS policies that prioritize Always On traffic over other database connections.

Place secondary replicas geographically close to the primary when using synchronous commit mode. Asynchronous commit mode better tolerates higher latency connections but still benefits from optimized network paths.

Configure SSL certificates with appropriate key sizes, 2048-bit RSA certificates provide the best balance of security and performance. Avoid 4096-bit certificates unless regulatory requirements mandate them, as the additional computational overhead significantly increases transport session wait times.

Schedule transaction log backups more frequently on high-volume OLTP systems to prevent excessive log growth that overwhelms the transport layer. Implement read-only routing lists to distribute read workloads away from the primary replica, reducing overall transport session utilization.

Need hands-on help?

Dealing with persistent hadr_transport_session issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.

Related Pages