mediumAlways On

HADR_GROUP_COMMIT Wait Type Explained

Complete guide to SQL Server HADR_GROUP_COMMIT waits in Always On Availability Groups. Learn root causes, diagnostic queries, and optimization techniques.

Quick Answer

HADR_GROUP_COMMIT occurs when Always On Availability Groups batch multiple transaction commits into a single log block to optimize I/O performance. This wait is normal and expected in healthy Always On environments, indicating the group commit optimization is working correctly.

Root Cause Analysis

The Always On log transport service batches transaction commits using a group commit mechanism to maximize throughput and minimize network overhead between replicas. When a transaction commits on the primary replica, SQL Server's log manager doesn't immediately flush the log record. Instead, it waits briefly to collect additional commits that can be grouped together into a single log block.

The wait occurs in the log writer thread when it delays processing to allow more transactions to accumulate. The default group commit delay is 0 milliseconds, but the system dynamically adjusts this based on transaction volume and network conditions. The maximum group size is determined by the log block size (typically 512 bytes to 64KB).

SQL Server 2016 introduced adaptive group commit timing, which automatically tunes the delay based on replica lag and throughput patterns. SQL Server 2019 improved the batching logic with better throughput optimization. SQL Server 2022 further refined the transport protocol with enhanced compression and multiplexing capabilities.

The wait manifests in sys.dm_os_wait_stats but should correlate with improved log send rates visible in sys.dm_hadr_database_replica_states. High wait times paired with low log_send_rate values indicate network or secondary replica performance issues preventing effective batching.

AutoDBA checks Always On configuration, replica synchronization modes, and session timeout settings across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.

Diagnostic Queries

-- Check current HADR group commit wait statistics
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms,
    CASE WHEN waiting_tasks_count > 0 
         THEN wait_time_ms / waiting_tasks_count 
         ELSE 0 END AS avg_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_GROUP_COMMIT';
-- Analyze log send performance and batching effectiveness
SELECT 
    ag.name AS availability_group_name,
    db.database_name,
    ar.replica_server_name,
    drs.log_send_queue_size,
    drs.log_send_rate,
    drs.redo_queue_size,
    drs.redo_rate,
    drs.last_commit_time,
    DATEDIFF(ms, drs.last_commit_time, GETDATE()) AS ms_since_last_commit
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_databases_cluster db ON drs.database_id = db.database_id
INNER JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
INNER JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
WHERE drs.is_local = 0
ORDER BY drs.log_send_queue_size DESC;
-- Monitor transaction log flush behavior and group commit patterns
SELECT 
    database_name,
    log_flush_wait_time,
    log_flushes/sec,
    log_flush_waits/sec,
    CASE WHEN [log_flushes/sec] > 0 
         THEN log_flush_wait_time / [log_flushes/sec] 
         ELSE 0 END AS avg_flush_wait_ms
FROM (
    SELECT 
        RTRIM(instance_name) AS database_name,
        SUM(CASE WHEN counter_name = 'Log Flush Wait Time' THEN cntr_value ELSE 0 END) AS log_flush_wait_time,
        SUM(CASE WHEN counter_name = 'Log Flushes/sec' THEN cntr_value ELSE 0 END) AS [log_flushes/sec],
        SUM(CASE WHEN counter_name = 'Log Flush Waits/sec' THEN cntr_value ELSE 0 END) AS [log_flush_waits/sec]
    FROM sys.dm_os_performance_counters 
    WHERE object_name LIKE '%Databases%'
        AND counter_name IN ('Log Flush Wait Time', 'Log Flushes/sec', 'Log Flush Waits/sec')
        AND instance_name IN (SELECT name FROM sys.databases WHERE replica_id IS NOT NULL)
    GROUP BY instance_name
) AS log_stats
WHERE database_name != '_Total';
-- Check Always On configuration affecting group commit behavior
SELECT 
    ag.name AS availability_group_name,
    ar.replica_server_name,
    ar.availability_mode_desc,
    ar.failover_mode_desc,
    ar.session_timeout,
    adc.database_name,
    adc.synchronization_health_desc,
    adc.synchronization_state_desc
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
INNER JOIN sys.availability_databases_cluster adc ON ag.group_id = adc.group_id
WHERE ar.is_local = 1 OR ar.secondary_role_allow_connections > 0
ORDER BY ag.name, ar.replica_server_name;
-- Real-time monitoring of group commit delays and transaction throughput
SELECT 
    session_id,
    wait_type,
    wait_time,
    blocking_session_id,
    wait_resource,
    command,
    database_id,
    DB_NAME(database_id) AS database_name
FROM sys.dm_exec_requests 
WHERE wait_type = 'HADR_GROUP_COMMIT'
    AND session_id > 50;

Fix Scripts

Optimize Always On synchronization mode for workload patterns

-- Switch to asynchronous commit if group commit waits are excessive
-- WARNING: Test thoroughly in dev. This reduces durability guarantees.
USE master;
GO
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryServerName' 
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
GO
-- Monitor for 24-48 hours to assess impact on RPO requirements

Adjust session timeout to reduce premature group commit flushes

-- Increase session timeout to allow more time for batching
-- This reduces false timeouts that force premature log flushes
USE master;
GO
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryServerName'
WITH (SESSION_TIMEOUT = 20); -- Default is 10 seconds, increasing to 20
GO
-- Monitor sys.dm_hadr_availability_replica_states for timeout events

Configure backup and seeding optimization

-- Reduce log transport overhead by optimizing backup and seeding settings
-- This indirectly improves group commit effectiveness
ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'SecondaryServerName'
WITH (BACKUP_PRIORITY = 90, -- Prefer this replica for backups
      SEEDING_MODE = AUTOMATIC); -- SQL Server 2016+
GO

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

Prevention

Configure Always On replicas with appropriate synchronization modes based on network latency and throughput requirements. Asynchronous commit mode reduces group commit pressure for geographically distributed replicas where network latency exceeds 5ms consistently.

Implement log file sizing strategies that minimize autogrowth events during peak transaction periods. Pre-size transaction logs to handle 2-4 hours of peak transaction volume, reducing I/O contention that can disrupt group commit timing.

Monitor sys.dm_hadr_database_replica_states.log_send_rate weekly and establish baselines. Declining send rates often precede increased HADR_GROUP_COMMIT waits as the system compensates for transport inefficiencies through extended batching windows.

Deploy secondary replicas on storage subsystems with write latencies under 10ms. Higher latencies force longer group commit delays as the system waits for redo confirmation before optimizing the next batch window.

Configure availability group session timeouts based on measured network characteristics rather than defaults. Networks with consistent sub-millisecond latency can use 5-7 second timeouts, while variable latency networks require 15-20 seconds to prevent false timeout cascades.

Need hands-on help?

Dealing with persistent hadr_group_commit issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.

Related Pages