mediumAlways On

HADR_DATABASE_WAIT_FOR_RESTART Wait Type Explained

Fix SQL Server HADR_DATABASE_WAIT_FOR_RESTART waits in Always On Availability Groups. Diagnostic queries, root cause analysis, and production-tested solutions.

Quick Answer

HADR_DATABASE_WAIT_FOR_RESTART occurs when an Availability Group database waits to complete restart operations after failover, database addition, or configuration changes. This wait is expected during normal AG operations and typically resolves automatically within seconds to minutes. It becomes concerning only when waits persist beyond expected restart timeframes.

Root Cause Analysis

This wait type manifests during Availability Group database state transitions when the SQL Server Database Engine must coordinate database restart operations with the Windows Server Failover Cluster (WSFC) and AG resource controllers. The wait occurs in the database startup thread while the AG infrastructure validates database eligibility, establishes log transport connections, and synchronizes metadata between replicas.

The underlying mechanism involves the AG Database Manager component waiting for the Database Engine to complete internal database recovery processes before transitioning the database to an operational state. This includes log redo operations, checkpoint processing, and establishing appropriate database states (PRIMARY, SECONDARY, RESOLVING, or NOT SYNCHRONIZING).

SQL Server 2012 introduced this wait type with basic AG functionality. SQL Server 2016 enhanced the restart logic to better handle scenarios with multiple databases in the same AG. SQL Server 2017 and later versions improved timeout handling and reduced unnecessary restart delays during planned operations like manual failovers.

The wait duration correlates directly with database size, transaction log activity, and underlying storage performance. Databases with large amounts of uncommitted transactions or extensive log chains require longer restart periods. The AG lease timeout mechanism (default 20 seconds) can also influence restart behavior when cluster communication issues occur.

During restart operations, the database remains inaccessible to user connections while maintaining its AG membership. The Database Engine blocks new connections and terminates existing sessions during the transition period.

AutoDBA checks Availability Group health monitoring, database synchronization states, and cluster connectivity metrics across your entire SQL Server instance in 60 seconds. Download the free diagnostic script and see what else needs attention.

Diagnostic Queries

-- Check current AG database states and restart operations
SELECT 
    ag.name AS availability_group,
    adc.database_name,
    ars.role_desc,
    drs.synchronization_state_desc,
    ars.connected_state_desc,
    ars.last_connect_error_description,
    ars.last_connect_error_timestamp
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_databases_cluster adc ON ag.group_id = adc.group_id
JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
WHERE ars.is_local = 1;
-- Monitor HADR_DATABASE_WAIT_FOR_RESTART waits specifically
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_DATABASE_WAIT_FOR_RESTART';
-- Identify databases currently in restart states
SELECT 
    d.name AS database_name,
    d.state_desc AS database_state,
    drs.database_state_desc AS ag_database_state,
    drs.suspend_reason_desc,
    drs.recovery_lsn,
    drs.last_commit_time
FROM sys.databases d
JOIN sys.dm_hadr_database_replica_states drs ON d.database_id = drs.database_id
WHERE drs.is_local = 1 
    AND (d.state_desc IN ('RESTORING', 'RECOVERING') 
         OR drs.database_state_desc IN ('NOT SYNCHRONIZING', 'INITIALIZING'));
-- Check for cluster connectivity issues affecting restarts
SELECT 
    c.quorum_type_desc,
    c.quorum_state_desc,
    m.member_name,
    m.member_type_desc,
    m.member_state_desc
FROM sys.dm_hadr_cluster c
CROSS JOIN sys.dm_hadr_cluster_members m
WHERE m.member_state_desc != 'UP';
-- Review recent AG failover events and database state changes
SELECT TOP 20
    object_name,
    counter_name,
    instance_name,
    cntr_value,
    cntr_type
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Availability Replica%'
    AND counter_name IN ('Failovers/sec', 'Connection Failures/sec', 'Time To Receive Response (ms)')
    AND instance_name != '';

Fix Scripts

-- Force database online if stuck in restart loop (use with extreme caution)
-- WARNING: Test in development first. Can cause data loss if used inappropriately.
-- Only use when database is confirmed healthy and restart is genuinely stuck.

USE master;
GO

DECLARE @DatabaseName NVARCHAR(128) = 'YourAGDatabase';
DECLARE @SQL NVARCHAR(MAX);

-- First attempt graceful restart
SET @SQL = 'ALTER DATABASE [' + @DatabaseName + '] SET ONLINE;';
PRINT 'Executing: ' + @SQL;
EXEC sp_executesql @SQL;

-- Check if database came online
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = @DatabaseName AND state = 0)
    PRINT 'Database successfully brought online.';
ELSE
    PRINT 'Manual intervention required. Database did not respond to ONLINE command.';
-- Remove and re-add database to AG if restart issues persist
-- WARNING: This will cause brief downtime and force full data synchronization
-- Ensure secondary replicas can handle the resync load

USE master;
GO

DECLARE @AGName NVARCHAR(128) = 'YourAvailabilityGroup';
DECLARE @DatabaseName NVARCHAR(128) = 'YourDatabase';

-- Remove database from AG
EXEC('ALTER AVAILABILITY GROUP [' + @AGName + '] REMOVE DATABASE [' + @DatabaseName + '];');
PRINT 'Database removed from AG. Waiting 10 seconds...';
WAITFOR DELAY '00:00:10';

-- Re-add database to AG
EXEC('ALTER AVAILABILITY GROUP [' + @AGName + '] ADD DATABASE [' + @DatabaseName + '];');
PRINT 'Database re-added to AG. Monitor synchronization status.';
-- Adjust AG lease timeout if cluster communication is causing restart delays
-- Increases timeout from default 20 seconds to 40 seconds
-- WARNING: Longer timeouts delay failover detection but reduce false positives

USE master;
GO

DECLARE @AGName NVARCHAR(128) = 'YourAvailabilityGroup';

ALTER AVAILABILITY GROUP YourAvailabilityGroup
MODIFY REPLICA ON 'PrimaryReplicaName' 
WITH (SESSION_TIMEOUT = 40);

ALTER AVAILABILITY GROUP YourAvailabilityGroup  
MODIFY REPLICA ON 'SecondaryReplicaName'
WITH (SESSION_TIMEOUT = 40);

PRINT 'AG session timeout increased to 40 seconds on all replicas.';
-- Clear accumulated wait statistics to get fresh baseline after resolving issues
-- Safe to run in production, only affects wait statistics collection

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
PRINT 'Wait statistics cleared. Monitor for recurring HADR_DATABASE_WAIT_FOR_RESTART waits.';

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

Prevention

Configure AG databases with appropriate recovery models and ensure transaction log files have adequate space and proper growth settings. Databases in SIMPLE recovery model cannot participate in Availability Groups and will generate restart errors.

Implement proactive monitoring of AG dashboard metrics, specifically database synchronization states and failover readiness indicators. Set up alerts for databases stuck in RESOLVING or NOT SYNCHRONIZING states for more than 5 minutes.

Optimize storage subsystem performance for AG database files and logs. Slow storage directly correlates with extended restart durations. Use separate drives for data and log files when possible, and ensure consistent performance across all AG replicas.

Configure cluster heartbeat and session timeout values appropriately for your network environment. Networks with higher latency or intermittent connectivity issues benefit from increased timeout values (30-60 seconds) to prevent unnecessary restart operations.

Schedule AG maintenance operations during planned downtime windows. Avoid making multiple configuration changes simultaneously, as this can trigger cascading restart operations across multiple databases.

Establish baseline performance metrics for normal AG restart operations in your environment. Document typical restart durations for each database to identify when waits exceed expected thresholds. Most healthy databases complete restarts within 30-120 seconds depending on size and activity levels.

Need hands-on help?

Dealing with persistent hadr_database_wait_for_restart issues across your environment? Samix Technology provides hands-on SQL Server performance consulting with 15+ years of production DBA experience.

Related Pages