Quick Answer

HADR_DATABASE_WAIT_FOR_RESTART occurs when an Availability Group database waits to complete restart operations after failover, database addition, or configuration changes. This wait is expected during normal AG operations and typically resolves automatically within seconds to minutes. It becomes concerning only when waits persist beyond expected restart timeframes.

Root Cause Analysis

This wait type manifests during Availability Group database state transitions when the SQL Server Database Engine must coordinate database restart operations with the Windows Server Failover Cluster (WSFC) and AG resource controllers. The wait occurs in the database startup thread while the AG infrastructure validates database eligibility, establishes log transport connections, and synchronizes metadata between replicas.

The underlying mechanism involves the AG Database Manager component waiting for the Database Engine to complete internal database recovery processes before transitioning the database to an operational state. This includes log redo operations, checkpoint processing, and establishing appropriate database states (PRIMARY, SECONDARY, RESOLVING, or NOT SYNCHRONIZING).

SQL Server 2012 introduced this wait type with basic AG functionality. SQL Server 2016 enhanced the restart logic to better handle scenarios with multiple databases in the same AG. SQL Server 2017 and later versions improved timeout handling and reduced unnecessary restart delays during planned operations like manual failovers.

The wait duration correlates directly with database size, transaction log activity, and underlying storage performance. Databases with large amounts of uncommitted transactions or extensive log chains require longer restart periods. The AG lease timeout mechanism (default 20 seconds) can also influence restart behavior when cluster communication issues occur.

During restart operations, the database remains inaccessible to user connections while maintaining its AG membership. The Database Engine blocks new connections and terminates existing sessions during the transition period.

Diagnostic Queries

-- Check current AG database states and restart operations
SELECT 
    ag.name AS availability_group,
    adc.database_name,
    ars.role_desc,
    drs.synchronization_state_desc,
    ars.connected_state_desc,
    ars.last_connect_error_description,
    ars.last_connect_error_timestamp
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_databases_cluster adc ON ag.group_id = adc.group_id
JOIN sys.dm_hadr_database_replica_states drs ON ar.replica_id = drs.replica_id
WHERE ars.is_local = 1;

-- Monitor HADR_DATABASE_WAIT_FOR_RESTART waits specifically
SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_time_ms
FROM sys.dm_os_wait_stats 
WHERE wait_type = 'HADR_DATABASE_WAIT_FOR_RESTART';

-- Identify databases currently in restart states
SELECT 
    d.name AS database_name,
    d.state_desc AS database_state,
    drs.database_state_desc AS ag_database_state,
    drs.suspend_reason_desc,
    drs.recovery_lsn,
    drs.last_commit_time
FROM sys.databases d
JOIN sys.dm_hadr_database_replica_states drs ON d.database_id = drs.database_id
WHERE drs.is_local = 1 
    AND (d.state_desc IN ('RESTORING', 'RECOVERING') 
         OR drs.database_state_desc IN ('NOT SYNCHRONIZING', 'INITIALIZING'));

-- Check for cluster connectivity issues affecting restarts
SELECT 
    c.quorum_type_desc,
    c.quorum_state_desc,
    m.member_name,
    m.member_type_desc,
    m.member_state_desc
FROM sys.dm_hadr_cluster c
CROSS JOIN sys.dm_hadr_cluster_members m
WHERE m.member_state_desc != 'UP';

-- Review recent AG failover events and database state changes
SELECT TOP 20
    object_name,
    counter_name,
    instance_name,
    cntr_value,
    cntr_type
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Availability Replica%'
    AND counter_name IN ('Failovers/sec', 'Connection Failures/sec', 'Time To Receive Response (ms)')
    AND instance_name != '';

Fix Scripts

-- Force database online if stuck in restart loop (use with extreme caution)
-- WARNING: Test in development first. Can cause data loss if used inappropriately.
-- Only use when database is confirmed healthy and restart is genuinely stuck.

USE master;
GO

DECLARE @DatabaseName NVARCHAR(128) = 'YourAGDatabase';
DECLARE @SQL NVARCHAR(MAX);

-- First attempt graceful restart
SET @SQL = 'ALTER DATABASE [' + @DatabaseName + '] SET ONLINE;';
PRINT 'Executing: ' + @SQL;
EXEC sp_executesql @SQL;

-- Check if database came online
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = @DatabaseName AND state = 0)
    PRINT 'Database successfully brought online.';
ELSE
    PRINT 'Manual intervention required. Database did not respond to ONLINE command.';

-- Remove and re-add database to AG if restart issues persist
-- WARNING: This will cause brief downtime and force full data synchronization
-- Ensure secondary replicas can handle the resync load

USE master;
GO

DECLARE @AGName NVARCHAR(128) = 'YourAvailabilityGroup';
DECLARE @DatabaseName NVARCHAR(128) = 'YourDatabase';

-- Remove database from AG
EXEC('ALTER AVAILABILITY GROUP [' + @AGName + '] REMOVE DATABASE [' + @DatabaseName + '];');
PRINT 'Database removed from AG. Waiting 10 seconds...';
WAITFOR DELAY '00:00:10';

-- Re-add database to AG
EXEC('ALTER AVAILABILITY GROUP [' + @AGName + '] ADD DATABASE [' + @DatabaseName + '];');
PRINT 'Database re-added to AG. Monitor synchronization status.';

-- Adjust AG lease timeout if cluster communication is causing restart delays
-- Increases timeout from default 20 seconds to 40 seconds
-- WARNING: Longer timeouts delay failover detection but reduce false positives

USE master;
GO

DECLARE @AGName NVARCHAR(128) = 'YourAvailabilityGroup';

ALTER AVAILABILITY GROUP YourAvailabilityGroup
MODIFY REPLICA ON 'PrimaryReplicaName' 
WITH (SESSION_TIMEOUT = 40);

ALTER AVAILABILITY GROUP YourAvailabilityGroup  
MODIFY REPLICA ON 'SecondaryReplicaName'
WITH (SESSION_TIMEOUT = 40);

PRINT 'AG session timeout increased to 40 seconds on all replicas.';

-- Clear accumulated wait statistics to get fresh baseline after resolving issues
-- Safe to run in production, only affects wait statistics collection

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
PRINT 'Wait statistics cleared. Monitor for recurring HADR_DATABASE_WAIT_FOR_RESTART waits.';

AutoDBA generates fix scripts like these automatically, with impact estimates and rollback SQL included.

HADR_DATABASE_WAIT_FOR_RESTART Wait Type Explained

Quick Answer

Root Cause Analysis

Diagnostic Queries

Fix Scripts

Prevention

Need hands-on help?

Related Pages

HADR_AG_MUTEX

HADR_AR_CRITICAL_SECTION_ENTRY

HADR_BACKUP_BULK_LOCK

Always On AG Latency

sys.dm_hadr_availability_replica_states

HADR_DATABASE_WAIT_FOR_RECOVERY