Quick Answer
BROKER_ENDPOINT_STATE_MUTEX occurs when multiple sessions compete to modify Service Broker endpoint connection state simultaneously. This wait indicates contention on the internal mutex protecting endpoint metadata changes during connection establishment, termination, or state transitions. Moderate levels are normal during active Service Broker operations, but sustained high waits suggest endpoint connection churn or configuration issues.
Root Cause Analysis
Service Broker endpoints maintain connection state information in shared memory structures that require serialized access for modifications. The BROKER_ENDPOINT_STATE_MUTEX protects these critical sections during endpoint lifecycle operations including TCP connection establishment, security context validation, certificate exchanges, and connection teardown.
SQL Server's Service Broker architecture uses a single mutex per endpoint to serialize state changes. When multiple conversations attempt simultaneous operations on the same endpoint, threads queue on this mutex. The contention intensifies during certificate rotation, endpoint reconfiguration, or when applications rapidly establish and drop Service Broker connections.
In SQL Server 2016 and later, the endpoint state management improved with more granular locking for certain operations, but the core mutex remains for state transitions. SQL Server 2019 introduced better connection pooling for Service Broker, reducing some endpoint churn. SQL Server 2022 enhanced the endpoint security context caching, which can reduce mutex contention during authentication phases.
The wait specifically manifests during these internal operations: endpoint state validation in sys.service_broker_endpoints, TCP listener state changes, connection pool management, and security context establishment. High waits typically correlate with frequent endpoint restarts, certificate issues causing connection retries, or applications not properly reusing Service Broker connections.
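To check whether any task is queued on this mutex at a given moment, a minimal sketch against `sys.dm_os_waiting_tasks` can help. It only shows tasks waiting at the instant it runs, so an empty result does not rule out contention between samples.

```sql
-- Sketch: list tasks currently waiting on the endpoint state mutex.
-- An empty result only means nothing was waiting at this instant.
SELECT
    wt.session_id,
    wt.wait_duration_ms,
    wt.wait_type,
    wt.blocking_session_id,
    wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type = N'BROKER_ENDPOINT_STATE_MUTEX'
ORDER BY wt.wait_duration_ms DESC;
```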
Diagnostic Queries
-- Current Service Broker endpoint states and connections
SELECT
e.name,
e.state_desc,
te.port,
ec.connection_id,
ec.connect_time,
ec.login_state_desc,
ec.peer_certificate_id
FROM sys.endpoints e
JOIN sys.service_broker_endpoints sbe ON e.endpoint_id = sbe.endpoint_id
LEFT JOIN sys.tcp_endpoints te ON e.endpoint_id = te.endpoint_id
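-- sys.dm_broker_connections has no endpoint_id; an instance has at most one Service Broker endpoint,
-- so every open connection is paired with it. Closed connections report state_desc = 'CLOSED'.
LEFT JOIN sys.dm_broker_connections ec ON ec.state_desc <> 'CLOSED'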
WHERE e.type_desc = 'SERVICE_BROKER'
ORDER BY e.name;
-- Service Broker wait statistics focused on endpoint operations
SELECT
wait_type,
waiting_tasks_count,
wait_time_ms,
max_wait_time_ms,
signal_wait_time_ms,
wait_time_ms / NULLIF(waiting_tasks_count, 0) as avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('BROKER_ENDPOINT_STATE_MUTEX', 'BROKER_CONNECTION_RECEIVE_TASK', 'BROKER_TO_FLUSH')
ORDER BY wait_time_ms DESC;
-- Active Service Broker conversations and their endpoint usage
SELECT
c.conversation_handle,
c.state_desc as conversation_state,
c.far_service,
s.name as service_name,
c.far_broker_instance
FROM sys.conversation_endpoints c
INNER JOIN sys.services s ON c.service_id = s.service_id
WHERE c.state_desc NOT IN ('CLOSED', 'ERROR') -- state holds two-letter codes; state_desc holds the names
ORDER BY c.conversation_handle;
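-- Service Broker sessions and background tasks (session view, not the error log)
-- Broker background tasks typically report a command beginning with BRKR
SELECT
s.login_time,
s.session_id,
s.status,
s.host_name,
s.program_name,
r.command
FROM sys.dm_exec_sessions s
LEFT JOIN sys.dm_exec_requests r ON r.session_id = s.session_id
WHERE s.program_name LIKE 'Service Broker%'
OR r.command LIKE '%BRKR%'
OR r.command LIKE '%BROKER%';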
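-- Certificate and security context information for Service Broker
-- Run in master: endpoint certificates are stored there, and sys.certificates is scoped to the current database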
SELECT
c.name as certificate_name,
c.subject,
c.start_date,
c.expiry_date,
c.thumbprint,
CASE WHEN c.expiry_date < GETDATE() THEN 'EXPIRED' WHEN c.expiry_date < DATEADD(day, 30, GETDATE()) THEN 'EXPIRING_SOON' ELSE 'VALID' END as status
FROM sys.certificates c
WHERE c.name LIKE '%ServiceBroker%'
OR EXISTS (SELECT 1 FROM sys.service_broker_endpoints sbe WHERE sbe.certificate_id = c.certificate_id);
Fix Scripts
Restart problematic Service Broker endpoints to clear mutex contention
-- Identify and restart endpoints with state issues
-- Test in development first - this will drop active connections
DECLARE @endpoint_name NVARCHAR(128);
DECLARE endpoint_cursor CURSOR FOR
SELECT e.name FROM sys.endpoints e
JOIN sys.service_broker_endpoints sbe ON e.endpoint_id = sbe.endpoint_id
WHERE e.state_desc != 'STARTED';
OPEN endpoint_cursor;
FETCH NEXT FROM endpoint_cursor INTO @endpoint_name;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @sql NVARCHAR(500) = N'ALTER ENDPOINT ' + QUOTENAME(@endpoint_name) + N' STATE = STOPPED';
EXEC sp_executesql @sql;
WAITFOR DELAY '00:00:02'; -- Brief pause
SET @sql = N'ALTER ENDPOINT ' + QUOTENAME(@endpoint_name) + N' STATE = STARTED';
EXEC sp_executesql @sql;
FETCH NEXT FROM endpoint_cursor INTO @endpoint_name;
END;
CLOSE endpoint_cursor;
DEALLOCATE endpoint_cursor;
Force cleanup of stale Service Broker connections
-- End conversations that may be holding endpoint resources
-- This will terminate conversations - ensure business logic can handle this
-- ENABLE_BROKER needs an exclusive database lock and is only required when the broker is disabled
-- (check is_broker_enabled in sys.databases first)
ALTER DATABASE [YourDatabase] SET ENABLE_BROKER;
-- End ERROR state conversations
-- sys.conversation_endpoints exposes no state-change timestamp, so no age filter is applied
DECLARE @conversation_handle UNIQUEIDENTIFIER;
DECLARE conversation_cursor CURSOR FOR
SELECT conversation_handle
FROM sys.conversation_endpoints
WHERE state_desc = 'ERROR';
OPEN conversation_cursor;
FETCH NEXT FROM conversation_cursor INTO @conversation_handle;
WHILE @@FETCH_STATUS = 0
BEGIN
END CONVERSATION @conversation_handle WITH CLEANUP;
FETCH NEXT FROM conversation_cursor INTO @conversation_handle;
END;
CLOSE conversation_cursor;
DEALLOCATE conversation_cursor;
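Adjust Service Broker endpoint message forwarding settings
-- ALTER ENDPOINT has no connection-limit option for Service Broker; the tunable settings are message forwarding
-- MESSAGE_FORWARD_SIZE is the storage, in megabytes, allotted to forwarded messages
-- Enable forwarding only if this instance is meant to relay messages for other instances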
ALTER ENDPOINT YourServiceBrokerEndpoint
FOR SERVICE_BROKER (
MESSAGE_FORWARDING = ENABLED,
MESSAGE_FORWARD_SIZE = 10
);
-- Create broker priority (SQL Server 2008+)
-- Priorities control the order in which conversations send and receive messages
-- They take effect only when the database has HONOR_BROKER_PRIORITY set to ON
CREATE BROKER PRIORITY HighPriorityConversation
FOR CONVERSATION
SET (CONTRACT_NAME = [YourContract],
LOCAL_SERVICE_NAME = [YourLocalService],
PRIORITY_LEVEL = 10);
Certificate renewal to prevent authentication-related endpoint contention
-- Create new certificate before current expires
-- Replace paths and passwords with your actual values
CREATE CERTIFICATE NewServiceBrokerCert
FROM FILE = 'C:\Certificates\NewServiceBroker.cer'
WITH PRIVATE KEY (
FILE = 'C:\Certificates\NewServiceBroker.key',
DECRYPTION BY PASSWORD = 'YourStrongPassword'
);
-- Update endpoint to use new certificate
-- Test thoroughly before implementing in production
ALTER ENDPOINT YourServiceBrokerEndpoint
FOR SERVICE_BROKER (
AUTHENTICATION = CERTIFICATE NewServiceBrokerCert
);
-- Drop old certificate after confirming new one works
-- DROP CERTIFICATE OldServiceBrokerCert;
Prevention
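Configure Service Broker applications to reuse long-lived conversations rather than beginning a new dialog for every message. SQL Server manages the underlying transport connections itself, so the application-side lever is conversation reuse (for example, caching conversation handles), which minimizes endpoint state transitions.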
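The reuse idea can be sketched as "look up an open conversation first, begin a dialog only if none exists". This is a simplified illustration, not a production pattern: every service, contract, and message type name below is hypothetical, and real implementations usually keep handles in their own table and handle errors and conversation expiry.

```sql
-- Sketch only: all service, contract, and message type names are hypothetical.
DECLARE @handle UNIQUEIDENTIFIER;

-- Look for an open initiator-side conversation to the target service.
SELECT TOP (1) @handle = conversation_handle
FROM sys.conversation_endpoints
WHERE far_service = '//Example/TargetService'
  AND is_initiator = 1
  AND state_desc = 'CONVERSING';

-- Begin a new dialog only when nothing reusable exists.
-- ENCRYPTION = OFF is chosen here so the sketch does not depend on a database master key.
IF @handle IS NULL
BEGIN
    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//Example/InitiatorService]
        TO SERVICE '//Example/TargetService'
        ON CONTRACT [//Example/Contract]
        WITH ENCRYPTION = OFF;
END;

-- Send on the reused (or newly created) conversation.
SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Example/Message]
    (N'<payload/>');
```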
Monitor certificate expiration dates and establish automated renewal processes. Expiring certificates cause authentication failures that trigger rapid connection retry attempts, intensifying mutex contention.
Size MESSAGE_FORWARD_SIZE for the actual forwarding workload on endpoints that relay messages. Service Broker endpoints expose no connection-count limit, so reduce connection cycling by addressing its causes, such as authentication failures, routing errors, and network drops, rather than by raising a limit.
Implement monitoring for BROKER_ENDPOINT_STATE_MUTEX waits using custom alerts when wait times exceed baseline thresholds. Track correlation with Service Broker error log entries and conversation endpoint state changes.
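Because `sys.dm_os_wait_stats` is cumulative since the last restart or manual clear, a baseline comparison needs a delta over a sampling window. The following is a minimal sketch; the 60-second window and the 500 ms threshold are illustrative assumptions, not recommended values.

```sql
-- Sketch: measure BROKER_ENDPOINT_STATE_MUTEX waits over a sampling window.
-- The window length and the threshold are placeholders; derive real values from your baseline.
DECLARE @before TABLE (waiting_tasks_count BIGINT, wait_time_ms BIGINT);

INSERT INTO @before (waiting_tasks_count, wait_time_ms)
SELECT waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'BROKER_ENDPOINT_STATE_MUTEX';

WAITFOR DELAY '00:01:00';

SELECT
    w.wait_type,
    w.waiting_tasks_count - b.waiting_tasks_count AS waits_in_window,
    w.wait_time_ms - b.wait_time_ms AS wait_ms_in_window,
    CASE WHEN w.wait_time_ms - b.wait_time_ms > 500
         THEN 'ABOVE_THRESHOLD' ELSE 'OK' END AS assessment
FROM sys.dm_os_wait_stats AS w
CROSS JOIN @before AS b
WHERE w.wait_type = N'BROKER_ENDPOINT_STATE_MUTEX';
```

Scheduling a query like this from a SQL Server Agent job and storing the deltas in a table gives a history to alert against.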
Avoid frequent ALTER ENDPOINT operations during business hours. Schedule endpoint configuration changes during maintenance windows to prevent unnecessary state transitions.
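Consider partitioning high-volume Service Broker workloads across multiple instances when single-endpoint mutex contention becomes a bottleneck; an instance can host only one Service Broker endpoint, so spreading load means spreading it across instances. Design conversation routing to distribute load evenly across those endpoints.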
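Routing decides which instance's endpoint carries a given service's traffic, so distributing load is usually expressed with routes. A minimal sketch follows, assuming two target services hosted on separate instances; the route names, service names, host names, and port 4022 are all placeholders.

```sql
-- Sketch: send traffic for two hypothetical services to different instances.
-- All names, hosts, and the port are placeholders for illustration.
CREATE ROUTE OrdersRoute
WITH SERVICE_NAME = '//Example/OrdersService',
     ADDRESS = 'TCP://broker-host-a.example.com:4022';

CREATE ROUTE BillingRoute
WITH SERVICE_NAME = '//Example/BillingService',
     ADDRESS = 'TCP://broker-host-b.example.com:4022';
```

Each remote instance also needs a return route and matching endpoint security for the conversation to complete.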