# Connectivity issues in Azure SQL Database

Before we jump into troubleshooting, we should clarify the following points first:

- Is it a **connectivity** error or an **authentication** error?
- Does the connection **always fail**, or does it **fail intermittently**?
- Understand the network topology. Is the client hosted in Azure, is it behind a firewall, or is it routed through a third-party VPN or proxy?

## Troubleshooting steps

### Determine error types

We can quickly tell a service-related issue from a network- or login-related one by checking the error message and code: for instance, errors `40197` and `40613` usually indicate a service issue, while errors `18456` and `40531` usually indicate an authentication issue.

[Troubleshoot Common Connection Issues - Azure SQL & SQL database in Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-errors-issues?view=azuresql#transient-fault-error-messages-40197-40613-and-others)

### Permanent network-related error

For network-related or instance-specific errors, we can leverage network utilities to investigate further. For timeout errors, first verify whether TCP traffic is allowed:

- `telnet {host} {port}`
- `nc -vz {host} {port}`
- `nmap -p {port} {host}`

Windows-specific tools:

- `Test-NetConnection -ComputerName myserver.database.windows.net -Port 1433 -InformationLevel Detailed`
- [PsPing.exe](https://learn.microsoft.com/en-us/sysinternals/downloads/PsPing) `mysqldbsrvr.database.windows.net:1433`

Azure SQL Database uses a wide range of ports for the Redirect connection policy: 11000-11999 for the public endpoint, 1433-65535 for private endpoints, plus 1433 for the gateway. Connection timeouts are commonly caused by this port range not being allowed properly (see the probe sketch at the end of this section): [Connectivity architecture](https://learn.microsoft.com/en-us/azure/azure-sql/database/connectivity-architecture?view=azuresql#connection-policy)

When using a private endpoint, it's important to verify that DNS resolves to the private IP properly:

- `nslookup {domain}`

We can leverage PowerShell to verify resolution against a specific DNS server:

- `Resolve-DnsName -Name www.bing.com -Server 10.0.0.1`

#### Advanced tools

[GitHub - Azure/SQL-Connectivity-Checker: This PowerShell script will run some connectivity checks from this machine to the server and database.](https://github.com/Azure/SQL-Connectivity-Checker)

This checker is very useful: it runs on both Windows and Linux, and it can also collect a network trace for further analysis.

#### Useful links

- Error 64: [Lesson Learned #433: Azure SQL Pre-login Handshake Error 64: "The specified network name is no longer"](https://techcommunity.microsoft.com/blog/azuredbsupport/lesson-learned-433azure-sql-pre-login-handshake-error-64the-specified-network-na/3936302). This error suggests that although a connection to the SQL Server was initially successful, it failed during the pre-login handshake. This phase is vital in the SQL Server connection process, and failures here commonly indicate network-related issues, incorrect configurations, or software incompatibilities.
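Putting the checks above together, here is a minimal C# probe sketch, assuming a hypothetical server name `myserver.database.windows.net`. It resolves the host name (with a private endpoint this should return a private IP) and attempts TCP connections to the gateway port 1433 plus a sample of the 11000-11999 redirect range. Note that the redirect-range probes mainly test whether client-side firewalls allow outbound traffic on those ports; the real redirect target is a backend node address learned at login.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class ConnectivityProbe
{
    // Hypothetical server name: replace with your logical server.
    const string Host = "myserver.database.windows.net";

    static async Task Main()
    {
        // DNS check: with a private endpoint this should resolve to a
        // private IP (e.g. 10.x.x.x), not to the public gateway address.
        foreach (var ip in await Dns.GetHostAddressesAsync(Host))
            Console.WriteLine($"{Host} -> {ip}");

        // Gateway port, plus a sample of the 11000-11999 redirect range.
        // A blocked port here usually points at a client-side firewall rule.
        foreach (var port in new[] { 1433, 11000, 11500, 11999 })
        {
            bool open = await CanConnectAsync(Host, port);
            Console.WriteLine($"TCP {Host}:{port} -> {(open ? "open" : "blocked/timeout")}");
        }
    }

    static async Task<bool> CanConnectAsync(string host, int port)
    {
        using var client = new TcpClient();
        try
        {
            // 5-second timeout so a silently dropped SYN doesn't hang the probe.
            var connect = client.ConnectAsync(host, port);
            if (await Task.WhenAny(connect, Task.Delay(5000)) != connect)
                return false;   // timed out: traffic likely filtered
            await connect;      // propagate an active refusal as SocketException
            return client.Connected;
        }
        catch (SocketException)
        {
            return false;       // actively refused or unreachable
        }
    }
}
```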
### Intermittent connection drops/termination errors

Given the nature of cloud environments, 100% network stability cannot be guaranteed, and occasional transient faults are inevitable. Intermittent connectivity issues with SQL Database follow two patterns: connections that cannot be established (login failures) and existing connections that drop (abrupt disconnections).

The Java SDK throws connection reset exceptions, and the .NET SDK throws SqlExceptions like the [following termination errors](https://learn.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-errors-issues?view=azuresql#network-connection-termination-errors). These are client-side TCP provider errors (SQL error number = 0) raised when the network stack detects that an already-open TCP socket was closed or reset unexpectedly by the peer (or by an intermediate device):

- *A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)* → the drop happened before TDS login, while exchanging PreLogin and TLS parameters.
- *A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)* → the drop happened during login, while the driver was sending LOGIN7 or waiting for LoginAck.
- *A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)* → the drop happened after login, during query/request I/O. Often seen when result sets are large or the network path is unstable.

> Why error number 0? Because these are not SQL engine failures; they originate in the TCP provider (Winsock/.NET or Java sockets), so there's no database error code to report.

These intermittent termination errors are generally a symptom of network instability and can be caused by any hop involved in the TCP connection. Think of the connection as spanning multiple hops (client → DNS → gateway → DB node, plus any firewalls/load balancers/VPNs/proxies); a reset at any hop surfaces to the SDK as "forcibly closed" (see the classification sketch at the end of this section):

- The target is temporarily unavailable or too busy to accept new connections.
- Corporate firewalls, SSL/TLS inspection, NAT, proxies, or VPN concentrators may inject resets or timeouts under load or policy (idle timeouts, connection reaping, packet-size anomalies).
- If TLS can't be established quickly (network latency, inspection devices, certificate negotiation issues), you'll see pre-login handshake failures or timeouts.

For Azure SQL Database specifically, failures caused by database maintenance are easy to identify. A less obvious but equally impactful cause is SQL Gateway maintenance, which is invisible to customers: https://learn.microsoft.com/en-us/azure/azure-sql/database/maintenance-window?view=azuresql#gateway-maintenance

> In Azure SQL Database, any connections using the proxy connection policy could be affected by both the chosen maintenance window and a gateway node maintenance window. However, client connections using the recommended redirect connection policy are unaffected by a gateway node maintenance reconfiguration.

The SQL Gateway is a regional resource shared across the database service, and gateway restarts are part of routine service maintenance. During maintenance most gateway nodes remain operational; for nodes undergoing maintenance, the primary impact is a brief restart, typically lasting seconds. Switching to the Redirect connection policy, configured at the logical server level, is an effective way to mitigate the impact of gateway interruptions.
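To make the taxonomy above concrete, here is a minimal C# classification sketch using `Microsoft.Data.SqlClient`. The class name and error lists are illustrative assumptions; they mirror only the codes cited in this article and are not an exhaustive classifier.

```csharp
using System;
using Microsoft.Data.SqlClient;

static class SqlErrorClassifier
{
    // Codes cited earlier in this article; not an exhaustive list.
    static readonly int[] TransientServiceErrors = { 40197, 40613 };
    static readonly int[] AuthenticationErrors   = { 18456, 40531 };

    public static string Classify(SqlException ex)
    {
        // Error number 0 means the TCP provider, not the SQL engine,
        // reported the failure (socket reset/closed): network instability.
        if (ex.Number == 0)
            return "transport-level termination (TCP provider) - retry";

        if (Array.IndexOf(TransientServiceErrors, ex.Number) >= 0)
            return "transient service error - retry with backoff";

        if (Array.IndexOf(AuthenticationErrors, ex.Number) >= 0)
            return "authentication error - fix credentials/server name, do not retry";

        return $"other SQL error {ex.Number} - investigate";
    }
}
```

Wrapping `SqlConnection.Open()` or command execution in a try/catch and logging `SqlErrorClassifier.Classify(ex)` quickly separates retry-worthy network and service faults from configuration problems that retries won't fix.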
However, tracing all intermittent network faults can be challenging, as it requires capturing a network trace while the issue is still occurring. As general guidance, applications that interact with cloud-based services should use a fixed number of retry attempts to handle such exceptions transparently (a minimal retry sketch appears at the end of this article).

Retry guidance from the Architecture Center:

- [Transient fault handling - Azure Architecture Center | Microsoft Learn](https://learn.microsoft.com/en-us/azure/architecture/best-practices/transient-faults)
- [Retry pattern - Azure Architecture Center | Microsoft Learn](https://learn.microsoft.com/en-us/azure/architecture/patterns/retry#example)

Retry coding:

- ADO.NET configurable retry logic: [Configurable retry logic core APIs in SqlClient - ADO.NET Provider for SQL Server | Microsoft Learn](https://learn.microsoft.com/en-us/sql/connect/ado-net/configurable-retry-logic-core-apis-sqlclient?view=sql-server-ver17)
- ADO.NET code sample that demonstrates custom retry logic: [Step 4: Connect Resiliently to SQL with ADO.NET - ADO.NET Provider for SQL Server | Microsoft Learn](https://learn.microsoft.com/en-us/sql/connect/ado-net/step-4-connect-resiliently-sql-ado-net?view=sql-server-ver17)

### A few edge cases

- Login failed due to a schema lock wait timeout.
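To close, here is a minimal sketch of the configurable retry logic linked above, using `Microsoft.Data.SqlClient`. The server name, connection string, retryable error list, and timing values are illustrative assumptions, not prescriptive settings.

```csharp
using System;
using Microsoft.Data.SqlClient;

class ResilientConnect
{
    static void Main()
    {
        // Older SqlClient versions (3.x/4.x) gate this feature behind an
        // app context switch; setting it on newer versions is harmless.
        AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.EnableRetryLogic", true);

        var options = new SqlRetryLogicOption
        {
            NumberOfTries = 5,                          // fixed retry budget
            DeltaTime = TimeSpan.FromSeconds(1),        // base backoff step
            MaxTimeInterval = TimeSpan.FromSeconds(20), // backoff cap
            // Treat the transient service errors (and TCP provider error 0)
            // from this article as retryable; tune the list for your app.
            TransientErrors = new[] { 0, 40197, 40613 }
        };
        var provider = SqlConfigurableRetryFactory.CreateExponentialRetryProvider(options);

        // Hypothetical connection string; replace with your own.
        using var conn = new SqlConnection(
            "Server=tcp:myserver.database.windows.net,1433;Database=mydb;" +
            "Authentication=Active Directory Default;");
        conn.RetryLogicProvider = provider; // applies to Open()
        conn.Open();
        Console.WriteLine("Connected (after at most 5 attempts).");
    }
}
```

An exponential backoff provider with a capped interval is a reasonable default here: it rides out brief gateway reconfigurations without hammering a target that is still recovering.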