Is "AlwaysOn" not always "Always On?"

You have a bunch of different questions in here.

Q: What is the "Always On" thing?

Microsoft uses that brand name (which was written without a space before 2016) to describe two different features:

  • Failover Cluster Instances (FCIs) - what your grandpa used to call an active/passive cluster
  • Availability Groups (AGs) - like database mirroring, but able to work with groups of databases in some cases (though never the system databases)

Use those terms to describe which specific Always On feature you're using.

Q: In a failover, will it be always on?

Neither FCIs nor AGs are really always on. During a failover, your running transactions will fail, and connection retries can fail for 5-60 seconds (or more). It's up to you to build in graceful retry logic in your applications, or build in degraded capability tools like Stack Overflow does.
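
To make "graceful retry logic" a bit more concrete, here is a minimal Python sketch. It is not tied to any real database driver: TransientConnectionError and run_with_retry are hypothetical names invented for this illustration, and the delays are assumptions picked to span roughly the failover window mentioned above. In a real app you would catch whatever your driver raises when the connection drops.

```python
import random
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for the driver errors seen during a failover."""


def run_with_retry(operation, max_attempts=7, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep):
    """Call operation(); on a transient error, wait and try again.

    Delays grow exponentially (1s, 2s, 4s, ...) and are capped at max_delay.
    A little random jitter is added so that many clients do not all
    reconnect at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller degrade gracefully
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With these defaults the waits add up to roughly a minute before the error is passed on to the caller. Because a failover rolls back whatever transaction was in flight, operation should redo the whole unit of work (reconnect, begin, commit), not just the last statement, and it needs to be safe to run twice.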

Q: How do I configure Always On?

It varies dramatically based on:

  • Which AO feature you're using (FCIs or AGs)
  • The number of nodes in the cluster
  • How you want to handle quorum (voting)
  • Whether you're using automatic failover via a listener or virtual computer name

These are big decisions that involve a lot of architecture work. If you want specifics, include the details above in your question, and we'll be able to tell you more about how to configure it.
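
On the quorum point, the arithmetic behind "voting" is worth seeing once. The sketch below assumes a plain majority rule over a fixed set of votes; the function names are made up for illustration, and it deliberately ignores the dynamic quorum and witness options a real cluster offers.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes: int) -> int:
    """How many voters can be lost while a majority is still reachable."""
    return total_votes - votes_needed(total_votes)


# Columns: total votes, votes needed for quorum, voters you can lose.
for voters in (2, 3, 4, 5):
    print(voters, votes_needed(voters), failures_tolerated(voters))
```

Under that assumption, two votes tolerate zero losses, which is why a two-node cluster is typically given a witness as a third vote.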

Q: Isn't it just a matter of checking the box for Always On?

Nope.


You might be confusing "Always On" AGs (Availability Groups) with FCIs (Failover Cluster Instances), both of which depend on WSFC (Windows Server Failover Clustering).

Checking the 'Always On' box doesn't mean you now have an AG configuration. You still have to choose asynchronous or synchronous commit for each replica, decide which replicas are read-only and which are failover targets, set the failover priority, and weigh other considerations, such as whether the app supports this configuration. For example, your app might use cross-database MSDTC transactions, which are not supported and can cause unrecoverable corruption that requires a restore from backup.

Right now what you are experiencing is an FCI failover. This is normal. It stops the services on one node and starts them on the other node. This works at the INSTANCE level. An AG solution is set up per database (or group of databases), and the services are running on both nodes. SQL Server ships transaction log records to the replicas to keep their data in sync, and relies on WSFC for health monitoring and failover coordination. On failover, the databases move to another replica; note that the instance itself does not move.

You might want to do a lot of testing on this before deploying to production.