
SQL Server migrations are a headache; ask anyone who’s been through the pain of moving a reasonably large server and I’m sure they’ll agree.
There are all sorts of methods out there for getting the data over to our new server.
We can simply back up the databases and restore them to their new home. The problem there is that if we’re talking about anything more than a very trivial database, there’s going to be some potentially significant downtime.
We can log ship: a similar approach to the above, except that the databases will already be staged on the new kit and the downtime will be much shorter, as we only have to copy and restore the latest transaction log backup. Log shipping is nice and is often my preferred method, but it can be fiddly with a large number of databases, with the added stress of making sure that last log backup has been applied.
Another option that we have is to utilise availability groups and distributed availability groups.
A distributed availability group is a special type of availability group that spans two separate availability groups. The availability groups that participate in a distributed availability group don’t need to be in the same location. They can be physical, virtual, on-premises, in the public cloud, or anywhere that supports an availability group deployment. This includes cross-domain and even cross-platform – such as between an availability group hosted on Linux and one hosted on Windows. As long as two availability groups can communicate, you can configure a distributed availability group with them.
https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/distributed-availability-groups?view=sql-server-ver16
Because it doesn’t require a common cluster, a distributed availability group allows us to link servers in situations where a cluster isn’t possible. Servers could be in remote locations, members of different domains, on different OS levels or even different operating systems (yes, we can link Windows and Linux based AGs).
The ability to link servers in this way gives us a very nice and easy way to replicate data between servers when thinking about a migration.
Let’s have a quick look at our situation. I’ve got two servers, one running SQL2019 and one on SQL2022. I’m looking to migrate from SQL2019 to SQL2022.

Both of those servers host an AG. I’m looking to migrate the database from SQL2019AG to SQL2022AG. The destination AG is currently empty.

Create the Distributed AG
We’re going to need to link those AGs and start that database replicating between the two. To do that we’re going to need to build a distributed availability group.
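Before linking anything, it’s worth a quick sanity check that both AGs and their listeners are in place. Something along these lines against the standard catalog views, run on each side, will do it:
SELECT AGs.name AS AGName
,AGLs.dns_name
,AGLs.port
FROM sys.availability_groups AGs
JOIN sys.availability_group_listeners AGLs ON AGLs.group_id = AGs.group_id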
Against the source AG, SQL2019AG, we’ll need to run the following, filling in the appropriate AG names and URLs.
CREATE AVAILABILITY GROUP <Distributed AG Name>
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'<source AG name>'
WITH (
LISTENER_URL = '<source listener URL>:<source mirroring endpoint port>'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
,'<destination AG name>'
WITH (
LISTENER_URL = '<destination listener URL>:<destination mirroring endpoint port>'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
So, in my example that would be,
CREATE AVAILABILITY GROUP DistAG
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'SQL2019AG'
WITH (
LISTENER_URL = 'tcp://SQL2019AG.sqlundercover.local:5022'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
,'SQL2022AG'
WITH (
LISTENER_URL = 'tcp://SQL2022AG.sqlundercover.local:5022'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
NOTE: The port specified in the listener URL is the port that the mirroring endpoint is listening on (usually 5022), not the SQL Server port. This isn’t always clear in some of the documentation.
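If you’re not sure which port your mirroring endpoint is listening on, the endpoint catalog views will tell you. Run this on each server:
SELECT name, port
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING'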
Now if we take a look under the availability groups tab in SSMS for the source server, we should see our new distributed availability group.

The replicas are showing but it looks like there’s an issue connecting to the 2022 server.
We’ll need to get things set up on there now. Running the following against SQL2022 should do the trick…
ALTER AVAILABILITY GROUP <Distributed AG Name>
JOIN AVAILABILITY GROUP ON '<source AG name>'
WITH (
LISTENER_URL = '<source listener URL>:<source mirroring endpoint port>'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
,'<destination AG name>'
WITH (
LISTENER_URL = '<destination listener URL>:<destination mirroring endpoint port>'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
or in my case that’s going to be
ALTER AVAILABILITY GROUP DistAG
JOIN AVAILABILITY GROUP ON 'SQL2019AG'
WITH (
LISTENER_URL = 'tcp://SQL2019AG.sqlundercover.local:5022'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
,'SQL2022AG'
WITH (
LISTENER_URL = 'tcp://SQL2022AG.sqlundercover.local:5022'
,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
,FAILOVER_MODE = MANUAL
,SEEDING_MODE = AUTOMATIC
)
Let’s give that Availability Groups tab on the source a refresh and see how it’s looking…

Much better.
And let’s take a look at things on the destination, the SQL2022 server.

As we can see, the MigrationTest database now exists on the destination server and is part of its availability group.

The situation now is that the two AGs are both part of the distributed AG with the databases being replicated from the source to the destination.
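Because we set SEEDING_MODE = AUTOMATIC, the database gets seeded across without us lifting a finger. For a larger database, you may want to keep an eye on how that initial seeding is going; here’s a rough sketch against the automatic seeding DMV, run on the source:
SELECT AGs.name AS AGName
,autoSeeding.start_time
,autoSeeding.completion_time
,autoSeeding.current_state
,autoSeeding.failure_state_desc
FROM sys.dm_hadr_automatic_seeding autoSeeding
JOIN sys.availability_groups AGs ON AGs.group_id = autoSeeding.ag_id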
Ensure Destination is up to Date
To migrate to our new server, all we need to do is fail over the distributed AG. But before we can do that, we’re going to need to make sure that the destination is up to date and there’s going to be no loss of data.
When we created the distributed AG, we set the availability mode to asynchronous. To ensure that we’re not going to lose any data, we’ll need to change it to synchronous.
The reason for setting the availability mode to async to start with is that any latency between the two clusters can hurt performance on the source databases, since synchronous commit makes transactions wait for the remote acknowledgement. If you’re confident that latency isn’t going to be an issue, you could have set it to synchronous when creating the distributed AG.
Check That the Primary Servers of Both Availability Groups Are in Synchronous Commit
If either of the primary servers is in asynchronous commit mode, switch it to synchronous commit.
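As a sketch of what that looks like (the replica name SQL2019Server1 below is hypothetical, so swap in your own from sys.availability_replicas), check the current mode and then make the switch on each AG’s primary:
SELECT ar.replica_server_name, ar.availability_mode_desc
FROM sys.availability_replicas ar
JOIN sys.availability_groups AGs ON AGs.group_id = ar.group_id
WHERE AGs.is_distributed = 0

ALTER AVAILABILITY GROUP SQL2019AG
MODIFY REPLICA ON 'SQL2019Server1' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT)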
Switch the Distributed AG to Synchronous Commit
Run the following code against both the source and the destination to switch the availability mode to synchronous.
ALTER AVAILABILITY GROUP <Distributed AG Name>
MODIFY AVAILABILITY GROUP ON
'<source AG>' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
'<destination AG>' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
In my example, the code would be…
ALTER AVAILABILITY GROUP DistAG MODIFY AVAILABILITY GROUP ON
'SQL2019AG' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
'SQL2022AG' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
Check Replication is Up To Date
With traditional AGs, we’ve got a nice dashboard which gives us an easy way to check that things are up to date. Unfortunately, we don’t get this for distributed AGs; have a look and you’ll notice that the dashboard option is greyed out.

It would be nice to have some sort of dashboard for distributed AGs, watch this space, I feel a fiddle around with PowerBI coming up soon…
Instead, we have to resort to a couple of DMVs to get the info that we need.
Run the following on both the source and destination instances.
SELECT AGs.name, db_name(database_id) DBName, synchronization_state_desc, last_hardened_lsn
FROM sys.dm_hadr_database_replica_states replicaStates
JOIN sys.availability_groups AGs ON AGs.group_id = replicaStates.group_id
WHERE (is_distributed = 1) OR (is_primary_replica = 1)
We want to see the synchronisation state showing as ‘SYNCHRONIZED’ and the last_hardened_lsn for each database on one server matching the value for its corresponding database on the opposite server.


The distributed AG is synchronised and the last hardened LSNs match, so I’m happy with that.
We’re ready to fail over.
Fail Over the Distributed AG
Failing over a distributed AG is slightly different to how we’d do it for a normal AG.
The first thing that we’re going to need to do is mark the current primary as a secondary. This is something that’s unique to a distributed AG.
Run this on the source server.
ALTER AVAILABILITY GROUP DistAG SET (ROLE = SECONDARY);
Let’s check out our distributed AG (can’t we just call them DAGs? It’d be much easier to type) in SSMS. As we can see, it’s now showing as secondary.

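If you’d rather confirm the role from T-SQL than trust the SSMS tree, a quick query along these lines against the replica state DMVs should show it:
SELECT AGs.name AS DAGName
,replicaStates.role_desc
,replicaStates.connected_state_desc
FROM sys.availability_groups AGs
JOIN sys.dm_hadr_availability_replica_states replicaStates ON replicaStates.group_id = AGs.group_id
WHERE AGs.is_distributed = 1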
With things in this state, we’re ready to fail over the distributed AG.
The command itself looks much like a normal AG failover. Run the following against the destination server. Notice that we’re using FORCE_FAILOVER_ALLOW_DATA_LOSS? It’s not something that I’d usually recommend, but with a distributed AG we have no other choice. That’s why we made sure that everything was synchronised earlier.
ALTER AVAILABILITY GROUP DistAG FORCE_FAILOVER_ALLOW_DATA_LOSS;
The distributed AG should have failed over and the database should be available on our new server.
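As a final sanity check (using the MigrationTest database from my example), confirm on the new primary that the database is online and writable:
SELECT name, state_desc, is_read_only
FROM sys.databases
WHERE name = 'MigrationTest'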

Tidy Up
Now that the migration is done, you can probably get rid of that distributed AG.
Run the following against both sides of the distributed AG and your migration is complete.
DROP AVAILABILITY GROUP DistAG
I really love how simple a distributed AG can make what is usually quite a stressful process.
Migrations can be prepared and staged ahead of time so that when it comes to cutover day, all you need to worry about is making sure that things are in sync and failing over.
With a bit of thought and planning, any system downtime can be kept to an absolute minimum.

I never understood the bad name Distributed Availability Groups have (except for the lack of documentation). Using the above method I was able to cut migration time (and downtime) from 4 hours to 30 minutes: 30 minutes from when users are informed about the start of maintenance until they are able to use the application again. Two things I would add though:
I think most of the misunderstanding around distributed AGs comes from a lack of decent documentation (not helped by a lot of what is out there being wrong); they’re a little bit of an unknown. They feel like another one of those half-baked features in SQL Server: someone thought they were a good idea but they were never really polished or developed to their real potential.
Yeah, if you’ve got to do the listener DNS jiggery pokery, that’s often the bit that takes the time. If I can, I prefer to just update connection strings; that’s not always practical though.
We’ve been running a few AGs for a number of years, having built and upgraded from SQL Server 2016 on Server 2016 to SQL Server 2019 running on Server 2019, and I’m contemplating our next upgrade, evaluating SQL Server 2022 and SQL Server 2025 (when released).
We see a lot of potential in the rumored ability to mirror tables (or parts thereof) to Azure Data Lake Parquet files for warehousing stuff, so we may be pushed to 2025 in the timeframe that I was contemplating upgrading to 2022.
I can see a big advantage in jumping directly from SQL 2019 (currently on Server 2019) to SQL 2025 (on Server 2025), but with the traditional roll-on of new servers into the failover cluster, you can only jump one operating system level at a time, as cluster compatibility is only one level backwards (unless Server 2025 has a new trick up its sleeve). That makes us go through the process 2x: building 2x as many servers, synchronizing all the non-AG SQL stuff (linked servers, jobs, operators, etc.) 2x, and so on.
However, with 125+ databases and associated applications talking to the current AG listener, I’d rather not have any additional delay (beyond the 5-10 seconds our normal Synchronous AG takes to move from one host to another during our weekend maintenance window) required for the DNS change of the listener to being an alias of the 2025 AG.
I can see a problem if you move the primary of the DAG (first?) before you reassign the source listener as an alias of the target’s listener: connected applications lose their connectivity, and then of course, as you rip away the source listener, there’s the chaos that might cause in that source AG.
My question would be whether you can first DNS-move the source listener to an alias of the target listener (not the DAG), and THEN enter the command on the source AG to fail over the DAG to the target after the TTL has expired?
I could see that existing and new thread connections to the Moving Listener Address would continue at the source until the DNS change occurred where new connections begin flowing directly to the target listener (bypassing the DAG and not replicating back at this point). Existing connections to the source would continue to process through the source AG first and then through the DAG to the target AG. These existing connections to the source AG primary could be terminated manually (not sure if I want to do anything more challenging to those servers) when the metrics on the primary server shows no activity, kind of like what they do during a failover.
At this point, with no update activity coming from the source AG through the DAG to the target AG, we could simply move to the last instruction to drop the DAG from both AG environments and move on as we can’t go back to the source AG in the DAG because there will be updates applied on the target that did not apply to the source because we bypassed the DAG failover. We’ve created a 1-way door, mind you, but that is how it works in other upgrade scenarios (can’t apply new version stuff in an old version).
I’m not seeing a problem here with this method. Am I missing something?
So, if I’m understanding this correctly, you’re looking at moving the source listener DNS entry and pointing it at the destination listener before failover?
It’s an interesting thought. Test it.
The communication between AGs is a two-way thing: transactions one way, acks returning in the other. By moving that DNS entry you’re going to be breaking the connection between destination and source and, more interestingly, looping that connection back round into itself. How the DAG will cope with that, I’m not sure. Loss of connection isn’t a big deal, that’s what they’re designed to cope with, but the loop back may confuse it. Whether transactions will continue to flow between primary and secondary in that state I’m not sure; SQL may just drop the connection between the two.
As for application connections, new connections coming in will be sent to the destination server, but until failover that database won’t be in a readable state, so any queries will fail. That may not be a big issue if you can live with it until failing the DAG over.
As I said, it’s an interesting idea but test it first. I’m not 100% sure how things are going to behave with that DNS move.
David,
Thanks for the thought-provoking response. I’m not really sure how the DAG (or for that matter the AG itself) may respond to such a change. I’ve also sent the link to this discussion to (and talked about the situation with) our most technically savvy sysadmin (runs our forest, etc.), and he actually recommends that I don’t try the double jump with this method, but instead do an OS upgrade in place (followed by a cluster level step-up) and then a more traditional (and MS supported) method of rolling new servers into the existing AG at one level up in OS and two levels up in SQL Server.
Given that I recently changed the hypervisor underlying all of our VMs (located in two data centers a mile apart, with independent connections to the internet supporting an Azure quorum with true independent access for each server) and did not experience any significant lag as I brought the secondary servers back online (1 hour down), since the low volume at the time of migration didn’t overwhelm SQL Server (re-sync took seconds), I’m feeling fairly confident that this suggestion from the sysadmin is a lower-risk solution to the double-upgrade challenge.
That being said, I think I still need to figure out the issue of disassociating the listener name in DNS that connections to the AG use from the actual listener as configured on the server as this technique (DAG) would be useful in a migration to Azure that may be coming in the next couple of years, as our CIO appears to be of the idea that he wants to be out of the datacenter business (a 1-way door I understand for having worked for a well-known outsourcer of mainframes back in the 1980s and 90s, usually to the long-term financial detriment of the institution walking away from this internal skill).
And we are stuck (at least for a few years) with running Linked Servers (something I inherited when I moved into the SQL Server stack from the Mainframe stack 15 years ago – UGH!) to Oracle to Synchronize data between applications. I’d be happier with a managed instance solution (that doesn’t support Oracle Linked Servers as you cannot install the necessary Oracle Client on a Managed instance), but if push comes to shove, I may need to solve before moving to Azure.
The reason I say this is because being able to keep your clients (hundreds) talking to the same DNS name regardless of where it lives is highly valuable when someone decides to move the SQL Servers first and follow on with the remaining applications and their servers later because being able to tell the applications “here I am” whether the Listener is here on site, on Servers in Azure, or (after we get rid of Oracle Linked Servers) a managed instance without uprooting their lives has great value.
This plays into the DAG technique allowing us to more quickly back out of a poor decision to migrate the database server first (simply executed within the DAG), because of the extra latency involved, both from applications, but more specifically from those dreaded Oracle servers doing frequent data synchronization through these linked servers that often run many minutes of every hour (insert joke here about hourly jobs lasting all hour, daily jobs lasting all day, etc.).
We will experience one version of this later this year (until perhaps it goes sideways) of moving the bulk (not all – another long story) of our Oracle to the Oracle cloud (as someone would rather pay Oracle for services than VMware because Oracle doesn’t perform on Hyper-V), adding in that extra slowness of not moving data within a high-performance network on-site injecting more router hops, etc. in the mix. Inter-cloud may actually perform better (if we are willing to pay) when we move SQL to Azure. I wish everyone here luck here.
If I can somehow get my Listener IP Name to be an alias of my Listener before building the DAG, I can then assign it as an alias of the DAG before the cutover (while its primary is still on-site) and, at the time of the dicey cutover to Azure hosting, have a fairly easy chicken-exit to move the primary AG in the DAG back on-prem in seconds without any worry about DNS issues doing so.
Getting back to the issue of making the Listener IP Name an alias of the Listener IP number (allowing it to be reassigned as you move AGs later on), I think there is actually a simple solution that shouldn’t break anything. In a single unit of work, both change the IP Name of the IP number assigned as the listener to a meaningful new name and add the original Listener IP Name as an alias of that new Listener Name. I don’t even think you need to shorten the TTL to implement this, as existing leases will talk to the same IP number and, when renewed, will still talk to the same IP number.
The only potential for issues is whether MS allows you to use a DNS alias instead of a real IP Name either while installing or operating an AG/Cluster (remember, the Listener is actually the cluster Listener as well). I cannot see a reason why the cluster or SQL Server will be any the wiser, but I will bang this thought off of my Sysadmin friend (and possibly MS, explaining the whole context), so I don’t waste any further time in this process than necessary.
What have I learned from this? Aliases are a great thing that allow for quick corrections to poor planning (even the MS engineer who was on-site for my first AG build 9 years ago didn’t think of this) or lack of seeing the future (I don’t think DAGs were invented, or at least well understood, back when I built this thing in 2016).
I appreciate your thoughts as a sounding board and look forward to hearing more of your thoughts on this matter.
Hi David,
A really good article with lots of explanation on how to migrate using a dAG. One thing you did not mention is that it is possible to have more than one dAG defined on a single AG.
In my old job we used a dAG to link SQL clusters across two locations. Using a dAG simplified BCO testing, as the dAG could be dropped and the secondary site run independently for the duration of the test. I know there will be some who will be horrified at the thought of no BCO capability during a BCO test, but the management were happy with the situation.
When it got to the time to upgrade versions, we built new SQL clusters with the new SQL version (everything Hyper-V guests), and added a second dAG to link the old and new clusters. This allowed us to get all the data to the new servers with minimal impact to the live systems. We also used the BCO testing trick of dropping the migration dAG to test how the systems really worked on the new SQL version.
Come migration time, we threw off all the users (whoopieee) and did a fail over on the migration dAG. After final tests to ensure all was OK we changed the main DNS alias for user connections to point to the new active cluster and allowed everyone back on the system. This was followed by another day of synchronising the new clusters across the
Oh dear, the end of my previous comment was truncated…
One thing to clarify: we pointed all user connections to a DNS alias, which in turn was pointed to the dAG that linked our primary and secondary clusters. We called this the failover alias. The dAG knew which cluster was active and sent the SQL requests to the right cluster.
This was done to get stability in user connection strings. It meant we could change everything at the SQL instance level, and so long as the failover DNS alias could route things to the live server there would be no impact on user connections.
Therefore, when doing a migration to a new SQL version, we built new Windows clusters with new SQL clusters running the new SQL version, all with new names. As already said, we used a dedicated migration dAG to link the old and new SQL clusters.
At migration time we did a failover of the migration dAG to the new cluster, then changed our failover DNS alias to point to it. This meant we were live on the new cluster and SQL version with zero connection string impact. After establishing the cross-site dAG and getting everything synchronised, the failover alias was changed to the cross-site dAG name. We now had our usual BCO capability running on the new SQL version, again with zero connection string impact.
For sites with multiple primary/secondary pairs, each pair would have its own dAG with its own DNS failover alias. This is what we did for our BI server set.
Finally, after all of this we were ready with a working model to deal with the next SQL upgrade when it got released by Microsoft.
Great explanation