Multi-Cloud Enterprise Network Architectures — natively bridging AWS and Azure at scale (Part 2)

Multi-region scaling in a multi-cloud environment, with AWS Transit Gateway peerings, Azure vWAN default hub mesh and native VPNs

Alexandra Huides
Dec 8, 2020

In the first part of this multi-cloud network architecture series, we explored how simple multi-cloud connectivity can be achieved using cloud-native solutions like AWS Transit Gateway and Azure vWAN.

For most enterprises, a single-region deployment will not be sufficient: they operate at scale, serve customers globally, and require disaster recovery and data protection plans. When the goals move beyond simple static content distribution and focus on an enhanced customer experience, companies start considering expanding their architecture footprint to multiple regions and across multiple clouds. The main reasons motivating the adoption of a multi-region strategy include:

  • High availability — the ability to withstand regional system failures in one cloud, enabling part or all of a system to effectively transition load to an alternate region, most often the peer region of the other cloud.
  • Improved latency — the need to process and serve non-static data without the overhead of long network paths, improving customer experience by decreasing the time to deliver content.
  • Regional compliance — differences in regional compliance requirements mean that data and services may need to be hosted regionally.

While the ‘M-squared’ (Multi-region Multi-cloud) scale delivers significant benefits, it also brings additional challenges and complexity to the technical approach, deployment, management, monitoring and cost control. All of this has to be implemented in a way that does not compromise the agility of the business.

In this post, I will go into the technical details of how to set up an ‘M2-ready’ (Multi-region Multi-cloud) network using AWS and Azure native cloud networking components and features — Transit Gateways, vWAN Hubs, peerings and VPNs. In our setup, we will be using six globally distributed cloud regions (three in AWS and three in Azure), with corresponding locations in the two cloud providers’ networks:

  • AMER — AWS North Virginia (us-east-1), Azure East US 2
  • EU — AWS Ireland (eu-west-1), Azure UK South
  • APAC — AWS Japan (ap-northeast-1), Azure Japan East

The technical details of setting up the basic environment components will not be covered — Transit Gateway and vWAN, VPCs and VNets, Route Tables, etc., need to be deployed and configured in advance based on the cloud providers’ best practices or the information shared in my previous post. We will go through the more advanced settings of Transit Gateway peerings, the vWAN hub mesh, VPNs and routing for the ‘M-squared’ setup.

AWS Networking at scale

Scaling multi-region connectivity in AWS is enhanced through the use of Transit Gateway peerings, a feature that enables customers to extend the centralized regional connectivity and build global networks spanning multiple AWS regions. Traffic using inter-region Transit Gateway peering stays on the AWS backbone and is encrypted, thereby reducing potential security threats and failure scenarios.

My test deployment in AWS makes use of the following configurations:

VPCs

VPC.us-east-1 — 10.101.1.0/24
VPC.eu-west-1 — 10.102.1.0/24
VPC.ap-northeast-1 — 10.103.1.0/24

VPC Route Tables

Since our focus is private IP routing, each of the private route tables contains a 10.0.0.0/8 route (part of the private IP space as defined by RFC 1918) targeting the Transit Gateway.
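For reference, this route can be added to a VPC route table with the AWS CLI — the route table and Transit Gateway IDs below are placeholders:

# Point the RFC 1918 aggregate at the regional Transit Gateway (placeholder IDs)
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.0.0.0/8 \
    --transit-gateway-id tgw-0123456789abcdef0 \
    --region us-east-1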

Transit Gateways

TGW.us-east-1 — ASN: 65123
TGW.eu-west-1 — ASN: 65124
TGW.ap-northeast-1 — ASN: 65125

All Transit Gateways have the following settings:
- Default association route table: Disabled
- Default propagation route table: Disabled
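
If you prefer the CLI over the console, a Transit Gateway with these settings can be created along the following lines (the ASN shown matches TGW.us-east-1; adjust per region):

# Create a TGW with the default route table association/propagation disabled
aws ec2 create-transit-gateway \
    --description "TGW.us-east-1" \
    --options AmazonSideAsn=65123,DefaultRouteTableAssociation=disable,DefaultRouteTablePropagation=disable \
    --region us-east-1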

Transit Gateway Attachments, Associations and Route Tables

The TGWs will be configured with two route tables: PROD and PEERING. Each VPC attachment will be associated with the PROD route table and propagated to both the PROD and PEERING route tables. All peering connections will be associated with the PEERING route table.
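As a rough CLI sketch (all IDs are placeholders), the route tables, the VPC attachment association and the propagations look like this:

# Create the two route tables on the regional TGW
aws ec2 create-transit-gateway-route-table --transit-gateway-id tgw-0123456789abcdef0 \
    --tag-specifications 'ResourceType=transit-gateway-route-table,Tags=[{Key=Name,Value=PROD}]'
aws ec2 create-transit-gateway-route-table --transit-gateway-id tgw-0123456789abcdef0 \
    --tag-specifications 'ResourceType=transit-gateway-route-table,Tags=[{Key=Name,Value=PEERING}]'

# Associate the VPC attachment with PROD ...
aws ec2 associate-transit-gateway-route-table \
    --transit-gateway-route-table-id tgw-rtb-PROD \
    --transit-gateway-attachment-id tgw-attach-VPC

# ... and propagate it to both PROD and PEERING
aws ec2 enable-transit-gateway-route-table-propagation \
    --transit-gateway-route-table-id tgw-rtb-PROD \
    --transit-gateway-attachment-id tgw-attach-VPC
aws ec2 enable-transit-gateway-route-table-propagation \
    --transit-gateway-route-table-id tgw-rtb-PEERING \
    --transit-gateway-attachment-id tgw-attach-VPC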

Transit Gateway Cross-region peerings

The new part of the setup is the cross-region peerings between the three Transit Gateways. The first step is to create the peerings:

  • On TGW.us-east-1 — create the peerings to TGW.eu-west-1 and TGW.ap-northeast-1
  • On TGW.eu-west-1 — accept the peering from TGW.us-east-1 and create the peering to TGW.ap-northeast-1
  • On TGW.ap-northeast-1 — accept the peerings from TGW.us-east-1 and TGW.eu-west-1

The next step is to associate all the peering connections, on all TGWs, with the PEERING route table. I chose to separate the TGW peering attachments from the VPC attachments because, on one hand, their purpose is different and, on the other, it keeps the routes I configure toward the cross-region peerings flexible.
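As a CLI sketch with placeholder IDs, creating one of the peerings, accepting it, and associating it with the PEERING route table looks like this:

# From us-east-1: request a peering to TGW.eu-west-1
aws ec2 create-transit-gateway-peering-attachment \
    --transit-gateway-id tgw-USE1 \
    --peer-transit-gateway-id tgw-EUW1 \
    --peer-account-id 111122223333 \
    --peer-region eu-west-1 \
    --region us-east-1

# From eu-west-1: accept the peering request
aws ec2 accept-transit-gateway-peering-attachment \
    --transit-gateway-attachment-id tgw-attach-PEERING-USE1-EUW1 \
    --region eu-west-1

# On each TGW: associate the peering attachment with the PEERING route table
aws ec2 associate-transit-gateway-route-table \
    --transit-gateway-route-table-id tgw-rtb-PEERING \
    --transit-gateway-attachment-id tgw-attach-PEERING-USE1-EUW1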

TGW.us-east-1, TGW.eu-west-1 and TGW.ap-northeast-1 PEERING route tables

The last step is to configure the static routing across all the peering connections. The PROD and PEERING route tables on the TGWs must contain the following routes:

  • TGW.us-east-1
PROD and PEERING route tables on TGW.us-east-1
  • TGW.eu-west-1
PROD and PEERING route tables on TGW.eu-west-1
  • TGW.ap-northeast-1
PROD and PEERING route tables on TGW.ap-northeast-1
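
Since the peering attachments do not propagate routes, each of these entries is a static route pointing a remote region's CIDR at the corresponding peering attachment — for example, on TGW.us-east-1 (placeholder IDs, CIDRs from this setup):

# Reach the eu-west-1 VPC over the us-east-1 <> eu-west-1 peering, in both route tables
aws ec2 create-transit-gateway-route \
    --transit-gateway-route-table-id tgw-rtb-PROD \
    --destination-cidr-block 10.102.1.0/24 \
    --transit-gateway-attachment-id tgw-attach-PEERING-USE1-EUW1
aws ec2 create-transit-gateway-route \
    --transit-gateway-route-table-id tgw-rtb-PEERING \
    --destination-cidr-block 10.102.1.0/24 \
    --transit-gateway-attachment-id tgw-attach-PEERING-USE1-EUW1
# Repeat for 10.103.1.0/24 towards the ap-northeast-1 peering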

AWS Transit Gateway Network Manager

Going to a multi-region deployment, the Transit Gateway Network Manager provides a single global view of the private network. You can create a new global network and register the Transit Gateways to it:
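A minimal CLI sketch of that registration (placeholder IDs, standard Transit Gateway ARN format):

# Network Manager is a global service; its API is homed in us-west-2
aws networkmanager create-global-network \
    --description "M-squared global network" \
    --region us-west-2
aws networkmanager register-transit-gateway \
    --global-network-id global-network-0123456789abcdef0 \
    --transit-gateway-arn arn:aws:ec2:us-east-1:111122223333:transit-gateway/tgw-USE1 \
    --region us-west-2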

You can now check the AWS Intra-Cloud connectivity — make sure you have the right security group configuration for this part.

Azure Networking at scale

The deployment in Azure makes use of the following configurations:

VNets

VNet.eastus2 — 10.201.1.0/24
VNet.uksouth — 10.202.1.0/24
VNet.japaneast — 10.203.1.0/24

vWAN Hubs

hub-eastus2 — Address space: 10.1.0.0/24
hub-uksouth — Address space: 10.3.0.0/24
hub-japaneast — Address space: 10.2.0.0/24

All vWAN Hubs have the following configurations:
- Site-to-site VPN Gateway: Enabled
- BGP autonomous system number: 65515
- Gateway scale units: 1 unit (500 Mbps x 2)

vWAN Hubs VNet Connections and Routing

On each vWAN Hub we’ll use the PROD route table for the VNet connections and we’ll make sure to also propagate the VNet connections to the Default and PROD route tables of all Hubs.
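
As a sketch only — I'm writing this from memory, so treat the routing flags (--associated-route-table, --propagated-route-tables, --labels), the resource group and the route table IDs below as assumptions to verify against az network vhub connection create --help:

# Connect VNet.eastus2 to hub-eastus2, associate it with the PROD route table
# and propagate it to the Default and PROD route tables of all hubs (flag names assumed)
az network vhub connection create \
    --resource-group rg-vwan \
    --vhub-name hub-eastus2 \
    --name conn-vnet-eastus2 \
    --remote-vnet /subscriptions/<sub-id>/resourceGroups/rg-vnets/providers/Microsoft.Network/virtualNetworks/VNet.eastus2 \
    --associated-route-table <PROD-route-table-resource-id> \
    --propagated-route-tables <Default-and-PROD-route-table-resource-ids> \
    --labels default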

AWS-Azure VPN

In Part 1, we explored the detailed process of setting up a VPN connection between the AWS Transit Gateway and the Azure vWAN Hub. Now, we’re going to set up the same type of connectivity between the following region pairs:

  • AWS us-east-1 <> Azure East US 2
  • AWS eu-west-1 <> Azure UK South
  • AWS ap-northeast-1 <> Azure Japan East

AWS VPNs

The VPN attachments on the TGWs will be associated with the VPN route table and propagated to the PROD, PEERING and VPN route tables. Once you create the VPNs between the three site pairs following the steps described in Part 1, and associate and propagate them as described, you can configure the VPN route table on each TGW with the corresponding static routes pointing at the TGW peerings.
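The association and propagation of a VPN attachment follow the same CLI pattern as the VPC attachments — roughly, per TGW, with placeholder IDs:

# Associate the VPN attachment with the VPN route table ...
aws ec2 associate-transit-gateway-route-table \
    --transit-gateway-route-table-id tgw-rtb-VPN \
    --transit-gateway-attachment-id tgw-attach-VPN

# ... and propagate it to the PROD, PEERING and VPN route tables
for rtb in tgw-rtb-PROD tgw-rtb-PEERING tgw-rtb-VPN; do
  aws ec2 enable-transit-gateway-route-table-propagation \
      --transit-gateway-route-table-id "$rtb" \
      --transit-gateway-attachment-id tgw-attach-VPN
done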

In this testing scenario, all three route tables — PROD, VPN, PEERING — end up having identical routes. I chose to separate attachments by resource type because it sets the stage for multiple routing domains, filtering and better route manipulation. A choice was also made about what to advertise on the VPN Connections — in this setup, we advertise all prefixes from all AWS regions on each VPN Connection to Azure.

The breakdown structure of the route tables on each TGW looks like this:

  • TGW.us-east-1
VPN, PROD and PEERING route tables on the TGW.us-east-1
  • TGW.eu-west-1
VPN, PROD and PEERING route tables on the TGW.eu-west-1
  • TGW.ap-northeast-1
VPN, PROD and PEERING route tables on the TGW.ap-northeast-1

The route tables structure shows the two traffic patterns that represent our focus in this testing scenario — Intra-Cloud and Cross-Cloud. By default, for Intra-Cloud traffic, the peering connections are preferred, whereas for Cross-Cloud, the VPNs are the main paths.

Azure VPNs

The VPN connections on the vWAN Hubs are associated with the Default Route table, and propagated to the PROD route table, on each hub. Here, there’s no need for a PEERING route table, because the hubs are transparently meshed as part of the underlying infrastructure.

Note: The same routing architecture is present here, with both the Default and the PROD route tables having identical routes.

The breakdown structure of the route tables on each vWAN Hub is:

  • hub-eastus2
Default and PROD route tables on hub-eastus2
  • hub-uksouth
Default and PROD route tables on hub-uksouth
  • hub-japaneast
Default and PROD route tables on hub-japaneast

Again, the route tables structure shows the two traffic patterns that represent our focus in this testing scenario — Intra-Cloud and Cross-Cloud. For Intra-Cloud, the vWAN Hub default mesh is preferred, whereas for Cross-Cloud, the VPNs are the main paths.

AWS Network Manager and Azure vWAN Console

You can use the AWS Network Manager and the Azure vWAN console to gain increased visibility into the global network you created. Both services provide metrics and insights, as well as a view of the logical connectivity of the testing scenario:

The second tunnel of each VPN connection is down, since we can only create two VPN Connections in Azure, each with a single tunnel interface.
In Azure, the same topology is available in the vWAN Insights tab (Public Preview mode):

Routing considerations

On one side, the AWS Transit Gateway only supports static routing over the cross-region peerings. On the other, the Azure vWAN has BGP enabled by default between the Hubs that are part of the same vWAN — routes learned from Hubs in other regions have a default AS Path of 65520–65520.

Given the slightly different routing capabilities of the AWS Transit Gateway and the Azure vWAN Hubs, the paths traffic takes between some regions are asymmetric by default. Let’s have a look at them in depth and explore how to fix this asymmetry.

Intra-cloud traffic — as expected, this traffic remains on the backbone of the respective cloud provider.

Cross-cloud traffic — this type of traffic creates the asymmetry challenges — let’s see where:

  • Traffic between two regions connected with a direct VPN Connection
    - us-east-1 <> East US 2
    - eu-west-1 <> UK South
    - ap-northeast-1 <> Japan East
    This traffic takes the direct VPN path, so there’s no concern.
  • Traffic between two regions not connected with a direct VPN Connection
    - us-east-1 <> UK South or Japan East
    - eu-west-1 <> East US 2 or Japan East
    - ap-northeast-1 <> East US 2 or UK South
    By default, the AWS > Azure direction exits through the VPN local to the AWS region and then rides the Azure backbone, while the Azure > AWS direction exits through the VPN local to the Azure region and then rides the AWS backbone — two different inter-region paths.

The entire Internet is built on the concept of asymmetry, so it’s very common. Despite that, there are production scenarios where traffic flow asymmetry can become an issue — especially when there is a need to keep track of state, as with firewalls or NAT devices — so let’s explore one possible way to fix it.

Since the Azure vWAN route announcements are done via BGP and we don’t have many knobs to turn there, we can turn to the Transit Gateway configuration, where we can tweak the static routes over the cross-region peerings in the PROD and PEERING route tables. So, instead of routing towards the non-local Azure regions using the prefixes propagated from the VPN Connections, we can configure static routes to direct that traffic over the peering connections. For example, traffic from the AWS TGW.eu-west-1 destined to the Azure vWAN Hub in East US 2 would go over the peering to TGW.us-east-1 and then exit the VPN connection to the vWAN Hub in East US 2, instead of taking the default path — directly over the VPN to the vWAN Hub in UK South and then the Azure backbone to the East US 2 Hub.
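Concretely, on TGW.eu-west-1 this means adding static routes for the East US 2 prefixes that point at the peering to TGW.us-east-1, in both the PROD and PEERING route tables (placeholder IDs, CIDRs from this setup):

# Send Azure East US 2 destinations (VNet and hub ranges) over the AWS backbone
for cidr in 10.201.1.0/24 10.1.0.0/24; do
  for rtb in tgw-rtb-PROD tgw-rtb-PEERING; do
    aws ec2 create-transit-gateway-route \
        --transit-gateway-route-table-id "$rtb" \
        --destination-cidr-block "$cidr" \
        --transit-gateway-attachment-id tgw-attach-PEERING-EUW1-USE1
  done
done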

TGW.eu-west-1 <> hub-eastus2 and TGW.ap-northeast-1 <> hub-eastus2 after asymmetry fix

The route tables on the AWS TGWs after configuring the static routes to fix the asymmetry are as follows:

  • TGW.us-east-1
VPN, PROD and PEERING route tables on the TGW.us-east-1
  • TGW.eu-west-1
VPN, PROD and PEERING route tables on the TGW.eu-west-1
  • TGW.ap-northeast-1
VPN, PROD and PEERING route tables on the TGW.ap-northeast-1

Also, you need to take into account that if both VPN connections between two regions fail (e.g. AWS us-east-1 <> Azure East US 2), the static routes will remain in the TGW PROD and PEERING route tables. For seamless failover, you can automate the route insertion/deletion with AWS Lambda using event-driven workflows.
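
I won’t reproduce a full Lambda here, but the remediation such a workflow would perform — for example, triggered by a CloudWatch alarm on the VPN TunnelState metric — boils down to withdrawing or replacing the static overrides. A sketch of that reaction, with placeholder IDs:

# On TGW.eu-west-1 and TGW.ap-northeast-1: remove the static override so the
# prefixes propagated from their own VPNs to Azure are used again
aws ec2 delete-transit-gateway-route \
    --transit-gateway-route-table-id tgw-rtb-PROD \
    --destination-cidr-block 10.201.1.0/24

# On TGW.us-east-1: point East US 2 at a surviving path over a peering instead
aws ec2 create-transit-gateway-route \
    --transit-gateway-route-table-id tgw-rtb-PROD \
    --destination-cidr-block 10.201.1.0/24 \
    --transit-gateway-attachment-id tgw-attach-PEERING-USE1-EUW1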

This is just one of the options for solving the asymmetry problem, if you have it. Another is to set up static routes in the vWAN Hubs’ PROD route tables and pull more of the traffic onto the Azure backbone, making sure the automation workflows handle the case where both VPNs between an AWS and an Azure region fail. Last but not least, you can set up a full mesh of VPN connections between the AWS and Azure regions and never worry about asymmetry again. Whichever path you choose, make sure you have an accurate understanding of how AWS Transit Gateways and Azure vWANs behave routing-wise.

Connectivity Testing

Keeping the same instance types as in Part 1 — t2.micro and Standard D2s v3 — let’s first test the traffic patterns as highlighted above:

  • Intra-cloud AWS
VM.us-east-1 <> VM.eu-west-1
[centos@VM.us-east-1 ~]$ ping VM.eu-west-1 -c 3
PING VM.eu-west-1 (10.102.1.10) 56(84) bytes of data.
64 bytes from VM.eu-west-1 (10.102.1.10): icmp_seq=1 ttl=61 time=68.8 ms
<Output Omitted>
--- VM.eu-west-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 4ms
rtt min/avg/max/mdev = 67.993/68.298/68.823/0.372 ms
VM.us-east-1 <> VM.ap-northeast-1
[centos@VM.us-east-1 ~]$ ping VM.ap-northeast-1 -c 3
PING VM.ap-northeast-1 (10.103.1.10) 56(84) bytes of data.
64 bytes from VM.ap-northeast-1 (10.103.1.10): icmp_seq=1 ttl=61 time=171 ms
<Output Omitted>
--- VM.ap-northeast-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 169.306/169.731/170.538/0.662 ms
  • Intra-cloud Azure
VM.eastus2 <> VM.uksouth
[centos@VM.eastus2 ~]$ ping VM.uksouth -c 3
PING VM.uksouth (10.202.1.4) 56(84) bytes of data.
64 bytes from VM.uksouth (10.202.1.4): icmp_seq=1 ttl=62 time=84.5 ms
<Output Omitted>
--- VM.uksouth ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 6ms
rtt min/avg/max/mdev = 80.832/82.320/84.462/1.587 ms
VM.eastus2 <> VM.japaneast
[centos@VM.eastus2 ~]$ ping VM.japaneast -c 3
PING VM.japaneast (10.203.1.4) 56(84) bytes of data.
64 bytes from VM.japaneast (10.203.1.4): icmp_seq=1 ttl=62 time=155 ms
<Output Omitted>
--- VM.japaneast ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 152.093/152.967/154.510/1.139 ms
  • Cross-cloud between regions directly connected with VPNs
AWS us-east-1 > Azure East US 2 and Azure East US 2 > AWS us-east-1
  • Cross-cloud between regions that are not directly connected
AWS us-east-1 > Azure UK South and Azure UK South > AWS us-east-1

Note that there’s an internal AWS APIPA hop that is not transparent to ICMP for the AWS > Azure traffic direction.

Costs

In Part 1 we explored the costs for a single-region deployment. When going to a multi-region setup:
- AWS Transit Gateway peerings are billed the same as any other attachment — $0.05/hour per peering + $0.02/GB of data transferred OUT on the peering connection
- Azure vWAN Hubs are meshed by default, and inter-hub data transfer is billed at $0.02/GB in both directions (OUT and IN)
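
As a quick back-of-the-envelope example using the rates above: keeping one Transit Gateway peering up for a 730-hour month costs about $36.50 in hourly charges, and pushing 1 TB across it adds roughly $20 in data transfer; the same 1 TB between two vWAN Hubs costs about $40, because it is billed at $0.02/GB on both the sending and the receiving side.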

Conclusion

In this second part of our quest to natively bridge AWS and Azure at enterprise scale, we explored the opportunities and challenges of multi-cloud, multi-region, standards-based network architectures. The ease with which we manually set up a global network is compelling, and if we take into consideration all the automation tools that can be used to create deployment workflows and workbooks, the time to deploy decreases to minutes.

In my opinion, the most important lesson from this exploration is that cloud networking offers a lot of possibilities with very fast deployment times — it’s only a matter of exploring those options and figuring out what best fits your needs.
