MasonHarris
Equinix Product Manager
2 years ago

Building Highly Resilient Networks in Network Edge - Part 1 - Architecture

This is the first part of a series of posts highlighting best practices for customers who want highly resilient networks in Network Edge. This entry focuses on the foundational architecture of Plane A and Plane B in Network Edge and how these building blocks should be used for resiliency.

For more information on Network Edge and device plane resiliency please refer to the Equinix Docs page here.

Dual Plane Architecture

Network Edge is built upon a standard data center redundancy architecture with multiple pods that have dedicated power supplies and a dedicated Top of Rack (ToR) switch. 

  • It consists of multiple compute planes commonly referred to as Plane A and Plane B for design simplicity
  • Plane A connects to the Primary Fabric network and Plane B connects to the Secondary Fabric network

The most important concept for understanding Network Edge resiliency: the device plane determines which Fabric switch is used for device connections. Future posts will dive much deeper into the various ways that Network Edge devices connect to other devices, clouds, and colocation.

  • While referred to as Plane A and Plane B, in reality there are multiple compute planes in each NE pod
  • The actual number varies based on the size of the metro
  • This allows devices to be deployed in a manner where they are not co-mingled on the same compute plane, eliminating compute as a single point of failure
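The plane-to-Fabric relationship described above can be sketched as a simple lookup. This is an illustrative model only: the `Plane` and `Fabric` enums and the `fabric_for` function are hypothetical names, not part of any Equinix API, and the model uses the A/B simplification from this post rather than the real number of compute planes in a pod.

```python
from enum import Enum


class Fabric(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Plane(Enum):
    A = "A"
    B = "B"


# Assumption taken from the post: Plane A uses the Primary Fabric network,
# Plane B uses the Secondary Fabric network.
PLANE_TO_FABRIC = {Plane.A: Fabric.PRIMARY, Plane.B: Fabric.SECONDARY}


def fabric_for(plane: Plane) -> Fabric:
    """Return the Fabric network that a device on the given plane connects over."""
    return PLANE_TO_FABRIC[plane]
```

The point of the sketch is that the Fabric network is never chosen directly; it follows from the plane the device lands on.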

  • Network Edge offers multiple resiliency options that can be summarized as device options and connection options
  • Device options are local and provide resiliency against failures at the compute and device level in the local metro
  • This is analogous to what the industry refers to as "High Availability (HA)"
  • Connection resiliency is a separate option for customers that require additional resiliency with device connections (DLGs, VCs and EVP-LAN networks). This will be discussed in depth in separate sections.
  • It is common to combine both local resiliency and connection resiliency, but it is not required; ultimately it depends on the customer’s requirements
  • Geo-redundancy is an architecture that expands on local resiliency by utilizing multiple metros to eliminate issues that may affect NE at the metro level (not covered in this post)

Single Devices

  • Single devices have no resiliency for compute and device failures
  • The first single device is always connected to the Primary Compute Plane *
  • Single devices always make connections over the Primary Fabric network
  • Single devices can convert to Redundant Devices via the anti-affinity feature

Anti-Affinity Deployment Option

  • By default, single devices have no resiliency
  • However, single devices can be placed in divergent compute planes. This is commonly called anti-affinity and is part of the device creation workflow
  • Checking the "Select Diverse From" box allows customers to add new devices that are resilient to each other
  • Customers can verify this by viewing their NE Admin Dashboard and sorting their devices by "Plane" 

  • This feature allows customers to convert a single-device installation to Redundant Devices
  • By default, the first single device is deployed on the Primary Fabric
  • The actual compute plane is irrelevant until the second device is provisioned
  • The second device is deployed on the Secondary Fabric AND in a different compute plane than the first device
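The anti-affinity behavior described above can be sketched as a small placement function. This is a minimal model under stated assumptions, not Equinix's scheduler: `Placement`, `place_device`, the integer plane indices, and the choice of plane 0 for the first device are all hypothetical and exist only to illustrate the rule that the second device lands on the other Fabric network and on a different compute plane.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    compute_plane: int  # hypothetical index of a compute plane inside the pod
    fabric: str         # "primary" or "secondary"


def place_device(num_planes: int, diverse_from: Optional[Placement] = None) -> Placement:
    """Choose a placement for a new device.

    Without diverse_from, model the default: the first single device
    connects over the Primary Fabric (plane 0 is an arbitrary choice here).
    With diverse_from, pick a different compute plane and the other Fabric.
    """
    if diverse_from is None:
        return Placement(compute_plane=0, fabric="primary")
    if num_planes < 2:
        raise ValueError("anti-affinity needs at least two compute planes")
    # Any plane other than the existing device's plane satisfies anti-affinity.
    plane = next(p for p in range(num_planes) if p != diverse_from.compute_plane)
    fabric = "secondary" if diverse_from.fabric == "primary" else "primary"
    return Placement(compute_plane=plane, fabric=fabric)
```

Calling `place_device(4)` and then `place_device(4, diverse_from=first)` mirrors checking the "Select Diverse From" box when adding the second device.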

 

Resilient Device Options

  • These options provide local (intra-metro) resiliency to protect against hardware or power failures in the local Network Edge pod
  • By default, the two virtual devices are deployed in separate compute planes (A and B) 
  • In reality there are more than two compute planes, but they are distinct from each other

 

  • The primary device is connected to the Primary Fabric network and the secondary/passive device is connected to the Secondary Fabric network

Deployment

  • Redundant Devices: Two devices, both Active, appearing as two devices in the Network Edge portal. Both devices have all interfaces forwarding
  • Clustered Devices: Two devices, only one of which is ever Active. The data plane of the Passive (non-Active) device is not forwarding

WAN Management

  • Redundant Devices: Both devices get a unique L3 address that is active for WAN management
  • Clustered Devices: Each node gets a unique L3 address for WAN management as well as a Cluster address that connects to the active node (either 0 or 1)

Device Link Groups

  • Redundant Devices: None are created at device inception
  • Clustered Devices: Two are created by default to share config synchronization and failover communication

Fabric Virtual Connections

  • Redundant Devices: Connections can be built to one or both devices
  • Clustered Devices: Single connections are built to a special VNI that connects to the Active Cluster node only. Customers can create optional, additional secondary connection(s)

Supports Geo-Redundancy?

  • Redundant Devices: Yes, Redundant devices can be deployed in different metros
  • Clustered Devices: No, Clustered devices can only be deployed in the same metro

Vendor Support

  • Redundant Devices: All Vendors
  • Clustered Devices: Fortinet, Juniper, NGINX and Palo Alto
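The WAN Management difference for Clustered Devices (a per-node address plus a Cluster address that follows the active node) can be sketched as follows. This is a toy model, not how Network Edge implements clustering: the `Cluster` class, its method names, and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Cluster:
    node_addresses: Tuple[str, str]  # unique management address for node 0 and node 1
    cluster_address: str             # shared address that follows the active node
    active_node: int = 0             # either 0 or 1

    def resolve(self, address: str) -> int:
        """Return the index of the node that answers for a management address."""
        if address == self.cluster_address:
            return self.active_node
        # Raises ValueError if the address belongs to neither node.
        return self.node_addresses.index(address)

    def fail_over(self) -> None:
        """Move the active role to the other node."""
        self.active_node = 1 - self.active_node


cluster = Cluster(("192.0.2.10", "192.0.2.11"), "192.0.2.100")
```

In this model the per-node addresses always reach the same node, while the Cluster address reaches node 0 before `fail_over()` and node 1 after it. With Redundant Devices there is no shared address, so each device is only ever managed through its own.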

 

The next post will cover the best practices for creating resilient device connections with Device Link Groups and can be found here.