Resilience is defined as the ability to recover quickly from a setback or other ordeal; literally, the ability to bounce back. So how should network teams think about resilience in a computer network environment?
This article discusses four factors to consider when examining network resiliency, as well as how organizations can build redundancy into their network infrastructure.
1. Everything fails
The first step in designing a resilient network is accepting the reality that everything fails: routers, switches, circuits, cables, small form-factor pluggable (SFP) modules, and even interconnects. It is also necessary to perform regular network maintenance. This maintenance keeps systems at appropriate software levels, enables security patches to be applied, and provides for hardware maintenance and replacement.
2. Operating hours
Second, network teams need to think about the operating hours of the environment. For example, an office network might not have users after hours or on weekends. This type of network may have strict reliability and availability requirements during normal hours, but maintenance can be performed after hours. Other environments, such as data centers or life-safety systems (for example, 911 centers and hospitals), need to operate 24/7. A proper design for these networks must therefore account for both failures and the ability to keep operating during maintenance.
3. Virtualization, cloud applications and SaaS
The next step is to think about the effect of virtualization, cloud, and SaaS application suites. While it may seem like cloud-based applications are out of IT’s control, nothing could be further from the truth. For example, AWS goes to great lengths to advise customers on the availability they can expect for their applications. The service level an application can offer its users depends on where it is hosted: in a single Availability Zone, across multiple Availability Zones, or across multiple regions. Just as important is how businesses and their customers connect to cloud or SaaS providers.
4. Reliable remote connectivity
Finally, in the era of the COVID-19 pandemic, businesses need to think about the reliability of their remote connectivity. Does connectivity depend on a primary and a secondary VPN concentrator, or is it balanced across a group of systems with enough capacity to take one of them out of service for maintenance?
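The capacity question above can be made concrete with a little arithmetic. The following is a minimal sketch, not a sizing tool: the function names, the session counts, and the idea of measuring capacity in concurrent sessions per concentrator are all assumptions made for illustration.

```python
import math


def survives_maintenance(peak_sessions: int, per_node_capacity: int,
                         nodes: int, nodes_out: int = 1) -> bool:
    """Return True if the pool still carries peak load with nodes_out nodes offline.

    All inputs are hypothetical planning figures, not vendor specifications.
    """
    if per_node_capacity <= 0 or nodes < 1 or nodes_out < 0:
        raise ValueError("capacity and node counts must be positive")
    remaining = nodes - nodes_out
    if remaining <= 0:
        return False
    return remaining * per_node_capacity >= peak_sessions


def nodes_required(peak_sessions: int, per_node_capacity: int,
                   nodes_out: int = 1) -> int:
    """Smallest pool size that carries peak load with nodes_out nodes offline."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    return math.ceil(peak_sessions / per_node_capacity) + nodes_out


if __name__ == "__main__":
    # A primary/secondary pair cannot carry 1,800 sessions on one 1,000-session node.
    print(survives_maintenance(1800, 1000, nodes=2))
    # A balanced pool of three can lose one node and still carry the load.
    print(survives_maintenance(1800, 1000, nodes=3))
    print(nodes_required(1800, 1000))
```

With these illustrative numbers, a primary and secondary pair falls short during maintenance, while a balanced pool of three does not.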
Create redundancy at all layers
So how do teams go about creating a resilient network design? Ultimately, it’s important to understand that redundancy is just a tool for building resilience.
Dozens of books are packed with techniques for creating a resilient network design; I recommend Computer Networking Problems and Solutions by Russ White and Ethan Banks. But the bottom line with resiliency is that businesses need to apply redundancy at all layers of their infrastructure. This means designing with modularity and maintaining physical and logical separation between functional elements.
While site availability and resiliency can be established with circuit and component redundancy, applications that require continuous availability must be architected to be distributed across multiple data centers and availability zones. This allows the application to function during AWS, VMware, or other maintenance at any given location.
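To see why distributing an application across locations helps, consider a rough calculation. This sketch assumes that sites fail independently and that the application stays up as long as at least one site is up. Both are simplifying assumptions, and the 99% figure is a hypothetical input, not any provider's published SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(per_site: float, sites: int) -> float:
    """Availability of an application that needs only one of `sites` to be up.

    Assumes independent failures, which real deployments only approximate.
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The application is down only when every site is down at the same time.
    return 1.0 - (1.0 - per_site) ** sites


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)  # 0.99 is a hypothetical per-site figure
        print(f"{n} site(s): {a:.6f} available, "
              f"{downtime_minutes_per_year(a):.1f} min/year down")
```

Because shared dependencies such as DNS, identity services, or a common control plane break the independence assumption, real-world gains are usually smaller than this model suggests.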
The most important element of this approach is network automation, which is how teams reduce the risk of human error when making changes. Scripts require careful design, and all changes require proper documentation and testing. Any given change requires a minimum set of scripts: one to apply the change and another to test and validate it. Finally, teams need a plan to handle exceptions, including a backout script that returns the environment to its baseline before the modification.
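The apply, validate, and backout pattern described above can be sketched as a small workflow. This is an illustrative outline only: the `Change` class, the `run_change` function, and the in-memory `device` dictionary are hypothetical stand-ins, and a real implementation would call whatever automation tooling the team already uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    """One network change expressed as three hypothetical scripts."""
    name: str
    apply: Callable[[], None]      # makes the change
    validate: Callable[[], bool]   # tests that the change had the intended effect
    backout: Callable[[], None]    # restores the pre-change baseline


def run_change(change: Change) -> str:
    """Apply a change, validate it, and back out if anything goes wrong.

    Returns "applied" or "backed_out". An exception raised by the backout
    itself is deliberately not caught, because it needs human attention.
    """
    try:
        change.apply()
        if change.validate():
            return "applied"
    except Exception:
        pass  # treat any error during apply or validate as a failed change
    change.backout()
    return "backed_out"


if __name__ == "__main__":
    device = {"mtu": 1500}      # stand-in for real device state
    baseline = dict(device)     # snapshot taken before modification

    def apply_mtu() -> None:
        device["mtu"] = 9000

    def validate_mtu() -> bool:
        return device["mtu"] == 9000

    def backout_mtu() -> None:
        device.clear()
        device.update(baseline)

    print(run_change(Change("raise-mtu", apply_mtu, validate_mtu, backout_mtu)))
    print(device)
```

Keeping the three scripts together in one object makes it harder to ship a change that has no validation step or no way back.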