The network switch: unsung hero of the large-scale data center


While we generally associate low power consumption with battery-operated devices such as smartphones, smartwatches, and laptops, there are other, less obvious applications where low power consumption has a significant impact on our daily lives. One such example is all the “plumbing” and communications infrastructure, often referred to as high-performance computing, managed by network switches inside a modern large-scale data center.

Tom Wong (Source: Cadence)

With the explosive growth of online business driven by working from home, many industry sectors are reporting tremendous growth in internet usage and e-commerce. We work, learn, and play from home while embracing e-commerce, online delivery, telemedicine, virtual fitness, and a host of other virtual events and experiences. And everything seems to have migrated to the cloud.

In the early 2010s, nearly 40% of large companies surveyed said they expected to exceed their IT capacity within two years. Almost a decade later, virtually every business, regardless of size or industry, relies heavily on technology to evolve and streamline their operations. More than ever, access to massive amounts of data is critical to their success. To increase their ability to process all this data quickly, these companies need to secure more compute and storage capacity from cloud providers who are building massive data centers while accelerating the deployment of next-generation technology.

Data center architecture and the levels of the data center network. (Source: Cadence)

Hyper-scale technologies

When we think of a large-scale data center, the first thing that usually comes to mind is the trusted server processor. Performance and power savings come from very predictable x86 scaling. We've also seen the migration of processing power to FPGAs, GPUs, and, more recently, custom systems-on-chip (SoCs) designed in-house by the internet giants. With each subsequent technological development, processors have historically improved in the very predictable manner defined by Moore's Law. The other essential components of a large-scale data center are wired and wireless connectivity, networking, and storage. These also follow a natural progression of improvement through the latest Ethernet and networking standards, as well as the latest memory, broadband connectivity, and storage technologies.

The rush to the cloud centers on the server processor, artificial intelligence, advanced memories, and multi-chip packaging. Often, however, the limitation is not the performance of the processor or the type of advanced memory technology adopted; rather, the network and connectivity are the bottleneck. The speed at which data can travel between servers within a rack, between racks, between buildings, between campuses, and ultimately to the internet is also a key factor.

The unsung hero behind this critical infrastructure is the network switch. In the space of five years, we have seen switch bandwidth double every two years, from 3.2Tbps in 2015 to 12.8Tbps in 2019 and 25.6Tbps in 2020.

We're not far from the 51.2Tbps rollout, especially with advances in high-speed SerDes development that translate into long-reach, single-lane 112G capability. This shows up as a trend in module bandwidth from 100G in 2015 to 200/400G in 2019, and we are now on the cusp of a major speed transition from 400G to 800G over the next two to three years. This is accompanied by improvements in optical components beyond the 28 to 56Gbaud optics that began in 2019. All of these changes coincide with the transition from non-return-to-zero (NRZ) coding to the far more efficient PAM4 (4-level pulse-amplitude modulation) coding.
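To see why that coding change matters for per-lane throughput, here is a minimal sketch in Python, assuming ideal signaling with no coding overhead; the figures are purely illustrative:

```python
# Rough sketch: at the same symbol (baud) rate, PAM4 carries twice the
# bits per symbol of NRZ, doubling the raw per-lane rate. Illustrative only.
import math

def lane_rate_gbps(baud_gbaud: float, levels: int) -> float:
    """Raw per-lane bit rate = symbol rate x bits per symbol (log2 of levels)."""
    return baud_gbaud * math.log2(levels)

print(lane_rate_gbps(56, 2))  # NRZ,  2 levels at 56 Gbaud ->  56 Gbps
print(lane_rate_gbps(56, 4))  # PAM4, 4 levels at 56 Gbaud -> 112 Gbps
```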

A quick survey of what is available in the commercial market reveals that the majority of 12.8Tbps SoCs are manufactured at the 16nm process node. For 25.6Tbps, SoCs moved to 7nm at the end of 2019, entering volume production in 2020. The first-generation 25.6Tbps SoCs used 50G SerDes, the best technology available at the time. More recent announcements indicate that 100G SerDes chips are finally here, and that a transition from 50G to 100G SerDes, along with a process migration from 7nm to 5nm, is expected.

The gains are quite significant. Consider a 25.6Tbps switch: if it relies on 50G SerDes, the device requires 512 lanes. With 100G SerDes, the lane count drops to 256. The reduction in die area and power consumption resulting from this dramatic drop in lane count is significant. Each of these network switch ASICs consumes a lot of power, upwards of 300W!
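The lane-count arithmetic behind that comparison is simple enough to sketch; the helper below is hypothetical and purely illustrative:

```python
# Sketch of the lane-count arithmetic for a given switch bandwidth.
# lanes_required() is a hypothetical helper, not vendor data.
def lanes_required(switch_tbps: float, serdes_gbps: float) -> int:
    """Number of SerDes lanes needed to expose the full switch bandwidth."""
    return round(switch_tbps * 1000 / serdes_gbps)

print(lanes_required(25.6, 50))   # 512 lanes with 50G SerDes
print(lanes_required(25.6, 100))  # 256 lanes with 100G SerDes
print(lanes_required(51.2, 100))  # back to 512 lanes at the next step
```

Note that even with 100G SerDes, a 51.2Tbps device lands back at 512 lanes, which hints at why the next step demands more than a simple SerDes upgrade.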

The next plateau is 51.2Tbps. So how do we get there?

Switch bandwidth evolution: faster I/O enables higher-density switching. (Source: Cadence)

New design paradigm

Manufacturing of 51.2Tbps switch ASICs is expected to start at 5nm, eventually migrating to 3nm. This is driven mainly by longer development cycles and alignment with the deployment schedules of advanced foundry processes. It also depends on both the availability and adoption of 112G SerDes versus 56G SerDes to improve the trade-off between lane count, die size, and power.

Another possibility is that the next-generation network switch takes a disaggregated approach, using multiple chiplets rather than one large monolithic die. This approach helps in two ways. The smaller the dies, the higher their yield, especially when a monolithic die is pushed to the limits of lithography and the reticle; improved yield translates into lower cost. And the ability to reuse proven high-speed SerDes silicon in chiplet form helps speed time to market and improves the odds of an early, successful deployment of 51.2Tbps switch ASICs.
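A back-of-the-envelope yield model illustrates the point; the Poisson formula and the defect density below are assumptions chosen for illustration, not foundry data:

```python
# Illustrative Poisson yield model: yield falls off exponentially with die
# area, so smaller chiplets each yield better than one huge monolithic die.
# The defect density below is an assumed value, not a foundry figure.
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float = 0.001) -> float:
    """Classic Poisson yield estimate: Y = exp(-area * defect_density)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

print(f"monolithic 800 mm^2 die: {poisson_yield(800):.0%}")  # ~45%
print(f"one 200 mm^2 chiplet:    {poisson_yield(200):.0%}")  # ~82%
```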

This change, however, will require rethinking the design methodology. The shift from a single-chip design to a multi-chip design requires more attention to the design constraints and limitations of dies, substrates, and packages. The high-speed nature of these complex SoCs creates additional design and verification burdens. At 100G and above, this is no longer just a SPICE simulation. Designers must consider inductance, noise, transmission-line effects (terminations), crosstalk, and the dielectric coefficients of various materials and parameters, as well as ensure access to channel models.

This also results in more complicated thermal design. It is no longer just a question of managing the temperature inside a chip; designers must also monitor temperature gradients across the die and the location of thermal hot spots. Temperature must now be considered across the entire stack, from the die to the interposer, the package substrate, and the heat sink. Even the selection of die-attach materials and thermal grease for the heat sink becomes a design consideration. At this level of design complexity, there is no room for trial and error.
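As a minimal sketch, assuming a simple series thermal-resistance stack and made-up resistance values, the thermal budget of a 300W switch ASIC adds up quickly:

```python
# Minimal series thermal-resistance sketch: the junction temperature is the
# ambient plus power times the sum of resistances from die to heat sink.
# All resistance values below are assumptions for illustration only.
def junction_temp_c(power_w: float, r_die_c_per_w: float, r_tim_c_per_w: float,
                    r_sink_c_per_w: float, ambient_c: float = 35.0) -> float:
    """Tj = Ta + P * (R_die + R_TIM + R_heatsink)."""
    return ambient_c + power_w * (r_die_c_per_w + r_tim_c_per_w + r_sink_c_per_w)

print(junction_temp_c(300, r_die_c_per_w=0.02, r_tim_c_per_w=0.05,
                      r_sink_c_per_w=0.12))  # -> 92.0 C junction estimate
```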

High-speed network switch SoCs would not be possible without a number of technological innovations. In addition to the obvious high-speed I/O (SerDes), a foundational set of hardware IP is needed to succeed. Other enabling innovations include high-performance processor cores, high-density on-chip memory, high-speed interconnect (fabric) and memory bandwidth, and SoC integration.

SoC design platforms should also include IP such as 112G-LR PHY, 56G-LR PHY, high-bandwidth memory (HBM) Gen 2/3 PHY, and PCI Express 5.0/4.0 PHY. Additionally, a low-power die-to-die PHY IP is required to support multi-die integration and the disaggregation of logic and I/O across multiple dies. To manage this transition to 25.6Tbps switches, and eventually to 51.2Tbps, we need a new design methodology, including AI-powered design tools, advanced packaging, and other aspects of chip design that have long been taken for granted.

Now is the time to step up a gear and start our engines of innovation.

–Tom Wong is Marketing Director for IP Design at Cadence

