Comparing Gigabit Ethernet against Myrinet for cluster computing

The following are variations on the same theme and were published around 2000. The Sandia technical report is certainly the most complete, but a bit lengthy for a casual read. For a shorter but still self-contained technical overview, try the CANPC paper. The Hot Interconnects and OpNetwork papers each focus on individual aspects of the study.

Simulation studies of gigabit ethernet versus Myrinet using real application cores
Helen Chen and Pete Wyckoff
Proceedings of CANPC '00 workshop at HPCA-6, Toulouse, France, January 2000
PDF
Presentation slides, PDF

Parallel cluster computing projects use a large number of commodity PCs to provide cost-effective computational power for parallel applications. Because properly load-balanced distributed parallel applications tend to send messages synchronously, minimizing blocking is as crucial a requirement for the network fabric as high bandwidth and low latency. We consider the selection of an optimal, commodity-based interconnect network technology and topology to provide high bandwidth, low latency, and reliable delivery.

Since our network design goal is to facilitate the performance of real applications, we evaluated Myrinet and Gigabit Ethernet technologies in the context of working algorithms, using modeling and simulation tools developed for this work.

Our simulation results show that Myrinet behaves well in the absence of congestion. Under heavy load, its latency suffers due to blocking in the distributed wormhole routing scheme.
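
To make the blocking effect concrete, here is a minimal, illustrative Python sketch (not taken from the paper or its simulation models) of head-of-line blocking under wormhole routing: a worm whose header cannot advance keeps holding the channel it has already acquired, so traffic queued behind it stalls even when its own output port is idle. The worm sizes and the busy time below are arbitrary assumptions.

```python
# Minimal sketch (not from the paper) of head-of-line blocking under
# wormhole routing: a blocked worm holds the channel it has acquired,
# so traffic behind it stalls even when its own output port is free.

def delivery_times(worms, busy_until):
    """worms: list of (name, output_port, flits) sharing one input channel,
    served in order.  busy_until[port] = time that port becomes free.
    Returns the time each worm finishes draining (1 flit per time unit)."""
    t, done = 0, {}
    for name, port, flits in worms:
        start = max(t, busy_until.get(port, 0))   # header waits for its port
        finish = start + flits                    # body flits follow the header
        done[name] = finish
        t = finish            # channel is released only after the tail flit
    return done

# Port 0 is congested until t=100; worm "a" wants it, worm "b" does not.
print(delivery_times([("a", 0, 8), ("b", 1, 8)], {0: 100}))
# -> {'a': 108, 'b': 116}: "b" is delayed behind "a" although port 1 was idle.
```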

Conventional Gigabit Ethernet switches cannot currently scale beyond 64 Gigabit Ethernet ports, which leads to the use of cascaded switches. Bandwidth limitations on the interswitch links and extra store-and-forward delays limit the aggregate performance of this configuration.
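
The store-and-forward penalty can be seen with a rough back-of-the-envelope model. The frame size, per-hop switching overhead, and hop counts below are illustrative assumptions, not figures from the study; the point is only that serialization delay is paid once per hop in a store-and-forward cascade but essentially once end-to-end with cut-through forwarding.

```python
# Rough latency model (illustrative numbers, not measurements from the study):
# a store-and-forward switch must receive the whole frame before forwarding it,
# so serialization delay is paid at every hop; cut-through pays it only once.

LINK_GBPS = 1.0          # Gigabit Ethernet line rate
FRAME_BYTES = 1500       # full-size Ethernet frame
PER_HOP_US = 5.0         # assumed fixed switching overhead per hop

def serialization_us(nbytes, gbps=LINK_GBPS):
    return nbytes * 8 / (gbps * 1e3)     # bits / (Gbit/s) -> microseconds

def store_and_forward_us(hops, nbytes=FRAME_BYTES):
    return hops * (serialization_us(nbytes) + PER_HOP_US)

def cut_through_us(hops, nbytes=FRAME_BYTES):
    return serialization_us(nbytes) + hops * PER_HOP_US

for hops in (1, 2, 3):
    print(hops, round(store_and_forward_us(hops), 1),
          round(cut_through_us(hops), 1))
# hops=3: 51.0 us store-and-forward vs 27.0 us cut-through for a 1500 B frame
```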

The Avici switch router uses six 40 Gb/s internal links to connect individual switching nodes in a wormhole-routed three-dimensional torus. Additionally, the fabric's large speed-up factor and its per-connection buffer management scheme provide non-blocking delivery under heavy load.

Performance evaluation of a Gigabit ethernet switch and Myrinet using real application cores
Helen Chen and Pete Wyckoff
Proceedings of Hot Interconnects '00, Stanford, CA, August 2000
PDF

Traditionally, high-end clusters use special-purpose network hardware, thereby losing the cost benefit offered by the mass market. Riding on the wave of Ethernet popularity, Gigabit Ethernet is fast becoming a commodity item. Given a scalable switching architecture, we believe it can be a cost-effective alternative for interconnecting parallel systems. This study evaluates the performance of the Avici Gigabit Ethernet switch against Myrinet. We simulate a 256-node cluster running core algorithms from real parallel applications, and then compare raw performance figures such as bandwidth and latency, as well as more complex parameters such as jitter, routing, and points of congestion in the fabric.
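
For reference, raw latency, bandwidth, and jitter figures of this kind are typically gathered with a ping-pong microbenchmark on a live machine. The sketch below is not the simulation framework used in the paper; it assumes mpi4py on top of a working MPI installation and shows one common way such numbers are measured.

```python
# Hedged sketch of a ping-pong microbenchmark for latency, bandwidth, and
# jitter (assumes mpi4py and a working MPI; run with exactly 2 ranks, e.g.
#   mpirun -np 2 python pingpong.py ).
from mpi4py import MPI
import time

comm, rank = MPI.COMM_WORLD, MPI.COMM_WORLD.Get_rank()
peer = 1 - rank

for size in (8, 256, 512, 4096, 65536):
    buf = bytearray(size)
    samples = []
    for _ in range(100):
        comm.Barrier()
        t0 = time.perf_counter()
        if rank == 0:
            comm.Send(buf, dest=peer)
            comm.Recv(buf, source=peer)
        else:
            comm.Recv(buf, source=peer)
            comm.Send(buf, dest=peer)
        samples.append((time.perf_counter() - t0) / 2)   # one-way estimate
    if rank == 0:
        lat = sum(samples) / len(samples)
        jitter = max(samples) - min(samples)             # crude jitter measure
        print(f"{size:6d} B  latency {lat*1e6:7.1f} us  "
              f"bandwidth {size/lat/1e6:7.1f} MB/s  jitter {jitter*1e6:6.1f} us")
```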

Cost/performance evaluation of gigabit ethernet and Myrinet as cluster interconnect
Helen Chen, Pete Wyckoff and Katie Moor
Proceedings of OpNetwork '00, Washington, DC, August 2000
PDF

The idea of cluster computing is to aggregate a machine room full of relatively cheap hardware, connected with some sort of network, and apply the combined power of the individual machines to a single calculation. Because this architecture has distributed memory, parallel processes communicate using a message-passing paradigm. High-end cluster users typically rely on special-purpose hardware such as Myrinet, HiPPI, or ServerNet for their message-passing infrastructures, thereby losing the cost benefit offered by the commodity market. Riding on the wave of Ethernet popularity, Gigabit Ethernet [4] is fast becoming a commodity item. We evaluated its performance by simulating core algorithms from real parallel applications. We compared raw performance figures such as bandwidth and latency, as well as more complex parameters such as jitter, routing, and points of congestion in the fabric, against similar studies conducted on Avici and Myrinet technology.

High performance commodity interconnects for clustered scientific and engineering computing
Helen Chen and Pete Wyckoff
Sandia National Laboratories tech report SAND99-3149/3, December 1999
PDF

The Computational Plant (CPlant) project will run distributed parallel applications on a large cluster of commodity PCs. Because properly load-balanced distributed parallel applications tend to send messages synchronously, minimizing blocking is as crucial a requirement as high bandwidth and low latency. Therefore, we consider the selection of an optimal, commodity-based interconnect network technology and topology to provide high bandwidth, low latency, and reliable delivery to be an important design consideration.

Since our network design goal is to facilitate the performance of real applications, we evaluated Myrinet and Gigabit Ethernet technologies in the context of working algorithms, using modeling and simulation tools developed in this project. In addition to latency and bandwidth, we evaluated performance enhancements to parallel algorithms from hardware-based multicast and cut-through routing.

Our simulation results show that Myrinet behaves well in the absence of congestion. Under heavy load, its latency suffers due to blocking in wormhole routing. Its severe cable-length constraint also limits Myrinet's ability to scale. The simplicity of the Myrinet switch results in a low per-connection cost; however, it is also the reason for its lack of manageability and robustness in large systems.

Conventional Gigabit Ethernet switches cannot scale to support more than 64 Gigabit Ethernet ports. Therefore, in order to build large parallel systems, switches must be cascaded. Several limitations arise as a result of this constraint. The Ethernet spanning-tree routing algorithm precludes a mesh topology; without diverse paths, interswitch links become bandwidth bottlenecks. These switches also store and then forward packets at each hop, adding end-to-end latency across the cascade. However, conventional Gigabit Ethernet switches deliver the best multicast performance because they implement multicast in hardware.
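
The advantage of hardware multicast can be illustrated with a simple step-count model; the node counts and per-message latency below are assumed values, not results from the report. A software broadcast built from point-to-point messages needs roughly log2(N) sequential rounds, while a switch that replicates frames in hardware reaches all receivers in about one message time.

```python
# Illustrative (assumed) comparison: software tree broadcast vs. a switch
# that replicates the frame in hardware.  Latencies are arbitrary.
import math

def software_broadcast_us(nodes, per_msg_us):
    # binomial-tree broadcast: ceil(log2(N)) rounds, each one message latency
    return math.ceil(math.log2(nodes)) * per_msg_us

def hardware_multicast_us(nodes, per_msg_us):
    # switch replicates the frame; roughly a single message latency
    return per_msg_us

for n in (16, 64, 256):
    print(n, software_broadcast_us(n, 20.0), hardware_multicast_us(n, 20.0))
# 256 nodes: 160 us of sequential rounds vs. ~20 us with hardware multicast
```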

The Avici terabit switch router (TSR) uses six 40 Gb/s internal links to interconnect individual switching nodes in a 3D toroidal mesh. Its switch fabric uses wormhole routing to provide cut-through latency to the Gigabit Ethernet hosts connected via its network I/O cards. Our simulation studies show that it performed as well as Myrinet for messages smaller than 256 bytes, and progressively better as message sizes increased beyond 512 bytes. It also proved to be highly scalable and robust. Riding the current of Ethernet popularity, we expect the Avici solution to become cost efficient.
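
As a rough aid to the torus description above, the sketch below estimates bisection bandwidth and average hop count for a k x k x k torus whose neighboring switch nodes are joined by 40 Gb/s links. The radix values and the ~k/4 average ring distance are simplifying assumptions for illustration, not parameters or results from the report.

```python
# Back-of-the-envelope sketch (assumed geometry, not figures from the report):
# bisection bandwidth and average hop count for a k x k x k torus with
# 40 Gb/s links between neighboring switching nodes.
def torus_bisection_gbps(k, link_gbps=40.0):
    # cutting a k^3 torus in half severs roughly 2 * k*k links
    # (the wrap-around connections double the count)
    return 2 * k * k * link_gbps

def torus_avg_hops(k):
    # average distance per dimension on a ring of k nodes is ~k/4; 3 dimensions
    return 3 * k / 4.0

for k in (2, 4, 8):
    print(f"{k}^3 torus: bisection {torus_bisection_gbps(k):6.0f} Gb/s, "
          f"avg hops {torus_avg_hops(k):4.1f}")
```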


Last updated 18 Mar 2005.