The following are variations on the same theme and were published around 2000. The Sandia technical report is certainly the most complete, but a bit lengthy for a casual read. For a less detailed but still complete technical overview, try the CANPC paper. The Hot Interconnects and OpNetwork papers focus on individual aspects of the study.
Simulation studies of gigabit ethernet versus Myrinet using real application
cores
Helen Chen and Pete Wyckoff
Proceedings of CANPC '00 workshop at HPCA-6, Toulouse, France, January 2000
PDF
Presentation slides, PDF
Parallel cluster computing projects use a large number of commodity PCs to
provide cost-effective computational power to run parallel applications.
Because properly load-balanced distributed parallel applications tend to send
messages synchronously, minimizing blocking is as crucial a requirement for the
network fabric as high bandwidth and low latency. We consider the selection of
an optimal, commodity-based interconnect network technology and topology to
provide high bandwidth, low latency, and reliable delivery.
Since our network design goal is to facilitate the performance of real
applications, we evaluated the performance of Myrinet and Gigabit Ethernet
technologies in the context of working algorithms, using modeling and
simulation tools developed for this work.
Our simulation results show that Myrinet behaves well in the absence of
congestion. Under heavy load, its latency suffers due to blocking in the
distributed wormhole routing scheme.
Conventional Gigabit Ethernet switches cannot scale to support more than 64
Gigabit Ethernet ports today, which leads to the use of cascaded switches.
Bandwidth limitations on the interswitch links and extra store-and-forward
delays limit the aggregate performance of this configuration.
The Avici switch router uses six 40 Gb/s internal links to connect individual
switching nodes in a wormhole-routed three-dimensional torus. Additionally, the
fabric's large speed-up factor and its per-connection buffer management scheme
provide non-blocking delivery under heavy load.
Performance evaluation of a Gigabit ethernet switch and Myrinet using real
application cores
Helen Chen and Pete Wyckoff
Proceedings of Hot Interconnects '00, Stanford, CA, August 2000
PDF
Traditionally, high-end clusters use special-purpose network hardware, thereby losing the cost benefit offered by the mass market. Riding on the wave of Ethernet popularity, Gigabit Ethernet is fast becoming a commodity item. Given a scalable switching architecture, we believe it can be a cost-effective alternative for interconnecting parallel systems. This study evaluates the performance of the Avici Gigabit Ethernet switch against Myrinet. We simulate a 256-node cluster running core algorithms from real parallel applications, and then compare raw performance figures such as bandwidth and latency, as well as more complex parameters such as jitter, routing, and points of congestion in the fabric.
Cost/performance evaluation of gigabit ethernet and Myrinet as cluster
interconnect
Helen Chen, Pete Wyckoff, and Katie Moor
Proceedings of OpNetwork '00, Washington, DC, August 2000
PDF
The idea of cluster computing is to aggregate a machine room full of relatively cheap hardware, connected with some sort of network, and apply the combined power of the individual machines to a single calculation. Because this architecture consists of distributed memory, parallel processes communicate using a message-passing paradigm. High-end cluster users typically rely on special-purpose hardware such as Myrinet, HiPPI, or ServerNet for their message-passing infrastructures, thereby losing the cost benefit offered by the commodity market. Riding on the wave of Ethernet popularity, Gigabit Ethernet [4] is fast becoming a commodity item. We evaluated its performance by simulating core algorithms from real parallel applications. We compared raw performance figures such as bandwidth and latency, as well as more complex parameters such as jitter, routing, and points of congestion in the fabric against similar studies conducted on Avici and Myrinet technology.
High performance commodity interconnects for clustered scientific and
engineering computing
Helen Chen and Pete Wyckoff
Sandia National Laboratories tech report SAND99-3149/3, December 1999
PDF
The Computational Plant (CPlant) project will run distributed parallel
applications on a large cluster of commodity PCs. Because properly
load-balanced distributed parallel applications tend to send messages
synchronously, minimizing blocking is as crucial a requirement as are high
bandwidth and low latency. Therefore, we consider the selection of an optimal,
commodity-based, interconnect network technology and topology to provide high
bandwidth, low latency, and reliable delivery to be an important design
consideration.
Since our network design goal is to facilitate the performance of real
applications, we evaluated the performance of Myrinet and Gigabit Ethernet
technologies in the context of working algorithms, using modeling and
simulation tools developed in this project. In addition to latency and
bandwidth, we evaluated performance enhancements to parallel algorithms using
hardware-based multicast and cut-through routing.
Our simulation results show that Myrinet behaves well in the absence of
congestion. Under heavy load, its latency suffers due to blocking in wormhole
routing. In addition, its severe cable-length constraint limits Myrinet's
ability to scale. The simplicity of the Myrinet switch results in low
per-connection cost; however, it is also the reason for its lack of
manageability and robustness in large systems.
Conventional Gigabit Ethernet switches cannot scale to support more than 64
Gigabit Ethernet ports. Therefore, in order to build large parallel systems,
switches must be cascaded. Several limitations arise as a result of this
constraint. The Ethernet spanning tree routing algorithm precludes a mesh
topology. Without diverse paths, interswitch links become bandwidth
bottlenecks. These switches store and then forward packets at each hop,
inducing additional end-to-end latency across cascaded switches. However,
conventional Gigabit Ethernet switches deliver the best multicast performance
because they implement multicast in hardware.
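The store-and-forward penalty of cascaded switches is easy to see with
back-of-the-envelope arithmetic. The sketch below is a minimal illustration;
the 1 Gb/s link rate, 1500-byte frame, 64-byte header, and 3-hop path are
assumed values, not parameters from the study:

```python
# Back-of-the-envelope comparison of store-and-forward vs. cut-through
# forwarding delay across cascaded switches.

def store_and_forward_us(packet_bytes, hops, link_bps):
    """Each switch buffers the entire packet before forwarding it, so the
    full serialization delay is paid once per hop."""
    return hops * packet_bytes * 8 / link_bps * 1e6

def cut_through_us(packet_bytes, header_bytes, hops, link_bps):
    """The packet is forwarded as soon as the header is inspected; the
    payload serialization delay is paid once, plus a per-hop header delay."""
    return (packet_bytes * 8 / link_bps
            + (hops - 1) * header_bytes * 8 / link_bps) * 1e6

LINK = 1e9   # 1 Gb/s Ethernet link (assumed)
MTU = 1500   # full-size Ethernet frame, bytes
HOPS = 3     # e.g. edge switch -> core switch -> edge switch

print(store_and_forward_us(MTU, HOPS, LINK))  # about 36 us
print(cut_through_us(MTU, 64, HOPS, LINK))    # about 13 us
```

With these assumptions, store-and-forward pays the full 12 us frame
serialization at every hop, while cut-through pays it only once; the gap
widens with frame size and hop count.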
The Avici terabit switch router (TSR) uses six 40 Gb/s internal links to
interconnect individual switching nodes in a 3D toroidal mesh. Its switch
fabric uses wormhole routing to provide cut-through latency to the Gigabit
Ethernet hosts connected via its network IO cards. Our simulation studies show
that it performed as well as Myrinet when messages were smaller than 256 bytes,
and progressively better as message sizes increased beyond 512 bytes. It also
proved to be highly scalable and robust. Riding the current of Ethernet
popularity, we expect the Avici solution to become cost efficient.