WHITE PAPER
Highly Integrated
High Performance Business Computing (HPBC)
Architecting a Simpler Solution
Cliff Oberholtzer
SilverStorm Technologies
Director, Product Marketing
June 2005
INTRODUCTION
The world of supercomputing has changed in recent years, moving from a scale-up, monolithic, expensive architecture to the scale-out clustering of low-cost microprocessors, also referred to as High Performance Business Computing (HPBC) clusters. The semi-annual ranking of the most powerful production supercomputers, the TOP500.org list, demonstrates that the scale-out model is chosen more frequently today than the scale-up model. The TOP500 list is a descending ranking based on Giga (billion) or Tera (trillion) FLOPS (floating-point operations per second) measured by the HPL benchmark. These systems are powerful clusters built to solve complex computational problems for commercial, government, and academic users, bringing tangible results quickly.
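As a small illustration of the units and ordering described above, the following Python sketch converts raw FLOPS figures to GFLOPS or TFLOPS and sorts systems in descending order. The system names and figures are invented for the example and do not come from any TOP500 list; `format_flops` and `rank_descending` are hypothetical helper names.

```python
from __future__ import annotations

GIGA = 1e9   # one billion floating-point operations per second
TERA = 1e12  # one trillion floating-point operations per second


def format_flops(flops: float) -> str:
    """Render a raw FLOPS figure in TFLOPS when large enough, else GFLOPS."""
    if flops >= TERA:
        return f"{flops / TERA:.2f} TFLOPS"
    return f"{flops / GIGA:.2f} GFLOPS"


def rank_descending(systems: dict[str, float]) -> list[tuple[str, float]]:
    """Order systems from highest to lowest measured FLOPS."""
    return sorted(systems.items(), key=lambda item: item[1], reverse=True)


# Made-up systems and figures, for illustration only.
systems = {"cluster-a": 850e9, "cluster-b": 3.2e12, "cluster-c": 1.1e12}

for position, (name, flops) in enumerate(rank_descending(systems), start=1):
    print(position, name, format_flops(flops))
```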
Solving the more challenging computing problems facing organizations today requires far more powerful, yet cost-effective, systems. This translates to significantly larger clusters. Instead of clusters in the low hundreds of nodes, expect clusters in the multiple hundreds, thousands, even tens of thousands of nodes. Scaling systems to these levels can be incredibly difficult and complex, as it is often quite painful to make the cluster work correctly and consistently as it scales. This is a non-trivial problem; the supercomputer highway is littered with the carcasses of fine-grain (large scale) clustering failures.
Yet overcoming the challenges of designing highly scalable, high performance clusters is in fact achievable. It starts with the cluster architecture. Prior to any useful computation, proper planning is required, as each area of the cluster architecture has its own set of issues to overcome.
This white paper will demonstrate how SilverStorm Technologies helps to overcome these architectural issues, while reducing or eliminating complexity with a simple, repeatable recipe.
ARCHITECTING THE CLUSTER
To design a cluster, you must be conscious of these high-level issues.
What kind of traffic must I plan to support on my cluster?
  - Do I employ parallel networks for each type of traffic, or share a network?
  - How do I manage the complexity?
What are my cluster network performance requirements?
  - How do I meet them?
  - Does my network support linear CPU efficiency growth?
  - How easily can I scale my network?
What are my storage requirements?
  - Do I use a parallel file system or discrete file and block storage?
  - What storage network interconnect is best for my deployment?
How do I manage the cluster?
  - What application is used?
  - What are my visualization network requirements?
How do I put this all together?
  - What tools are available to aid in the cluster construction?
  - How can I easily scale the cluster to meet future needs?
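One way to make the question about linear CPU efficiency growth concrete is the standard parallel efficiency measure: speedup over a single node, divided by the node count. A value near 1.0 means the cluster scales almost linearly. The sketch below assumes that measure; the function name is hypothetical and the run times are invented for illustration, not measurements from this paper.

```python
def parallel_efficiency(single_node_seconds, cluster_seconds, nodes):
    """Return speedup per node: 1.0 means perfectly linear scaling."""
    if nodes < 1 or single_node_seconds <= 0 or cluster_seconds <= 0:
        raise ValueError("node count and run times must be positive")
    speedup = single_node_seconds / cluster_seconds
    return speedup / nodes


# Hypothetical run times (seconds) for the same job at several cluster sizes.
single_node_seconds = 1000.0
runs = {64: 18.0, 128: 10.0, 256: 6.5}

for nodes, seconds in runs.items():
    efficiency = parallel_efficiency(single_node_seconds, seconds, nodes)
    print(f"{nodes:4d} nodes: {efficiency:.0%} efficient")
```

Tracking this figure as nodes are added shows whether the interconnect, rather than the processors, is limiting growth.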
CLUSTER TRAFFIC
Typical clusters employ three types of network stacks that perform all of the functions that the cluster requires:
  - TCP/IP for managing the cluster, initiating jobs, and visualizing the results.
  - Message Passing Interface (MPI) for high performance message passing between the multitudes of CPUs, enabling them to act as one.
  - Data storage for providing data into the cluster and from the cluster into a repository. This data can be either file or block-based, connected via Fibre Channel or Ethernet.
Figure 1: Cluster Traffic Types
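The three traffic types, and the earlier planning question of parallel versus shared networks, can be sketched as a small Python model. This is only an illustration under stated assumptions: the network names (`fabric0`, `net-mpi`, and so on) and the `plan_networks` helper are hypothetical and do not describe any SilverStorm product or configuration.

```python
from enum import Enum


class Traffic(Enum):
    """The three traffic types a typical cluster carries."""
    TCP_IP = "cluster management, job initiation, visualization"
    MPI = "message passing between CPUs"
    STORAGE = "file or block data into and out of the cluster"


def plan_networks(shared):
    """Map each traffic type to a network name (names are hypothetical)."""
    if shared:
        # One network carries all three traffic types.
        return {traffic: "fabric0" for traffic in Traffic}
    # One parallel network per traffic type.
    return {traffic: f"net-{traffic.name.lower()}" for traffic in Traffic}


for shared in (True, False):
    label = "shared" if shared else "parallel"
    for traffic, network in plan_networks(shared).items():
        print(f"{label:8s} {traffic.name:8s} -> {network} ({traffic.value})")
```

The parallel plan requires three networks to build and manage, while the shared plan requires one network that must handle all three kinds of traffic.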
CLUSTER NETWORK
Choosing the right cluster interconnect to meet or exceed your requirements is key. As seen in the above figure, there are multiple traffic types to consider. Clusters can be interconnected with a variety of network types and protocols, each with its own benefits and performance characteristics. Gigabit Ethernet-based clusters can use the TCP/IP path for HPBC message passing, which requires CPU intervention for network communications and thus more total processor cycles. Myrinet uses an offload engine...