The world of supercomputing has changed in recent years, moving from a scale-up, monolithic, expensive architecture to the scale-out clustering of low-cost microprocessors, also referred to as High Performance Business Computing (HPBC) clusters. The semi-annual ranking of the most powerful production supercomputers, the TOP500.org list, demonstrates that the scale-out model is chosen more frequently today than the scale-up model. The TOP500 list is a descending ranking based on the giga (billion) or tera (trillion) FLOPS (floating-point operations per second) achieved on the HPL benchmark. These systems are powerful clusters built to solve complex computational problems for commercial, government, and academic users, delivering tangible results quickly.
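As a minimal sketch of how a descending FLOPS ranking works, the following Python snippet converts giga and tera figures to a common unit and sorts on it. The system names and performance values are invented for illustration and are not TOP500 data; the helper names `to_flops` and `rank_descending` are likewise hypothetical.

```python
# Illustrative only: names and figures are made up, not taken from the TOP500 list.
GIGA = 10**9   # billion
TERA = 10**12  # trillion

def to_flops(value, unit):
    """Convert a value given in GFLOPS or TFLOPS to plain FLOPS."""
    scale = {"GFLOPS": GIGA, "TFLOPS": TERA}
    return value * scale[unit]

def rank_descending(systems):
    """Return system names ordered from highest to lowest FLOPS."""
    ordered = sorted(systems, key=lambda s: to_flops(s[1], s[2]), reverse=True)
    return [name for name, _, _ in ordered]

# (name, measured performance, unit) for three hypothetical clusters
systems = [
    ("cluster-a", 850, "GFLOPS"),
    ("cluster-b", 2.5, "TFLOPS"),
    ("cluster-c", 1200, "GFLOPS"),
]
ranking = rank_descending(systems)
```

Normalizing to a single unit before sorting avoids ranking 850 GFLOPS above 2.5 TFLOPS simply because the raw number is larger.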
Solving the more challenging computing problems facing organizations today requires far more powerful, yet cost-effective, systems. This translates to significantly larger clusters. Instead of clusters in the low hundreds of nodes, expect clusters of many hundreds, thousands, even tens of thousands of nodes. Scaling systems to these levels is difficult and complex, because it becomes progressively harder to keep the cluster working correctly and consistently as it grows. This is a non-trivial problem; the supercomputer highway is littered with the carcasses of fine-grain (large scale) clustering failures.
Yet overcoming the challenges of designing highly scalable, high performance clusters is in fact achievable. It starts with the cluster architecture. Prior to any useful computation, proper planning is required, as each area of the cluster architecture has its own set of issues to overcome.
This white paper will demonstrate how SilverStorm Technologies helps to overcome these architectural issues, while reducing or eliminating complexity with a simple, repeatable recipe.
ARCHITECTING THE CLUSTER
To design a cluster, you must be conscious of these high-level issues.
What kind of traffic must I plan to support on my cluster?
- Do I employ parallel networks for each type of traffic, or share a network?
- How do I manage the complexity?
What are my cluster network performance requirements?
- How do I meet them?
- Does my network support linear CPU efficiency growth?
- How easily can I scale my network?
What are my storage requirements?
- Do I use a parallel file system or discrete file and block storage?
- What storage network interconnect is best for my deployment?
How do I manage the cluster?
- What application is used?
- What are my visualization network requirements?
How do I put this all together?
- What tools are available to aid in the cluster construction?
- How can I easily scale the cluster to meet future needs?
Highly Integrated High Performance Business Computing (HPBC)
Architecting a Simpler Solution
CLUSTER TRAFFIC
Typical clusters employ three types of network stacks that together perform all of the functions the cluster requires:
- TCP/IP for managing the cluster, initiating jobs, and visualizing the results.
- Message Passing Interface (MPI) for high performance message passing between the multitudes of CPUs, enabling them to act as one.
- Data storage for moving data into the cluster and from the cluster into a repository. This data can be either file-based or block-based, connected via Fibre Channel or Ethernet.
CLUSTER NETWORK
Choosing the right cluster interconnect to meet or exceed your requirements is key. As seen in the above figure, there are multiple traffic types to consider. Clusters can be interconnected with a variety of network types and protocols, each with its own benefits and performance characteristics. Gigabit Ethernet-based clusters can use the TCP/IP path for HPBC message passing, which requires CPU intervention for network communications and thus consumes more total processor cycles. Myrinet uses an offload engine to relieve the CPU of the communications task, and has twice the bandwidth of Gigabit Ethernet. Both are prevalent in today's TOP500 list; however, InfiniBand, a standards-based, open high-performance interconnect, is proving that it too has a prominent place in small to large scale HPBC.
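To make the bandwidth comparison concrete, here is a rough sketch of how link rate affects the time to move one message. It assumes a nominal 1 Gb/s for Gigabit Ethernet and simply doubles that for the faster link, as the text describes; it ignores latency, protocol overhead, and CPU cost, and the 64 MiB message size is an arbitrary illustration.

```python
def transfer_seconds(message_bytes, link_gbps):
    """Time to push a message through a link at its nominal rate.

    Ignores latency and protocol overhead (a deliberate simplification).
    """
    bits = message_bytes * 8
    return bits / (link_gbps * 1e9)

GIGE_GBPS = 1.0                 # Gigabit Ethernet nominal rate
DOUBLE_GBPS = 2.0 * GIGE_GBPS   # "twice the bandwidth", per the text

msg = 64 * 1024 * 1024          # hypothetical 64 MiB message
t_gige = transfer_seconds(msg, GIGE_GBPS)
t_double = transfer_seconds(msg, DOUBLE_GBPS)
```

Under these assumptions, doubling the link rate halves the wire time for a large message. Real interconnects also differ in latency and in how much CPU time they consume, which this sketch does not model.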
CLUSTER PERFORMANCE REQUIREMENTS
The CPU and the design of its memory subsystem are the initial indicators of cluster performance. The cluster network is the key to unleashing CPU power as the cluster scales. The objective of the network is to get data from point A to point B as quickly as possible in the massively parallel applications that run on HPBC clusters.
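One way to see why the network governs how well CPU power scales is a toy model of parallel efficiency. The sketch below assumes that compute time divides evenly across nodes while a fixed communication cost is paid whenever more than one node is used. Both the model and the numbers are illustrative assumptions, not measurements from any real cluster.

```python
def run_time(nodes, compute_seconds, comm_seconds):
    """Total run time: compute work divides across nodes, communication does not.

    Assumes a constant communication cost for any multi-node run.
    """
    comm = comm_seconds if nodes > 1 else 0.0
    return compute_seconds / nodes + comm

def efficiency(nodes, compute_seconds, comm_seconds):
    """Parallel efficiency = speedup / nodes; 1.0 means perfectly linear scaling."""
    speedup = run_time(1, compute_seconds, comm_seconds) / run_time(
        nodes, compute_seconds, comm_seconds
    )
    return speedup / nodes

# Hypothetical job: 1000 s of compute on one node, run on 100 nodes.
ideal = efficiency(100, 1000.0, 0.0)          # no communication cost
slow_network = efficiency(100, 1000.0, 10.0)  # 10 s spent communicating
fast_network = efficiency(100, 1000.0, 1.0)   # 1 s spent communicating
```

In this model, cutting communication time from 10 s to 1 s raises efficiency on 100 nodes from 50% to roughly 91%, which shows how a faster interconnect keeps added CPUs productive.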