
HPC Management Software: Reducing the Complexity of HPC Cluster and Grid Resources

By : Platform Computing
Published : May 13, 2008
Length : 19
Type : Analyst Report
 
Overview :

This white paper reviews how clusters rose to dominate the HPC market and how that shift has elevated the role of the software that sits between the operating system and application layers. It clarifies these midstack boundaries and describes what the HPC management software variety of middleware is and what it is not. It also explains how the HPC cluster revolution has set this part of the software stack on a path to become an essential, global binding agent for today’s sprawling HPC hardware and software infrastructure.

The paper discusses the challenges associated with managing clusters and grids and the important role of standards in HPC management software design. IDC profiles Platform Computing as an example of a leading provider in this space and includes mini case studies of Platform Computing customers and a Platform Computing partner.

Related Categories : Grid Computing, Middleware, Software Development, Utility Computing
 
The typical architecture of a high-performance computing (HPC) resource is no longer a single custom-engineered, integrated hardware and software system with collections of vendor-ported and tested applications. Today, HPC resources are typically constructed à la carte from standards-based hardware and software component technologies. These resources are likely to include multiple systems at multiple sites. The variety and number of those components, their distinct providers, their potentially different locations, and the costliness of testing user applications against them all have driven up HPC resource management complexity and created an integration gap. Most HPC system configurations purchased today are unique assemblages of hardware and software. The growing task of integrating them into a reliable, productive, and secure working environment has fallen heavily on system administration and user support personnel.
The careful design, selection, and integration of the software components between the operating system (OS) and application layers of the software stack are increasingly vital for reducing the complexity of HPC cluster and grid management and for presenting an integrated and responsive HPC resource to users. The number of components in midstack HPC management software, often informally referred to as middleware, has grown significantly in the past 10 years. In IDC’s opinion, this growth has exposed an expanding opportunity for vendors to add value to the HPC work cycle by adopting, streamlining, and modularizing the now long list of software functions assumed by HPC management software, including:
- User application compilation, debugging, and profiling
- User and system application input/output translation and scripting
- User application libraries: mathematical, I/O, and parallel (e.g., MPI, OpenMP)
- Workload queuing, scheduling, and management (e.g., migration, checkpoint, restart)
- System and application installation, integration, and patching
- System management: servers, software, security, disk storage, and backups
- System and application monitoring, reporting, provisioning, reconfiguration, and failure detection

From its origins more than 25 years ago in pioneering programs such as transaction processing (TP) monitor software, which among other things ensures the integrity of transactions passing between clients and applications, software at work between user applications and the operating system has played an increasingly important and diverse role. From their intermediate position, these programs perform a wide variety of crucial linking, mediating, and control functions.
By virtue of this intermediate position, it has become customary to refer casually to all software components that are not clearly part of the application or the operating system as middleware. However, the common term middleware does little to call out the many distinct and important roles now played by the software components working between HPC applications and the OS. This section in part addresses that shortcoming by reviewing and grouping the components of HPC management software presented schematically in Figure 1.
One way to see where middleware begins is to ask whether HPC can be done with only the application and the OS, without any intermediate software. A serial application running directly on top of a single-processor operating system can still qualify as HPC, yet it involves nothing that we might call middleware. HPC application and system management software can then be added in increments to yield what is in common use today. First, a workload manager and job scheduler could be added to direct this serial application to one of several idle nodes in a cluster, which may itself be one of several clusters in a grid. In the center of Figure 1, the workload manager is the key component in the set of HPC management tools responsible for scheduling and running jobs. Workload managers are required today to manage the interface between the HPC application and HPC’s new scaled-out, cluster- and grid-parallel infrastructure.
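The first increment described above, a workload manager that directs a serial job to an idle node somewhere in a grid of clusters, can be illustrated with a minimal sketch. This is not how any particular product works: the class names (`Node`, `Cluster`, `WorkloadManager`), the site and node names, and the first-idle-node placement policy are all hypothetical and chosen only for illustration. Real workload managers also weigh priorities, resource requirements, and policies that are omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One compute node; `busy` is True while a job occupies it."""
    name: str
    busy: bool = False


@dataclass
class Cluster:
    """A named collection of nodes at one site (hypothetical model)."""
    name: str
    nodes: List[Node] = field(default_factory=list)

    def first_idle_node(self) -> Optional[Node]:
        # Return the first node that is not running a job, if any.
        for node in self.nodes:
            if not node.busy:
                return node
        return None


@dataclass
class WorkloadManager:
    """Toy scheduler over a grid, modeled as a list of clusters."""
    clusters: List[Cluster]
    pending: List[str] = field(default_factory=list)

    def submit(self, job: str) -> Optional[Tuple[str, str]]:
        # Assumed policy: take the first idle node in the first cluster
        # that has one. If every node is busy, queue the job instead.
        for cluster in self.clusters:
            node = cluster.first_idle_node()
            if node is not None:
                node.busy = True
                return (cluster.name, node.name)
        self.pending.append(job)
        return None


# A two-site grid: site_a is fully occupied, site_b has two idle nodes.
grid = WorkloadManager(clusters=[
    Cluster("site_a", [Node("a1", busy=True)]),
    Cluster("site_b", [Node("b1"), Node("b2")]),
])
placement = grid.submit("serial_job_1")
print(placement)
```

In this sketch the application itself is unchanged; only the layer that decides where it runs has been added, which is the point the paragraph above makes about the workload manager.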
Next, a parallel IO and message passing library such as MPI could be integrated into the application to allow it to run in parallel. On the left in Figure 1, MPI (or any parallel programming model) is a key component of application middleware, which is part of the application development complex of HPC management software. This step adds intermediate software to allow individual applications to take advantage of today’s parallel HPC resources.
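The structure that a message passing library brings to an application can be sketched without MPI itself. The example below is an assumption-laden stand-in: it uses Python standard-library threads and queues to mimic the scatter, compute, and reduce pattern common in message passing programs. It is not MPI, it does not run across cluster nodes, and Python threads in one process do not deliver real parallel speedup; the function names `worker` and `parallel_sum` are hypothetical.

```python
import queue
import threading
from typing import List


def worker(rank: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive one chunk of data, compute a partial sum, send it back."""
    chunk = inbox.get()               # plays the role of a receive
    outbox.put((rank, sum(chunk)))    # plays the role of a send


def parallel_sum(data: List[int], num_workers: int) -> int:
    """Sum `data` by scattering chunks to workers and reducing the results."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: worker `rank` gets every num_workers-th element, offset by rank.
    for rank in range(num_workers):
        inboxes[rank].put(data[rank::num_workers])

    # Reduce: collect one partial sum per worker and combine them.
    total = 0
    for _ in range(num_workers):
        _rank, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total


total = parallel_sum(list(range(1, 101)), num_workers=4)
print(total)
```

The serial computation (a sum) is unchanged; what the intermediate layer adds is the division of data among workers and the exchange of messages that carries partial results back, which is the role the text assigns to application middleware.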