High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI



By Joseph D. Sloan

Publisher: O'Reilly
Pub Date: November 2004
ISBN: 0-596-00570-9
Pages: 360

Preface

Part I: An Introduction to Clusters

Chapter 1. Cluster Architecture

Section 1.1. Modern Computing and the Role of Clusters

Section 1.2. Types of Clusters

Section 1.3. Distributed Computing and Clusters

Section 1.4. Limitations

Section 1.5. My Biases

Chapter 2. Cluster Planning

Section 2.1. Design Steps

Section 2.2. Determining Your Cluster's Mission

Section 2.3. Architecture and Cluster Software

Section 2.4. Cluster Kits

Section 2.5. CD-ROM-Based Clusters

Section 2.6. Benchmarks

Chapter 3. Cluster Hardware

Section 3.1. Design Decisions

Section 3.2. Environment

Chapter 4. Linux for Clusters

Section 4.1. Installing Linux

Section 4.2. Configuring Services

Section 4.3. Cluster Security

Part II: Getting Started Quickly

Chapter 5. openMosix

Section 5.1. What Is openMosix?

Section 5.2. How openMosix Works

Section 5.3. Selecting an Installation Approach

Section 5.4. Installing a Precompiled Kernel

Section 5.5. Using openMosix

Section 5.6. Recompiling the Kernel

Section 5.7. Is openMosix Right for You?

Chapter 6. OSCAR

Section 6.1. Why OSCAR?

Section 6.2. What's in OSCAR

Section 6.3. Installing OSCAR

Section 6.4. Security and OSCAR

Section 6.5. Using switcher

Section 6.6. Using LAM/MPI with OSCAR

Chapter 7. Rocks

Section 7.1. Installing Rocks

Section 7.2. Managing Rocks

Section 7.3. Using MPICH with Rocks

Part III: Building Custom Clusters

Chapter 8. Cloning Systems

Section 8.1. Configuring Systems

Section 8.2. Automating Installations

Section 8.3. Notes for OSCAR and Rocks Users

Chapter 9. Programming Software

Section 9.1. Programming Languages

Section 9.2. Selecting a Library

Section 9.3. LAM/MPI

Section 9.4. MPICH

Section 9.5. Other Programming Software

Section 9.6. Notes for OSCAR Users

Section 9.7. Notes for Rocks Users

Chapter 10. Management Software

Section 10.1. C3

Section 10.2. Ganglia

Section 10.3. Notes for OSCAR and Rocks Users

Chapter 11. Scheduling Software

Section 11.1. OpenPBS

Section 11.2. Notes for OSCAR and Rocks Users

Chapter 12. Parallel Filesystems

Section 12.1. PVFS

Section 12.2. Using PVFS

Section 12.3. Notes for OSCAR and Rocks Users

Part IV: Cluster Programming

Chapter 13. Getting Started with MPI

Section 13.1. MPI

Section 13.2. A Simple Problem

Section 13.3. An MPI Solution

Section 13.4. I/O with MPI

Section 13.5. Broadcast Communications

Chapter 14. Additional MPI Features

Section 14.1. More on Point-to-Point Communication

Section 14.2. More on Collective Communication

Section 14.3. Managing Communicators

Section 14.4. Packaging Data

Chapter 15. Designing Parallel Programs

Section 15.1. Overview

Section 15.2. Problem Decomposition

Section 15.3. Mapping Tasks to Processors

Section 15.4. Other Considerations

Chapter 16. Debugging Parallel Programs

Section 16.1. Debugging and Parallel Programs

Section 16.2. Avoiding Problems

Section 16.3. Programming Tools

Section 16.4. Rereading Code

Section 16.5. Tracing with printf

Section 16.6. Symbolic Debuggers

Section 16.7. Using gdb and ddd with MPI

Section 16.8. Notes for OSCAR and Rocks Users

Chapter 17. Profiling Parallel Programs

Section 17.1. Why Profile?

Section 17.2. Writing and Optimizing Code

Section 17.3. Timing Complete Programs

Section 17.4. Timing C Code Segments

Section 17.5. Profilers

Section 17.6. MPE

Section 17.7. Customized MPE Logging

Section 17.8. Notes for OSCAR and Rocks Users

Part V: Appendix

Appendix A. References

This new guide covers everything you need to plan, build, and deploy a high-performance Linux cluster. You'll learn about planning, hardware choices, bulk installation of Linux on multiple systems, and other basic considerations. Learn about the major free software projects and how to choose those that are most helpful to new cluster administrators and programmers. Guidelines for debugging, profiling, performance tuning, and managing jobs from multiple users round out this immensely useful book.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. The Linux series designations, High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI, images of the American West, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

Clusters built from open source software, particularly those based on the GNU/Linux operating system, are increasingly popular. Their success is not hard to explain: they can cheaply handle an ever-widening range of number-crunching applications. A wealth of open source or free software has emerged to make it easy to set up, administer, and program these clusters. Each individual package is accompanied by documentation, sometimes very rich and thorough. But knowing where to start and how to get the different pieces working proves daunting for many programmers and administrators.

This book is an overview of the issues that new cluster administrators have to deal with in making clusters meet their needs, ranging from the initial hardware and software choices through long-term considerations such as performance.

This book is not a substitute for the documentation that accompanies the software that it describes. You should download and read the documentation for the software. Most of the documentation available online is quite good; some is truly excellent.

In writing this book, I have evaluated a large number of programs and selected for inclusion the software I believe is the most useful for someone new to clustering. While writing descriptions of that software, I culled through thousands of pages of documentation to fashion a manageable introduction. This book brings together the information you'll need to get started. After reading it, you should have a clear idea of what is possible, what is available, and where to go to get it. While this book doesn't stand alone, it should reduce the amount of work you'll need to do. I have tried to write the sort of book I would have wanted when I got started with clusters.

The software described in this book is freely available, open source software. All of the software is available for use with Linux; however, much of it should work nicely on other platforms as well. All of the software has been installed and tested as described in this book. However, the behavior or suitability of the software described in this book cannot be guaranteed. While the material in this book is presented in good faith, neither the author nor O'Reilly Media, Inc. makes any explicit or implied warranty as to the behavior or suitability of this software. We strongly urge you to evaluate the software and information provided in this book as appropriate for your own circumstances.

One of the more important developments in the short life of high performance clusters has been the creation of cluster installation kits such as OSCAR and Rocks. With software packages like these, it is possible to install everything you need and very quickly have a fully functional cluster. For this reason, OSCAR and Rocks play a central role in this book.

OSCAR and Rocks are composed of a number of different independent packages, as well as customizations available only with each kit. A fully functional cluster will have a number of software packages each addressing a different need, such as programming, management, and scheduling. OSCAR and Rocks use a best-in-category approach, selecting the best available software for each type of cluster-related task. In addition to the core software, other compatible packages are available as well. Consequently, you will often have several products to choose from for any given need.

Most of the software included in OSCAR or Rocks is significant in its own right. Such software is often nontrivial to install and takes time to learn to use to its full potential. While both OSCAR and Rocks automate the installation process, there is still a lot to learn to effectively use either kit. Installing OSCAR or Rocks is only the beginning.

After some basic background information, this book describes the installation of OSCAR and then Rocks. The remainder of the book describes in greater detail much of the software found in these packages. In each case, I describe the installation, configuration, and use of the software apart from OSCAR or Rocks. This should provide the reader with the information he will need to customize the software or even build a custom cluster bypassing OSCAR or Rocks completely, if desired.

I have also included a chapter on openMosix in this book, which may seem an odd choice to some. But there are several compelling reasons for including this information. First, not everyone needs a world-class high-performance cluster. If you have several machines and would like to use them together, but don't want the headaches that can come with a full cluster, openMosix is worth investigating. Second, openMosix is a nice addition to some more traditional clusters. Including openMosix also provides an opportunity to review recompiling the Linux kernel and an alternative kernel that can be used to demonstrate OSCAR's kernel_picker. Finally, I think openMosix is a really nice piece of software. In a sense, it represents the future, or at least one possible future, for clusters.

I have described in detail (too much, some might say) exactly how I have installed the software. Unquestionably, by the time you read this, some of the information will be dated. I have decided not to follow the practice of many authors in such situations and offer just vague generalities. I feel that readers benefit from seeing the specific sorts of problems that appear in specific installations and how to think about their solutions.

Audience

This book is an introduction to building high-performance clusters. It is written for the biologist, chemist, or physicist who has just acquired two dozen recycled computers and is wondering how she might combine them to perform that calculation that has always taken too long to complete on her desktop machine. It is written for the computer science student who needs help getting started building his first cluster. It is not meant to be an exhaustive treatment of clusters, but rather attempts to introduce the basics needed to build and begin using a cluster.

In writing this book, I have assumed that the reader is familiar with the basics of setting up and administering a Linux system. At a number of places in this book, I provide a very quick overview of some of the issues. These sections are meant as a review, not an exhaustive introduction. If you need help in this area, several excellent books are available and are listed in the Appendix of this book.

When introducing a topic as extensive as clusters, it is impossible to discuss every relevant topic in detail without losing focus and producing an unmanageable book. Thus, I have had to make a number of hard decisions about what to include. There are many topics that, while of no interest to most readers, are nonetheless important to some. When faced with such topics, I have tried to briefly describe alternatives and provide pointers to additional material. For example, while computational grids are outside the scope of this book, I have tried to provide pointers for those of you who wish to know more about grids.

For the chapters dealing with programming, I have assumed a basic knowledge of C. For high-performance computing, FORTRAN and C are still the most common choices. For Linux-based systems, C seemed a more reasonable choice.

I have limited the programming examples to MPI since I believe this is the most appropriate parallel library for beginners. I have made a particular effort to keep the programming examples as simple as possible. There are a number of excellent books on MPI programming. Unfortunately, the available books on MPI all tend to use fairly complex problems as examples. Consequently, it is all too easy to get lost in the details of an example and miss the point. While you may become annoyed with my simplistic examples, I hope that you won't miss the point. You can always turn to these other books for more complex, real-world examples.

With any introductory book, there are things that must be omitted to keep the book manageable. This problem is further compounded by the time constraints of publication. I did not include a chapter on diskless systems because I believe the complexities introduced by using diskless systems are best avoided by people new to clusters. Because covering computational grids would have considerably lengthened this book, they are not included. There simply wasn't time or space to cover some very worthwhile software, most notably PVM and Condor. These were hard decisions.

Organization

This book is composed of 17 chapters, divided into four parts. The first part addresses background material; the second part deals with getting a cluster running quickly; the third part goes into more depth describing how a custom cluster can be built; and the fourth part introduces cluster programming.

Depending on your background and goals, different parts of this book are likely to be of interest. I have tried to provide information here and at the beginning of each section that should help you in selecting those parts of greatest interest. You should not need to read the entire book for it to be useful.

Part I, An Introduction to Clusters

Chapter 1, Cluster Architecture, is a general introduction to high-performance computing from the perspective of clusters. It introduces basic terminology and provides a description of various high-performance technologies. It gives a broad overview of the different cluster architectures and discusses some of the inherent limitations of clusters.

Chapter 2, Cluster Planning, begins with a discussion of how to determine what you want your cluster to do. It then gives a quick overview of the different types of software you may need in your cluster.

Chapter 3, Cluster Hardware, is a discussion of the hardware that goes into a cluster, including both the individual computers and network equipment.

Chapter 4, Linux for Clusters, begins with a brief discussion of Linux in general. The bulk of the chapter covers the basics of installing and configuring Linux. This chapter assumes you are comfortable using Linux but may need a quick review of some administrative tasks.

Part II, Getting Started Quickly

Chapter 5, openMosix, describes the installation, configuration, and use of openMosix. It also reviews how to recompile a Linux kernel.

Chapter 6, OSCAR, describes installing and setting up OSCAR. It also covers a few of the basics of using OSCAR.

Chapter 7, Rocks, describes installing Rocks. It also covers a few of the basics of using Rocks.

Part III, Building Custom Clusters

Chapter 8, Cloning Systems, describes tools you can use to replicate the software installed on one machine onto others. Thus, once you have decided how to install and configure the software on an individual node in your cluster, this chapter will show you how to duplicate that installation on a number of machines quickly and efficiently.

Chapter 9, Programming Software, first describes programming software that you may want to consider. Next, it describes the installation and configuration of the software, along with additional utilities you'll need if you plan to write the application programs that will run on your cluster.

Chapter 10, Management Software, describes tools you can use to manage your cluster. Once you have a working cluster, you face numerous administrative tasks, not the least of which is ensuring that the machines in your cluster are running properly and configured identically. The tools in this chapter can make life much easier.

Chapter 11, Scheduling Software, describes OpenPBS, open source scheduling software. For heavily loaded clusters, you'll need software to allocate resources, schedule jobs, and enforce priorities. OpenPBS is one solution.

Chapter 12, Parallel Filesystems, describes setting up and configuring the Parallel Virtual File System (PVFS) software, a high-performance parallel file system for clusters.

Part IV, Cluster Programming

Chapter 13, Getting Started with MPI, is a tutorial on how to use the MPI library. It covers the basics. There is a lot more to MPI than what is described in this book, but that's a topic for another book or two. The material in this chapter will get you started.

Chapter 14, Additional MPI Features, describes some of the more advanced features of MPI. The intent is not to make you proficient with any of these features but simply to let you know that they exist and how they might be useful.

Chapter 15, Designing Parallel Programs, describes some techniques to break a program into pieces that can be run in parallel. There is no silver bullet for parallel programming, but there are several helpful ways to get started. The chapter is a quick overview.

Chapter 16, Debugging Parallel Programs, first reviews the techniques used to debug serial programs and then shows how the more traditional approaches can be extended and used to debug parallel programs. It also discusses a few problems that are unique to parallel programs.

Chapter 17, Profiling Parallel Programs, looks at techniques and tools that can be used to profile parallel programs. If you want to improve the performance of a parallel program, the first step is to find out where the program is spending its time. This chapter shows you how to get started.

Part V, Appendix

The Appendix includes source information and documentation for the software discussed in the book. It also includes pointers to other useful information about clusters.

Conventions

This book uses the following typographical conventions:

Italics: Used for program names, filenames, system names, email addresses, and URLs, and for emphasizing new terms.
Constant width: Used in examples showing programs, output from programs, the contents of files, or literal information.
Constant-width italics: Used for general syntax and items that should be replaced in expressions.

Tip icon: Indicates a tip, suggestion, or general note.

Warning icon: Indicates a warning or caution.

How to Contact Us

In a sense, any book is a work in progress. If you have comments, suggestions, or corrections, I would appreciate hearing from you. You can contact me through booktech@oreilly.com.

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:

O'Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

1-800-998-9938 (in the U.S. or Canada)

1-707-829-0515 (international or local)

1-707-829-0104 (fax)

You can send us messages electronically. To be put on the mailing list or to request a catalog, send email to:

info@oreilly.com

To ask technical questions or to comment on the book, send email to:

bookquestions@oreilly.com

We have a web site for the book, where we'll list examples, errata, and any plans for future editions. You can access this page at:

http://www.oreilly.com/catalog/highperlinuxc/

For more information about this book and others, see the O'Reilly web site:

http://www.oreilly.com

Using Code Examples

The code developed in this book is available for download for free from the O'Reilly web site for this book, http://www.oreilly.com/catalog/highperlinuxc. (Before installing, take a look at readme.txt in the download.)

This book is here to help you get your job done. In general, you can use the code in this book in your programs and documentation. You don't need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book doesn't require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code doesn't require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but don't require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI, by Joseph Sloan. Copyright 2005 O'Reilly, 0-596-00570-9."

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at permissions@oreilly.com.

Acknowledgments

While the cover of this book displays only my name, it is the work of a number of people. First and foremost, credit goes to the people who created the software described in this book. The quality of this software is truly remarkable. Anyone building a cluster owes a considerable debt to these developers.

This book would not exist if not for the students I have worked with both at Lander University and Wofford College. Brian Bell's interest first led me to investigate clusters. Michael Baker, Jonathan DeBusk, Ricaye Harris, Tilisha Haywood, Robert Merting, and Robert Veasey all suffered through courses using clusters. I can only hope they learned as much from me as I learned from them.

Thanks also go to the computer science department and to the staff of information technology at Wofford College, in particular to Angela Shiflet for finding the funds and to Dave Whisnant for finding the computers used to build the clusters used in writing this book. Martin Aigner, Joe Burnet, Watts Hudgens, Jim Sawyers, and Scott Sperka, among others, provided support beyond the call of duty. Wofford is a great place to work and to write a book. Thanks to President Bernie Dunlap, Dean Dan Maultsby, and the faculty and staff for making Wofford one of the top liberal arts colleges in the nation.

I was very fortunate to have a number of technical reviewers for this book, including people intimately involved with the creation of the software described here, as well as general reviewers. Thanks go to Kris Buytaert, a senior consultant with X-Tend and author of the openMosix HOWTO, for reviewing the chapter on openMosix. Kris's close involvement with the openMosix project helped provide a perspective not only on openMosix as it is today, but also on the future of the openMosix project.

Thomas Naughton and Stephen L. Scott, both from Oak Ridge National Laboratory and members of the OSCAR work group, reviewed the book. They provided not only many useful corrections, but helpful insight into cluster software as well, particularly OSCAR.

Edmund J. Sutcliffe, a consultant with Thoughtful Solutions, attempted to balance my sometimes myopic approach to clusters, arguing for a much broader perspective on clusters. Several topics were added or discussed in greater detail at his insistence. Had time allowed, more would have been added.

John McKowen Taylor, Jr., of Cadence Design System, Inc., also reviewed the book. In addition to correcting many errors, he provided many kind words and encouragement that I greatly appreciated.

Robert Bruce Thompson, author of two excellent books on PC hardware, corrected a number of leaks in the hardware chapter. Unfortunately, developers for Rocks declined an invitation to review the material, citing the pressures of putting together a new release.

While the reviewers unfailingly pointed out my numerous errors and misconceptions, it didn't follow that I understood everything they said or faithfully amended this manuscript. The blame for any errors that remain rests squarely on my shoulders.

I consider myself fortunate to be able to work with the people in the O'Reilly organization. This is the second book I have written with them and both have gone remarkably smoothly. If you are thinking of writing a technical book, I strongly urge you to consider O'Reilly. Unlike with some other publishers, you will be working with technically astute people from the beginning. Particular thanks go to Andy Oram, the technical editor for this book. Andy was constantly looking for ways to improve this book. Producing any book requires a small army of people, most of whom are hidden in the background and never receive proper recognition. A debt of gratitude is owed to many others working at O'Reilly.

This book would not have been possible without the support and patience of my family. Thank you.


Chapter 1. Cluster Architecture

Computing speed isn't just a convenience. Faster computers allow us to solve larger problems, and to find solutions more quickly, with greater accuracy, and at a lower cost. All this adds up to a competitive advantage. In the sciences, this may mean the difference between being the first to publish and not publishing. In industry, it may determine who's first to the patent office.

Traditional high-performance clusters have proved their worth in a variety of uses, from predicting the weather to industrial design, from molecular dynamics to astronomical modeling. High-performance computing (HPC) has created a new approach to science: modeling is now a viable and respected alternative to the more traditional experiential and theoretical approaches.

Clusters are also playing a greater role in business. High performance is a key issue in data mining or in image rendering. Advances in clustering technology have led to high-availability and load-balancing clusters. Clustering is now used for mission-critical applications such as web and FTP servers. For example, Google uses an ever-growing cluster composed of tens of thousands of computers.

Chapter 2. Cluster Planning

This chapter is an overview of cluster planning. It begins by introducing four key steps in developing a design for a cluster. Next, it presents several questions you can ask to help you determine what you want and need in a cluster. Finally, it briefly describes some of the software decisions you'll make and how these decisions impact the overall architecture of the cluster. In addition to helping people new to clustering plan the critical foundations of their cluster, the chapter serves as an overview of the software described in the book and its uses.

Chapter 3. Cluster Hardware

It is tempting to let the hardware dictate the architecture of your cluster. However, unless you are just playing around, you should let the potential uses of the cluster dictate its architecture. This in turn will determine, in large part, the hardware you use. At least, that is how it works in ideal, parallel universes.

In practice, there are often reasons why a less ideal approach might be necessary. Ultimately, most of them boil down to budgetary constraints. First-time clusters are often created from recycled equipment. After all, being able to use existing equipment is often the initial rationale for creating a cluster. Perhaps your cluster will need to serve more than one purpose. Maybe you are just exploring the possibilities. In some cases, such as learning about clusters, selecting the hardware first won't matter too much.

If you are building a cluster using existing, cast-off computers and have a very limited budget, then your hardware selection has already been made for you. But even if this is the case, you will still need to make a number of decisions on how to use your hardware. On the other hand, if you are fortunate enough to have a realistic budget to buy new equipment or just some money to augment existing equipment, you should begin by carefully considering your goals. The aim of this chapter is to guide you through the basic hardware decisions and to remind you of issues you might overlook. For more detailed information on PC hardware, you might consult PC Hardware in a Nutshell (O'Reilly).