OSC PBS

This page documents how we build PBS for use at OSC. We run PBS on four different clusters of machines, with different architectures and operating systems. Most of these machines use the Maui scheduler rather than any of the provided PBS schedulers. The process starts with a stock OpenPBS 2.3.12 source distribution, then applies about 36 different patches, then the usual configure, make, and make install steps.

Information related to the use of PBS at OSC can be found here, including many useful scripts for checking job status, accounting, and scheduling.

Configuration

This line is what we use to configure a PBS tree, after unpacking and applying the patches described below. It assumes you will be using gcc to compile the code.

CFLAGS='-g -Wall -Wno-unused -Wno-parentheses -DNO_SECURITY_CHECK' \
./configure \
    --prefix=/usr/local/pbs \
    --set-server-home=/var/spool/pbs \
    --set-sched=no \
    --disable-shell-pipe \
    --enable-shell-use-argv

An explanation of the flags follows:

To build, type "make". Parallel builds work fine too.

To install, be root, as some of the binaries are root-owned setuid, and do "make install". Then to install the man pages, do:

( cd doc ; make install )

The first time you install you will have to install the docs twice as some part of the make gets confused.

To build editor tags for your particular architecture, use:

(
    find src/{cmds,iff,include,lib,mom_rcp,resmom/linux,server} -name '*.[ch]'
    find src/resmom -maxdepth 1 -name '*.[ch]'
) | ctags -L-

where you can substitute some other architecture for linux above. The complexity of those lines is to avoid tagging unused files.

Finally, to put everything back to the way you found it, "make distclean".

Patches

Start with a fresh copy of the OpenPBS 2.3.12 distribution, unpack it, and give it a reasonable name:

tar xfz /home/pw/src/Tars/pbs-2.3.12.tgz
mv OpenPBS_2_3_12 pbs-2.3.12
cd pbs-2.3.12

Then apply this whole slew of patches, in the order given below. Using a different order may work fine, but you may get "orig" files due to large offsets for some of the patches.
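Applying patches in a fixed order can be sketched as a short shell function. This is only an illustration, not the Makefile that ships with the patch collection: the series file (one patch file name per line), the patch directory, and the -p1 strip level are all assumptions and may not match how these patches were generated.

```shell
# apply_patches SERIES_FILE PATCH_DIR
# Apply each patch named in SERIES_FILE (one name per line, in order) to the
# source tree in the current directory, stopping at the first failure.
# The -p1 strip level is an assumption about how the patches were made.
apply_patches() {
    series=$1
    patch_dir=$2
    while IFS= read -r name; do
        # Skip blank lines and comment lines in the series file.
        case $name in
            ''|'#'*) continue ;;
        esac
        echo "applying $name"
        patch -p1 < "$patch_dir/$name" || {
            echo "failed on $name" >&2
            return 1
        }
    done < "$series"
}
```

Run from the top of the unpacked tree, a call might look like "apply_patches ../patches/series ../patches", where both paths are hypothetical. Afterwards, "find . -name '*.orig'" lists any backup files that patch left behind.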

You can download the patches one at a time using the (download) link in front of each patch name, or you can get the entire collection in a single tarball. It comes with a handy Makefile to apply all the patches in order, too:

Source RPM

If you want to experiment with a source RPM format, the patches along with the original pbs tarball and a spec file to put it all together can be found here:


(download)mpiexec.patch - Fix TM interface to work nicely with mpiexec.

Add a few fixes to the TM interface and some functionality enhancements for the MPI parallel code launcher, mpiexec (http://www.osc.edu/~pw/mpiexec/). Copied from mpiexec/patch/pbs-2.3.12-mpiexec.diff.

(download)fault-tolerance.patch - Replace most socket operations with non-blocking versions.

Convert almost all blocking system calls to non-blocking to avoid hanging the server when a mom dies. This is based on the now-classic CPlant fault tolerance patch, but heavily modified from that original.

(download)prologue-environment.patch - Export Resource_List.nodes in an environment variable for prologue.

This adds an environment variable which is received by the prologue and epilogue scripts and can be used to modify the system based on the "-lnodes=" request made by the user.
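As a rough sketch of how a prologue script might use such a value, the function below pulls the processors-per-node count out of a nodes request like "4:ppn=2". The name of the environment variable is not given on this page, so PBS_RESOURCE_NODES in the usage note is purely hypothetical, and treating a missing ppn as 1 is an assumption as well.

```shell
# first_ppn SPEC
# Print the ppn value from the first chunk of a nodes request such as
# "4:ppn=2"; print 1 (an assumed default) when that chunk has no ppn.
first_ppn() {
    chunk=${1%%+*}                # keep only the text before the first "+"
    case $chunk in
        *:ppn=*)
            ppn=${chunk#*:ppn=}   # drop everything through ":ppn="
            echo "${ppn%%:*}"     # drop any further ":property" suffixes
            ;;
        *)
            echo 1
            ;;
    esac
}
```

A prologue could then call something like first_ppn "$PBS_RESOURCE_NODES" (hypothetical variable name) and adjust the node accordingly.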

(download)increase-timeouts-2.patch - Increase some communication timeouts.

This increases some communication timeouts to allow for busier and larger clusters and networks. This second version decreases the TCP timeout for communication with the scheduler as something seems to be broken with moab.

(download)meminfo-2.patch - Parse Linux /proc/meminfo correctly.

On linux systems, this fixes parsing of /proc/pid/meminfo to avoid overflow for values larger than a 32-bit integer. It also reads the total memory on the system from /proc/meminfo, rather than /proc/kcore, as the latter source is no longer accurate. Finally, it no longer reads the three arbitrary header lines used in 2.4 kernels, as those disappeared in 2.6.
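The patch itself changes C code in the mom and is not reproduced here. As a loose shell analogue of the same idea, the sketch below finds the total memory by its "MemTotal:" label instead of by line position, so it does not care whether header lines are present. The function name is made up for illustration.

```shell
# mem_total_kb [FILE]
# Print the MemTotal value, in kB, from a meminfo-style file
# (default: /proc/meminfo). The number is passed through as text and is
# never stored in a fixed-width integer, so large values do not overflow.
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}"
}
```

Calling mem_total_kb with no argument reads the live /proc/meminfo; passing a file name is handy for trying it against a saved copy.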

(download)no-munge-server-name.patch - Do not install over existing server_name.

This fixes the install script not to rewrite the contents of /var/spool/pbs/server_name on every install. Handy if you make and install from a machine which will not be your PBS server, or if the server_name includes a port number or something more complex than just the short hostname of the installing machine.
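The behavior described above amounts to a write-only-if-missing check. A minimal sketch follows; the function name is invented here and the real install script may be structured quite differently.

```shell
# install_server_name FILE NAME
# Write NAME to FILE only if FILE does not already exist, so a hand-edited
# server_name (for example one carrying a port number) survives reinstalls.
install_server_name() {
    if [ -e "$1" ]; then
        echo "keeping existing $1"
    else
        printf '%s\n' "$2" > "$1"
    fi
}
```

For example, install_server_name /var/spool/pbs/server_name "$(hostname)" would create the file on a fresh machine and leave it untouched on later installs.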

(download)docfix.patch - Fix minor manpage typo.

Edit the manpage for pbsnodes to fix a typo.

(download)config-ia64.patch - Import newer config.guess and config.sub.

Include newer config.guess and config.sub files from gnu.org. They are still quite old (2001-02-24, compared to 1997 for the originals), but are good enough to know about the ia64 architecture.

(download)doc-cleanup.patch - Delete generated doc files on make distclean.

Cause "make distclean" to remove generated files in the ers/ subdirectory.

(download)no-linux-headers.patch - Do not include linux-specific headers.

PBS mom was incorrectly including linux kernel headers, which no longer works on modern systems. This fixes those includes.

(download)tracejob-line-size.patch - Allow longer lines in tracejob output.

Increase a static buffer used by tracejob to avoid truncating long lists of nodes used by a parallel job.

(download)server-node-list-length.patch - Increase server accounting record length.

Similar to the above, increase some PBS server limits to avoid it truncating long node lists.

(download)compile-warning-fixes.patch - Fix warnings.

Grab bag of ANSI-fication, warning removal, and comment fixes.

(download)mom-file-descriptor-leak.patch - Do not leak file descriptors for jobs that failed to start.

Plug a file descriptor leak in the mom that occurs when jobs do not start correctly.

(download)unused-tcp-interrupt.patch - Remove unused misleading tcp_interrupt variable and usage.

Delete unused, and somewhat confusing, variable to break out of TCP poll loops.

(download)prologue-bounce.patch - Tell sisters of failure during prologue step.

When a job is requeued during the prologue step, be sure that the other moms involved in a multinode allocation find out. Otherwise they will report errors when the job is rerun on them later.

(download)quickshutsig.patch - Change default not to kill all jobs on server shutdown.

Change default shutdown behavior of PBS server to leave jobs alone. Previously it would kill everything off by default. From the Ben collection.

(download)node-alloc-bug.patch - Fix server node allocation bug.

Fix bug in server node allocation code. From the Ben collection.

(download)qterm.patch - Remove dangerous default behavior of qterm.

Disable a default behavior for qterm. The default was especially dangerous, thus this at least makes one think about the action. From the Ben collection.

(download)qmgr-readline.patch - Enable readline support for qmgr.

This extra handy patch enables readline support for qmgr. The command-line editing features of that library are quite nice for those of us used to it in Unix shells. From the Ben collection, modified a bit.

(download)job-name-no-limit.patch - Allow long job names.

Remove arbitrary limit of 15 characters in the job name field. The claim is that this is required by a specification somewhere, but our users get annoyed at the short names it enforces.

(download)mom-mlockall.patch - Lock mom pages in memory.

Prevent the operating system from swapping out the pages of a mom process. Inspired by a patch from NCSA, but fixed to make sure that children spawned by the mom do not continue to have all their pages locked too. Also quite simplified.

(download)install-verbose.patch - Do not hide directory creation on make install.

Prevent "make install" from hiding the fact that it creates a directory.

(download)compile-warning-fixes-2.patch - Fix compile warnings #2.

Another grab bag of ANSI-fication, warning removal, and comment fixes.

(download)shell-use-argv.patch - Invoke user script directly rather than piping the contents.

Add a new method of job invocation to the already existing two choices. Now you may pick one of three:

(download)maui-silver-rm-extension.patch - Add a resource manager field needed by maui/silver.

Add a magic resource manager field which helps maui and its companion metascheduler, silver.

(download)compile-warning-fixes-3.patch - Fix compile warnings, #3.

The third grab bag of ANSI-fication, warning removal, and comment fixes.

(download)mom-restart.patch - Track running jobs properly across a mom restart.

For mpiexec-spawned jobs to survive across a mom restart, and to enable proper accounting for all jobs which continue across a mom restart, this patch fixes some behavior of mom when restarted with the "-p" flag. Note that this patch adds functionality to the machine-specific part of the mom code for linux only. Users of other system types could cut-n-paste that code without too much problem, but as it stands, this patch will break compilation on non-linux systems.

This patch does four things:

(download)rpm.patch - Fix build process for RPM.

Add some "$(ROOT)" prefixes to variables in the install scripts to allow PBS packages, such as RPM, to build anywhere.

(download)vmem-accounting.patch - Make vmem accounting more rational.

This patch overhauls the way that memory resource usage is managed:

(download)libpbs-path.patch - Work around maui in-compilation of path to pbs utilities.

This is a hack to keep a version-specific path out of libpbs.a, because maui picks up the same path when it is built. If the path were version dependent, we would have to install a new version of maui every time we upgraded PBS.

(download)server-fd-close.patch - Comment that a certain bug should not be fixed.

There is a small bug in the code that closes all file descriptors when the PBS server starts. Although the fix would be obvious, it exposes some other bugs that depend on this code being broken. Instead, just comment out the code and add a warning comment.

(download)neednodes-dynamic.patch - Let scheduler change neednodes of job that failed to run.

Allow the scheduler to change the Resource_List.neednodes of jobs that tried but failed to run, otherwise they are stuck waiting for the same set of nodes to run again.

(download)errno-fix.patch - Use errno properly.

Do not declare errno extern, instead include the proper header file. Fixes broken compilation on modern glibc systems.

(download)gcc32-fix-2.patch - Fix dependencies for modern gcc.

Modify the dependency calculation scripts to work with gcc-3.2 and newer compilers. The makedepend command will generate lines like "attr_atomic.o: <built-in>" and "attr_atomic.o: <command", which mean nothing to the Makefile and just end up causing errors. We want to remove these lines.
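The kind of filtering involved can be sketched with sed. This is an illustration of the idea only, not the change the patch actually makes to the dependency scripts, and the function name is invented.

```shell
# filter_deps
# Read makedepend-style output on stdin and drop dependency lines whose
# right-hand side is a gcc pseudo-file ("<built-in>" or "<command...")
# rather than a real source or header file.
filter_deps() {
    sed -e '/: <built-in>/d' -e '/: <command/d'
}
```

It would sit in a pipeline between the dependency generator and the file that the Makefile includes; ordinary lines such as "attr_atomic.o: attr_atomic.c" pass through unchanged.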

(download)more-server-sockets.patch - Listen and service lots more sockets.

Listen for and service lots more sockets in server code.

(download)interactive-with-script.patch - Run batch script even in interactive case.

If the user specifies a batch script to an interactive job, the old behavior was to parse the script for #PBS directives but otherwise ignore the contents. With this patch, the script is run inside a shell connected to the terminal, allowing the user to see the output and provide interactive input to the script. Patch by Michal Kouril of ECECS at University of Cincinnati.

(download)pam_ruserok.patch - Add PAM support to PBS server's authentication process.

Replace the call to ruserok() in the authentication procedure with a call to another function which has the same signature, but instead uses the PAM facilities to determine whether a user is allowed to submit a job.

(download)less-ping-nodes.patch - Ping the nodes less frequently.

The default of 15 sec during "hot recovery" was far too frequent, and every 5 minutes in general is fine, I think.


Some of the above patches come from the Ben collection which also includes other potentially useful ones.

Last updated 6 Aug 2004.