Mpiexec

News

Coming soon: release 0.84. Three new features are almost ready to go: support for mvapich2 version 1.0, InfiniPath MPI support, Totalview for MPICH2. Please test these from the code in the SVN and report any problems.

Release 0.83. It has been 15 months since the last release. This overdue release has just a few compilation and bug fixes, and a bit of new support. More detail is found below in the Changes section.

Add support for Portals, as implemented in userspace on TCP.

Add support for mvapich 0.9.9 and 1.0 beta. Each of these last two MPICH on IB releases changed the startup protocol.

Force configure-time selection of a default communication device. Now you must configure using "--with-default-comm=mpich2" or similar.

Release 0.82. A few feature additions and many bug fixes, all of which are explained in more detail below in the Changes section.

Support for Intel MPI version 3 extensions to PMI.

Track individual TM node ids, enabling future NUMA-aware task placement.

New command-line switch '-npernode', generalization of '-pernode'.

The -transform-hostname feature now works on mpich2/pmi.

Release 0.81. Many changes, which are documented in more detail below in the Changes section.

Asynchronous GM and IB startup for much improved scalability.

Support the new mvapich startup protocol, but recent mvapich versions are still broken.

Support Myrinet's new MX message passing protocol.

Support for MPI_Spawn and other MPI2 process management through PMI interface.

Redirection of standard IO streams for PBSpro systems through helper code.


Description

Mpiexec is a replacement program for the script mpirun, which is part of the mpich package. It is used to initialize a parallel job from within a PBS batch or interactive environment. Mpiexec uses the task manager library of PBS to spawn copies of the executable on the nodes in a PBS allocation.

Reasons to use mpiexec rather than a script (mpirun) or an external daemon (mpd):

Mpiexec handles creation of the node list file, if required by the message passing library, and the shared-memory file for use on SMP nodes. It also redirects standard input and output to the shell from which it was invoked, bypassing the PBS output and error files if you choose. One handy feature in particular is -allstdin which replicates the contents of an input stream to each process. Support for heterogeneous executables and/or different command lines to each task is provided through the use of an optional configuration file.

Mpiexec works on machines of many architectures, including x86, ia64, alpha, sparc, power4, and with many operating systems, including linux, freebsd, solaris, and darwin. It should be easily portable to any other machine which runs PBS.

Most current MPI implementations are supported:

These two related MPI implementations have their own startup programs that work with PBS. While there is vestigial support for LAM, you are encouraged to try the launcher in the distribution of the MPI library first.

Mpiexec is free software and is licensed for use under the GNU General Public License, version 2.


Download

You probably want to download the latest release, but read on if you need to support older versions of your MPI or communications libraries.

Tarballs and patches

If you prefer to upgrade by patch, the following will apply nicely from within your current mpiexec directory using patch -p1 -sNE or similar. Older tarballs are frequently available as tags in the SVN repository.

Release                     Patch
mpiexec-0.83.tgz  203 kB
mpiexec-0.82.tgz  197 kB    mpiexec-0.82-0.83.diff.gz  57 kB
mpiexec-0.81.tgz  191 kB    mpiexec-0.81-0.83.diff.gz  79 kB

SVN repository

An anonymous read-only subversion repository is maintained for mpiexec, too. No guarantees on the quality of the code you'll find in there at any given moment, but it usually tends to work.

You can browse the source code here, and to check out your own copy, this subversion command will create a new directory called mpiexec and populate it with the latest source tree:

svn co http://svn.osc.edu/repos/mpiexec/trunk mpiexec

FAQ

Please take a look at the Frequently Asked Questions (FAQ) to look for answers to commonly asked mpiexec questions.

The latest version of the README included with the distribution is also available.


Mailing list

There is a mailing list for mpiexec.

Archives of the mailing list are available for browsing.

Send mail to email address with comments, questions, and bug fixes. Be sure to send only plain text mails to the list, no pure HTML or multipart text plus HTML, please.

Subscribe to the list using the standard mailman subscription management form.

Posts from subscribers are relayed immediately to all list recipients; however, posts from non-subscribers are moderated to avoid SPAM and thus may be delayed until someone gets around to approving the mail.


Changes

Changes from 0.82 to 0.83

It has been 15 months since the last release. This overdue release has just a few compilation and bug fixes, and a bit of new support.

Add support for Portals, as implemented in userspace on TCP. It is very unlikely anyone will use this, but it is a good recent example of how to support a new communication device in mpiexec. While Portals is in use on some of the biggest machines around, such as the XT3 at ORNL and other sites, those platforms have their own MPI launch mechanism. The support for Portals here is for the TCP implementation that is used mainly in testing Portals codes targeted to the larger machines.

Add support for mvapich 0.9.9 and 1.0 beta. Each of these last two MPICH on IB releases changed the startup protocol. Jan Ploski was instrumental in the first of these, which added an entire extra communication phase with another round of socket close/accept for each compute process. This could be very slow. Support for the expected 1.0 was implemented by Frank Mietke. It is similar but adds some more data transfers and thankfully avoids the second round of socket accepts.

Force configure-time selection of a default communication device. Now you must configure using "--with-default-comm=mpich2" or similar. In previous releases, mpiexec would choose a default of GM. This is no longer likely to be useful, and led to confusion. The mpiexec binary will still support all MPI libraries that it knows about through the --comm argument or MPIEXEC_COMM environment variable, as before. The configure error message hopefully leads one to make a good guess without too much head-scratching.

A few bug fixes and fixes for compile problems.

Changes from 0.81 to 0.82

There are four interesting feature additions in this release, along with the usual collection of bug fixes.

Intel sells an MPI library that is based on MPICH2 from ANL, and their latest version 3 adds extensions to the PMI startup protocol used by mpiexec and other job launchers to communicate with the MPI tasks. Thanks to documentation from Intel and a patch from Thomas Zeiser, mpiexec works with Intel MPI version 3 as of this release.

As multi-core processors and large SMPs become more prevalent, issues related to the allocation and scheduling of processing tasks and memory regions become important to achieving good performance. Past mpiexec versions did not distinguish one CPU from another on a given node, where node is defined as a single machine in the PBS sense. This release adds code that is careful to track individual per-CPU identifiers as given from PBS. While users will not notice this change, future support in Torque for cpusets or other placement mechanisms will take advantage of this feature.

Also useful for large SMPs is the new command-line switch '-npernode', which is a generalization of '-pernode' that places no more than a given number of tasks on a single node. This idea and patch are also from Thomas.
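As a rough illustration of what "no more than a given number of tasks on a single node" means, the sketch below filters a PBS-style node list so that no host appears more than a given number of times. It assumes a file with one hostname per line, repeated once per allocated CPU, as in $PBS_NODEFILE. The function name limit_per_node is hypothetical, and this is not how mpiexec implements the switch; it only mimics the stated placement rule.

```shell
# limit_per_node N FILE
# Print the hostnames in FILE (one per line, repeated once per CPU),
# keeping at most N entries for any single host. Order is preserved
# and blank lines are skipped.
limit_per_node() {
    awk -v max="$1" 'NF > 0 && seen[$1]++ < max { print $1 }' "$2"
}
```

With a limit of 1 the result has one entry per host, which corresponds to the older '-pernode' behavior.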

The -transform-hostname feature now works on mpich2/pmi, thanks to prodding by Brad Settlemyer, meaning you can cause your MPI program to use a different ethernet interface for message passing from the one PBS uses.

Some minor enhancements:

Shell-style comments starting with '#' are now permitted in config files.

The status summary printed by mpiexec when it exits is more careful to distinguish among the possible failure cases, including when tasks were not started due to previous failures. It no longer complains, however, when exit statuses were not received from tasks terminated by PBS, for example due to going over the walltime allocation. This was too misleading and apparently difficult to fix in Torque.

Explain more in the runtests.pl test script about hangs caused by buggy MPI versions. Failing to implement MPI_Abort properly (or at all) is a common error.

Numerous bug fixes and fixes for compile problems.

Changes from 0.80 to 0.81

This release ended up having lots of changes. It was a long 9 months ago when the previous release happened, so perhaps that is not too surprising.

Startup for GM (or MX) and InfiniBand is now asynchronous, meaning that mpiexec will spawn tasks and pay attention to ones that are starting up at the same time. This greatly increases the speed for large systems, and avoids timeouts in newly created clients. The largest reported machine using this work is an 8000-ish processor InfiniBand cluster at Sandia.

Code was added to support modifications to the startup protocol by mvapich, an MPICHv1 on InfiniBand library. However, the latest mvapich version 0.9.7 does not work with mpiexec. See the next news item below.

Support was added for Myrinet's new message passing protocol, MX. The Myricom developers were nice enough to make MX look a lot like GM as far as mpiexec is concerned, so they are both supported in the same code.

Support for MPI2 process management features was added. You can now call MPI_Spawn and have mpiexec add more processes dynamically to your job. This works with the PMI interface used by MPICH v2 from ANL and vendor releases based on that code. Other MPI2 features such as name publishing are supported too.

Some fixes for PBSPro issues were added, to work around both syntactic and semantic changes in the PBSPro version of the TM and PBS interfaces. One nice new feature for people using PBSPro is the redirection helper. It enables the use of stdio redirection without assistance from PBSpro. If you configure with --enable-pbspro-helper, a second binary will be built and installed. Mpiexec launches this code on each compute node; it takes care of connecting the stdio sockets back to mpiexec, then starts your MPI task.

The source code repository is now SVN, not CVS, mainly due to the good support and encouragement of HPC system staff at OSC.

An assortment of little bug fixes, code cleanups, and compiler warning suppressions for various systems were added.


Notes about PBS

Sufficiently recent versions (>= 2.3.11) of PBS require the included patch to be applied if you want the standard streams handling functionality to work. It's quite handy stuff to be able to redirect your input and output anywhere, not just to the magic PBS hiding place. To do that, you'll need the source to PBS, and will have to recompile. Things do work just fine with a stock version of PBS, but you will have no stream redirection.

All references to PBS on this page refer to "OpenPBS" available from Veridian Information Solutions at the OpenPBS site. In particular, the latest known version with which we're comfortable is 2.3.15, but changes to OpenPBS are so rare and minimal that it is highly likely that the latest mpiexec will work with a newer OpenPBS. You must apply the patch included in mpiexec to be able to use input and output redirection. For the latest versions of everything, this patch is patch/pbs-2.3.12-mpiexec.diff, as described in the README.

PBS has a long history, having been initially developed in the public domain by the United States government at NASA Ames Research Center and Lawrence Livermore National Laboratory. Veridian and others continued development, then Veridian renamed it in the year 2000 to OpenPBS, to differentiate it from their non-free version called "PBSPro". We don't use the latter, but mpiexec might work with that version.

There have been two reports so far that PBSPro does not work correctly with mpiexec, but there are also plenty of success stories. Note! Recent information suggests that the PBSPro distributions are faulty in that the executable pbs_demux is not part of the "client" RPM (at least on linux). If you have compute nodes with their own file systems, and you find that they do not have pbs_demux installed, try copying it by hand from the "server" RPM distribution. Thanks to Stefan Parnell for figuring this out. Any more advice, confirmation, or the results of somebody invoking a PBSPro software maintenance contract would be appreciated additions here. Note 2! Complaints from get_hosts like the following:

mpiexec: Warning: get_hosts: ncpus=2 but nodect=2, pretending nodect=1.

are fixed in the CVS as of February 4.

A branch of OpenPBS is under active development and funded by the Department of Energy. It is called Torque and is available from supercluster.org. The developers have been very good about accepting patches. Note that the PBS patch included with mpiexec will not apply cleanly to Torque version 1.0.1p6. You should expect to have to fix the rejects by hand in an editor.


Notes about MPICH

Notes for mpich/p4

You must select whether you plan to use shared memory with MPICH/P4 when you compile the mpich library. To use shared memory, add the configure option "--with-comm=shared" when you build mpich.

Then when you configure mpiexec, if you have added that option to the mpich build, it is not necessary to do anything. However, if you choose not to build mpich/p4 to use shared memory, you should add the flag "--disable-p4-shmem" here. Note that you must make sure that mpich and mpiexec are compatible in this regard or applications will not start.

There is more information on this topic in the README, along with information on a command-line option to change the shmem setting in mpiexec for testing.

If you have a very old mpich, before version 1.2.4, you will need to apply a patch to your mpich distribution before it will work with mpiexec. The patch is included with mpiexec and described in the README.

Notes for mpich/mx and mpich/gm

Mpiexec works with GM, GM2, and MX.

Very old versions of MPICH/GM from Myricom (before 1.2.4..8) will not work with modern mpiexec. Version 0.69 is the latest that will support such old MPICH/GM.


Too many mpiexecs

The MPI-2 specification suggested in 1997 that the name mpiexec be used by implementations that provide a mechanism to initialize a parallel program. The specification deliberately did not suggest mpirun because that name was widespread in existing practice, in non-standard and non-portable ways, and the MPI Forum did not want to confuse matters.

The existence of this name in any specification was not really a problem as most MPI implementations happily ignored it and continued with their existing mpirun scripts. Now, though, fast forward to 2005, when an MPICH distribution that implements features of MPI-2 begins to see some popularity. The MPICH2 distribution includes six other parallel code launcher programs and scripts, all called mpiexec.

The pages you are reading here discuss the version of mpiexec that was designed specifically to start parallel MPI codes in PBS environments. It has almost nothing to do with these other six versions that are shipped in the MPICH2 distribution. If you are planning to use the mpd that comes with MPICH2 and this version of mpiexec, things will not work. Take a look at the MPICH2 documentation to understand how to use their mpiexecs. Send bug reports to mpich2-maint@mcs.anl.gov, and see the documentation at the MPICH2 page.

If, however, you use the PBS resource manager and would like to take advantage of the features provided by the PBS mpiexec discussed above, support is included for the MPICH2 library. Specify the flag --comm=pmi on the command line (or use configure to make that the default at build time) to launch your MPICH2 executable. See the manual page and included README for more information. Send mail to the list if you run into any problems.


Cute mpiexec hacks

You can do funny, and sometimes useful, things with mpiexec that are not immediately obvious.

RSH replacement

Running tasks using rsh or ssh is bad, for all the reasons given above, and is one of the big reasons why we use mpiexec in the first place. But what if you have non-modifiable codes that really expect to be able to use rsh? It should be possible to fool them into using mpiexec. Here are some hints; if anyone comes up with a nice set of wrapper scripts to do this well, please share. It would also be interesting to write a program/script called "rsh" that does the right thing: parsing rsh arguments, running commands, and starting up an mpiexec server if necessary.

Basic rsh: run one task on one node.

echo 'opt0600: hostname' | mpiexec --comm=none -nostdin -config=-

Feeding the config file on stdin requires that the task not read from stdin, which is what "rsh -n" does. But if this isn't the case, put the config in a temporary file somewhere and pass that file as the argument to --config=.

If you need to run more than one remote rsh at a time like this, you'll need to use mpiexec's server mode. At the beginning of the job, start up one instance of:

mpiexec --server &

Then you can spawn off as many instances of mpiexec as above, and leave them in the background until they finish.

echo 'opt0600: sleep 10' | mpiexec --comm=none -nostdin -config=- &
echo 'opt0601: hostname' | mpiexec --comm=none -nostdin -config=-

At the end of the job, kill off mpiexec, or just let it be killed by the batch system when the main script exits.
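A wrapper along the lines suggested above might start from something like the following sketch. The function names pbs_rsh_config and pbs_rsh are hypothetical. It handles only the simple "host command args" form, does not parse real rsh options such as -n or -l, and assumes the remote task does not read stdin.

```shell
# pbs_rsh_config HOST CMD [ARGS...]
# Print a one-line mpiexec config of the form "HOST: CMD ARGS...".
pbs_rsh_config() {
    host=$1
    shift
    printf '%s: %s\n' "$host" "$*"
}

# pbs_rsh HOST CMD [ARGS...]
# Run CMD on HOST through mpiexec instead of rsh, feeding the
# config on stdin in the same way as the one-liners above.
pbs_rsh() {
    pbs_rsh_config "$@" | mpiexec --comm=none -nostdin -config=-
}
```

For example, pbs_rsh opt0600 hostname would send the config line "opt0600: hostname" to mpiexec. To run several of these at once, the mpiexec --server instance described above still has to be started first.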

File distribution

Frequently the problem arises of how to move a file from the master node of the PBS job out to all the worker nodes. People tend to write little scripts to loop through the contents of $PBS_NODEFILE and invoke an rcp to each one. This does the same thing, but in parallel.

cd $working-dir
echo some stuff > file.src
mpiexec --allstdin --comm=none --pernode cat \> file < file.src

Note the importance of escaping the > with a backslash so that the redirection gets evaluated by the shell on each compute node, not by the shell in which mpiexec is invoked. Also notice that file is different from file.src; otherwise you'll end up writing over the same file from which you're reading on the master node, ending up with a zero-length file on all machines.
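The zero-length-file pitfall is ordinary shell behavior and can be reproduced locally without mpiexec or PBS. In this small sketch (the file names and scratch directory are illustrative only), the shell truncates the output file before cat reads anything, so using the same name on both sides of the command destroys the data.

```shell
# Work in a scratch directory so nothing real gets clobbered.
demo_dir=$(mktemp -d)
echo some stuff > "$demo_dir/file.src"

# Safe: source and destination are different files.
cat > "$demo_dir/file" < "$demo_dir/file.src"

# Unsafe: "> same" truncates the file before cat reads from it,
# so cat sees empty input and the data is gone.
cp "$demo_dir/file.src" "$demo_dir/same"
cat > "$demo_dir/same" < "$demo_dir/same"

wc -c < "$demo_dir/file"   # non-zero: the copy succeeded
wc -c < "$demo_dir/same"   # zero: the file was emptied
```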

Xterm debugger

To start a separate terminal for each debugger process, you might use

mpiexec xterm -e gdb mycode

with the caveat that this only works for devices which pass information by environment variables, not by command-line arguments. In other words, use the above for anything but MPICH/P4. Each process will start in a separate terminal, and obviously you must ensure that X clients can reach the server on your desktop. (Perhaps you need to use the argument -v DISPLAY when starting an interactive PBS job with qsub -I?)

A variation which works for MPICH/P4 is the following:

mpiexec xterm -e gdb --args mycode

This tells gdb that all the magic MPICH arguments should be ignored, and mpiexec arranges to place them last on the command line. Then the --args switch to gdb says to interpret all the rest of the line as arguments to the debugging target.

Note that with P4 you will see only a single xterm for process zero, then as you use the debugger to run the code to MPI_Init, all the rest of the windows will then pop up and can be manipulated.

Poor-man's parallel debugger (thanks to Troy Baer)

In an interactive job, you sometimes have the failure mode where one (or more) processes die. This replicates your typed input to a gdb around each process so that if one stops you can at least type where to get a backtrace.

mpiexec -np 4 -allstdin gdb mycode

You'll get 4 identical gdb prompts, and anything typed to one will be broadcast to all. This does not work with mpich/p4, however, as it requires special startup order and command-line arguments.

You can always use a config file to put the debugger on only some of the nodes:

mpiexec -np 4 -allstdin -config conf

where file conf contains something like:

-n 1 : mycode
-n 1 : gdb mycode
-n 2 : mycode

to debug only the process with rank 1 out of 4 total.
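Since release 0.82 permits shell-style '#' comments in config files, the same config can be written out with annotations. This is a sketch that assumes the file name conf used above and that a comment may occupy a whole line; check the manual page for the exact rules.

```shell
# Write the annotated config. The quoted 'EOF' keeps the shell
# from expanding anything inside the here-document.
cat > conf <<'EOF'
# rank 0 runs normally
-n 1 : mycode
# rank 1 runs under the debugger
-n 1 : gdb mycode
# ranks 2 and 3 run normally
-n 2 : mycode
EOF
```

The file is then passed to mpiexec with -config conf exactly as before.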


Detailed ChangeLog

Last two years or so.

mpiexec-0.83 22 Feb 2008
* ib.c: Add support for version 6 startup in mvapich 1.0. Patch provided by Frank Mietke.
* event.c: On a "remote system warning" from tm_poll, look up the event structure to be able to point at the broken node.
* README: Explain mpich2/smpd unusability. Update PBSPro notes to point out that version 8 does not work with mpiexec.
* configure.in configure: Force selection of a default comm device.
* README: Explain this required option a bit more.
* Makefile.in: Fix for recent autoconf.
* ib.c: Support version 5 two-phase startup protocol for mvapich 0.9.9. Inspiration and testing by Jan Ploski.
* get_hosts.c: Correct i -> j index problem. Should not have caused any errors, just perhaps slower and obviously wrong. From Eygene Ryabinkin.
* mpiexec.h mpiexec.c configure.in configure config.h.in start_tasks.c: Add basic support for Portals, at least using userspace TCP NAL. Also a good example of how to add a new device in mpiexec.
* config.c mpiexec.c: Fix two old bugs found by Thomas Svedberg. The interesting one only affects mpich-p4/shmem systems with ppn > 2.
* README: Rearrange pbs notes a bit. Add FAQ for missing pbs_iff.
* list.h: PGI compiler does not know typeof. Bug found by Filippo Spiga.
* gm.c util.h task.c list.h event.c ib.c: Hacks and fixes for Cray C compiler.

mpiexec-0.82 28 Nov 2006
* get_hosts.c mpiexec.h mpiexec.c mpiexec.1: Implement -npernode generalization of -pernode, by Thomas Zeiser.
* start_tasks.c: Do not track startup_complete for COMM_NONE jobs. Bug found by Thomas Zeiser.
* runtests.pl: Add a warning about bad mpich1/p4 behavior, and update for new output strings.
* contests.pl: Fix obvious bug, initialize spid for debugging.
* mpiexec.h: Track individual TM node ids.
* get_hosts.c: Parse host data in two passes, to track TM node ids and to use fewer string comparisons.
* config.c spawn.c: Set the cpu_index on each task as it is created.
* concurrent.c: Pass TM node ids to clients. Do not exit on SIGPIPE. Be a bit more careful with the nodealloc lock.
* start_tasks.c: Spawn on particular TM node id.
* task.c: Look at individual TM node ids to get hostname.
* event.c: Warn on remote system error, see what turns up.
* mpiexec.c: Track CPU indexes. Look at sigaction return value.
* mpiexec.h get_hosts.c concurrent.c spawn.c pmi.c mpiexec.c: Rename numcpu in preparation for tracking individual TM node ids.
* mpiexec.c: When no exit statuses are obtained, do not complain.
* runtests.pl: Be a bit more precise in explaining the mpich2 MPI_Abort problem.
* pmi.c mpiexec.c spawn.c mpiexec.h: Add support for get_ranks2hosts PMI command used by Intel MPI version 3. Initial patch by Thomas Zeiser.
* get_hosts.c task.c: The field tasks[i].done takes on a range of values, not just true/false.
* runtests.pl: Add a warning explaining that mpich2 MPI_Abort is broken.
* stdio.c: Likely bug fix when stdio is closed.
* start_tasks.c: Make -transform-hostname work on MPICH2, thanks to Brad Settlemyer for the prompting and testing.
* mpiexec.1: Document this and explain better.
* contests.pl: Be a little more verbose about expectations; check for working /bin/true and false.
* pmi.c: Add some warnings about PMI name publishing problems.
* README: Fix typo.
* README: Update open file limit text a bit.
* mpiexec.h start_tasks.c exedist.c mpiexec.c: Add DONE_NOT_STARTED state to print better messages at task exit time.
* task.c: Avoid killing tasks that are not running, although harmless.
* README: Add mpich/p4 vs mpich2/pmi FAQ item. Rearrange TODOs.
* runtests.pl: Catch new startup-incomplete message.
* get_hosts.c: Fix -nolocal -pernode as reported by Chris Maestas.
* runtests.pl: Add a test to check for this, and make sure it works.
* stdio.c: Update comment hinting at how Torque compile may cause a weird tty symptom on Mac.
* config.c mpiexec.1: Accept # comments in config spec. Rearrange a bit to avoid a loop. From Cray.
* stdio.c: Finally get rid of PRINTF macro. Fix a potential negative pid race fix, from Cray.
* exedist.c mpiexec.c start_tasks.c: Minor cleanups from Cray.

mpiexec-0.81 19 Apr 2006
* stdio.c: Clear returned events for new fds. Convert some PRINTF to debug. Maybe optimize a bit by checking n before looping over everything. Work around a potential Mac bug related to polling on tty stdin.
* gm.c ib.c: Fix off-by-one error in select for --disable-poll case.
* mpiexec.h stdio.c pmi.c: Pass rfs rather than use global, avoids collision with rfs use in ib.c and gm.c.
* start_tasks.c gm.c mpiexec.h: Implement asynchronous GM startup.
* stdio.c: Fix bug found by Garrick.
* configure.in configure: Bump version.
* runtests.pl: Try again if qsub fails, to work around slow PBS servers.
* README: Add FAQ entry for running out of sockets. Thanks to David Golden for this one.
* mpiexec.h util.c: Remember directory name from where mpiexec was started, if available.
* config.c spawn.c: Add extra 0 argument to resolve_exe call.
* start_tasks.c mpiexec.c: Look for redir helper in same directory as mpiexec. Thanks to Thomas Zeiser for the suggestion.
* configure.in configure: Bump to pre4.
* get_hosts.c: Remove extra attributes on hostnames added by PBSPro.
* Makefile.in configure.in configure: Refine use of torque pbs-config a bit; move specific library names out of Makefile.
* get_hosts.c: Accommodate PBSPro TM API difference in node versus CPU counting.
* mpiexec.spec mpiexec.spec.in: Rename to auto-generate this file.
* configure.in configure: Auto-gen mpiexec.spec. Bump version for a prerelease.
* Makefile.in: Remove spec version check, auto-generate instead.
* mpiexec.c: Remove bogus hacked-in Version strings.
* redir-helper.c: New file, to work around PBSPro lack of redirection.
* Makefile.in: Compile this new redir-helper code.
* get_hosts.c: Fix exec_host syntax better.
* configure.in configure config.h.in: Make redir-helper a configure-time option.
* start_tasks.c: Fork mpiexec-redir-helper in front of the actual executable.
* README: Some documentation.
* mpiexec-redir-helper.1: New file, man page for the new code, although it is not for users to run directly.
* get_hosts.c: Account for new exec_host syntax in PBSPro, thanks to Doug Johnson.
* configure.in configure: Add pbs-config check for new torque, from Garrick Staples.
Switch from CVS to SVN repository.
* spawn.c: New file, to support MPI_Comm_spawn.
* Makefile.in: Compile new file. Remind distributor to fix spec file.
* concurrent.c event.c exedist.c get_hosts.c: Add indirection to tasks[].status.
* mpiexec.c: Do hostname lookup here instead of in start_tasks(), now that it can be called multiple times. Indirection for status.
* mpiexec.h: New structure to group tasks by when they were spawned. Some new functions.
* pmi.c: Support multiple keypair spaces. Handle getbyidx command needed for spawning.
* start_tasks.c: Look at start and end indexes in spawns[] rather than going from 0 to numtasks.
* stdio.c: Functions to communicate between stdio listener and parent to handle spawning new tasks.
* util.c util.h: New handy functions to communicate strings.
* config.c: Export a function.
* mpiexec.c mpiexec.h config.c: Always return a new string in resolve_exe to simplify usage.
* pmi.c: Use standard list type for keypairs. Handle mcmd syntax used by spawn. Handle three name publishing commands for MPI2. Parse spawn command but do not do anything with it yet. Check for duplicate keys in put command.
* runtests.pl: Reorganize nicely, add comments on tests that may fail due to bad mpich2 or PBS configuration.
* config.c mpiexec.h: Switch to using standard list.h and get rid of the single static "working" config_spec_t structure.
* hello.c: Quiet a couple of compile warnings.
* concurrent.c: Fix compilation on rh73, thanks to Chris Samuel.
* stdio.c event.c task.c mpiexec.c mpiexec.h concurrent.c: Introduce a pipe between the main process and the stdio listener. Adapt some mechanisms to use it instead of signals.
* gm.c start_tasks.c: Move abort_fd handling into gm instead of start_tasks for better symmetry.
* ib.c: Rename abort_fd function.
* concurrent.c mpiexec.h: Release client properly after last event. Return number of clients terminated to main when killing all.
* mpiexec.c: Exit zero if -server and no connected clients when signalled, as suggested by Martin Schafföner.
* contests.pl: Test this new behavior.
* concurrent.c: Propagate concurrent client return code properly.
* contests.pl: Test this bug, found by Martin Schafföner.
* gm.c: Remove duplicated code bug from nonblock checkin (never released). Make error and debug lines mention gm and mx.
* README mpiexec.1: Add documentation that GM and MX are similar.
* configure.in configure: Fix spacing, allow mx or gm.
* mpiexec.c: Allow mx as well as gm.
* start_tasks.c: Set env vars for mpich/mx in COMM_MPICH_GM too. Thanks to Denis Charland for doing the initial port.
* util.c: Add commented-out code to include a timestamp in each debugging output line.
* event.c: Move start event processing code into generic dispatch.
* start_tasks.c mpiexec.h: Handle start events while spawning, poke ib service while waiting for all start messages.
* ib.c: Bail if something dies in start event handler while waiting for accept.
* start_tasks.c mpiexec.h mpiexec.c event.c: Report if tasks exited before MPI startup was complete.
* ib.c: Be more asynchronous, check more error codes.
* mpiexec.1: Update out-of-date diagnostic.
* util.c util.h: Add error-returning version of read_full.
* start_tasks.c mpiexec.h mpiexec.c: Return any error code from task startup and kill all if so.
* ib.c: Remove duplicated \n
* task.c: Include progname and redo a bit.
* hello.c: Do not time out MPI startup so quickly.
* task.c: Do not print all hostnames, just the first couple and a summary.
* ib.c: Use debug() so printfs go to stderr and to centralize debug level checks.
* concurrent.c: Include header for AIX, thanks to Chris Samuel.
* config.c: Fix off-by-one errors regarding processor assignment.
* runtests.pl: Add -config vs -np tests. Revert debugging code.
* gm.c: Force new socket to be non-blocking.
* ib.c start_tasks.c: Support new startup protocol in mvapich >= 0.9.5-112. Note new MPIRUN_PROCESSES env var scales poorly.

Last modified: Mon, 21 Apr 2008 19:39:41 +0000