MPICH: Troubleshooting the MPD
From Debian Clusters
This is part five of a multi-part tutorial on installing and configuring MPICH2. The full tutorial includes:
- Installing MPICH
- MPICH: Pick Your Paradigm
- MPICH without Torque Functionality
- MPICH: Starting a Global MPD Ring
- MPICH: Troubleshooting the MPD
Missing .mpd.conf
If you try to start a daemon (using mpd) without an .mpd.conf file, or with incorrect permissions on the file, you'll get the following error.
gyrfalcon:/shared$ mpd
configuration file /shared/home/kwanous/.mpd.conf not found
A file named .mpd.conf file must be present in the user's home
directory (/etc/mpd.conf if root) with read and write access
only for the user, and must contain at least a line with:
MPD_SECRETWORD=<secretword>
One way to safely create this file is to do the following:
  cd $HOME
  touch .mpd.conf
  chmod 600 .mpd.conf
and then use an editor to insert a line like
  MPD_SECRETWORD=mr45-j9z
into the file. (Of course use some other secret word than mr45-j9z.)
The error is explicit about what needs to be done to create an mpd.conf file. If you're running mpd as a regular user, create ~/.mpd.conf; for the root account, create /etc/mpd.conf. Choose a secret word; it is used to distinguish your processes from other people's and to keep them separate. The word can be just about anything, but standard password requirements (more than six characters long, containing at least one number and at least one letter) help make it more secure. Follow the instructions from the error message to insert it with the proper syntax and to change the permissions on the file. If you don't change the permissions, you'll see something like this:
gyrfalcon:~$ mpd
configuration file /shared/home/kwanous/.mpd.conf is accessible by others
change permissions to allow read and write access only by you
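The steps from the error message (touch, chmod 600, then add the secret word) can be collected into a small shell function. This is only a sketch: the function name create_mpd_conf and its two arguments are made up for illustration and are not part of MPICH.

```shell
# Hypothetical helper (not part of MPICH): create an mpd configuration file
# following the steps printed in the error message.
#   $1 = path of the file, e.g. "$HOME/.mpd.conf" (or /etc/mpd.conf for root)
#   $2 = the secret word to store in it
create_mpd_conf() {
    conf_path="$1"
    secret="$2"
    touch "$conf_path" || return 1        # make sure the file exists
    chmod 600 "$conf_path" || return 1    # read/write for the owner only
    # Write the single line mpd expects; this overwrites any old contents.
    printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf_path"
}
```

A user would call it as create_mpd_conf "$HOME/.mpd.conf" "your-own-secret", replacing the second argument with a secret word of their own. Setting the permissions before writing means the secret word is never stored in a file that other users can read.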
Start mpd as a daemon in the background using
mpd --daemon
Without the --daemon argument, on some systems you'll see an error like this:
kwanous@gyrfalcon:~$ mpd
gyrfalcon_53084 (mpd_sockpair 226): connect -2 Name or service not known
gyrfalcon_53084 (mpd_sockpair 233): connect error with -2 Name or service not known
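To confirm that the daemon actually came up after starting it with --daemon, one option is to follow the start with mpdtrace -l, which lists the members of the ring. The sketch below wraps both commands in a function; the name start_local_mpd and the one-second pause are assumptions made for this example, not anything MPICH prescribes.

```shell
# Hypothetical wrapper: start mpd in the background, then list the ring.
start_local_mpd() {
    mpd --daemon || return 1   # start the daemon detached from the terminal
    sleep 1                    # assumed pause so the daemon can finish starting
    mpdtrace -l                # list the daemons currently in the ring
}
```

If mpdtrace -l prints a line for the local host, the daemon is running. If it prints an error instead, the sections on this page cover the most common causes.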
Missing Root's /etc/mpd.conf
Sometimes you'll see an error like this:
osprey:~# mpdtrace -l
/shared/bin/mpdroot: open failed for root's mpd conf file
mpdtrace (__init__ 1171 ): forked process failed; status=255
You'll get this message when running as root if /etc/mpd.conf, root's version of ~/.mpd.conf, doesn't exist. Use the same syntax for creating the root version as for creating a user version (it is shown in the "not found" error message in the Missing .mpd.conf section). Once you create it, if you don't change the permissions so that the file is readable and writable only by root, you'll see a more helpful error:
osprey:~# mpdtrace -l
configuration file /etc/mpd.conf is accessible by others
change permissions to allow read and write access only by you
Use chmod 600 /etc/mpd.conf to do this and it should work.
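The three conditions described so far (the file exists, nobody but the owner can access it, and it contains an MPD_SECRETWORD line) can be checked before starting mpd. The following is a rough sketch; check_mpd_conf is a made-up name, and the checks only approximate what mpd itself tests.

```shell
# Hypothetical diagnostic: report which mpd.conf problem, if any, applies.
#   $1 = path of the file to check (~/.mpd.conf or /etc/mpd.conf)
check_mpd_conf() {
    conf_path="$1"
    if [ ! -f "$conf_path" ]; then
        echo "missing"
        return 1
    fi
    # Characters 5-10 of the ls -l mode string are the group and other bits.
    others=$(ls -l "$conf_path" | cut -c5-10)
    if [ "$others" != "------" ]; then
        echo "accessible by others"
        return 1
    fi
    if ! grep -q '^MPD_SECRETWORD=' "$conf_path"; then
        echo "no secret word"
        return 1
    fi
    echo "ok"
}
```

Running check_mpd_conf /etc/mpd.conf as root, or check_mpd_conf "$HOME/.mpd.conf" as a regular user, prints one short status word and returns non-zero if something needs fixing.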
Python Error
As of this writing, mpd (the MPI daemon) is a Python program and requires the Python 2.4 binary (python2.4) in order to run. If it isn't installed on the machine you're trying to use MPI with, you'll see an error like this:
eagle:~# /usr/bin/env: python2.4: No such file or directory
Fortunately, it's easy enough to fix. On every host you're going to use MPI with, run
apt-get install python2.4
If this still doesn't work, try uninstalling all versions with
apt-get remove --purge python2.4 python
running
apt-get autoremove
and then finally running the apt-get install again.
If you need to do this on all of your nodes, rather than sshing into each one and doing it individually, check out the Cluster Time-saving Tricks.
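Before installing or reinstalling anything, it can be worth checking whether the interpreter is really missing from the PATH on a given host. A minimal sketch using the shell's built-in command -v follows; the helper name have_interpreter is invented for this example.

```shell
# Hypothetical helper: succeed if the named program can be found on the PATH.
have_interpreter() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the interpreter that mpd asks /usr/bin/env to find.
if have_interpreter python2.4; then
    echo "python2.4 found at $(command -v python2.4)"
else
    echo "python2.4 not found on PATH; try: apt-get install python2.4"
fi
```

The check mirrors what /usr/bin/env does when mpd starts: it looks up python2.4 on the PATH, so a host that fails this check will also fail with the "No such file or directory" error shown above.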

