xCAT Myrinet HOWTO

The purpose of this document is to describe the setup of Myrinet.  This document assumes familiarity with xCAT 1.2.0 and the Building a Linux HPC Cluster with xCAT Redbook.

xCAT's method of setting up Myrinet is very similar to default method as documented in Myrinet's README-linux, with the following exceptions:

What is Myrinet?

http://www.myri.com

Prepare Kernel Tree (for 2.4 kernels)

  1. Install the matching kernel source for the running kernel.
     

  2. Create .config

    For SuSE type:

    cd /usr/src/linux
    make mrproper
    gunzip -c /proc/config.gz >.config

    For Red Hat type:

    cd /usr/src
    ln -s linux-2.4 linux
    cd /usr/src/linux
    make mrproper
    cp configs/
    (matching config) .config
     

  3. Edit Makefile and validate that:

    VERSION = 2
    PATCHLEVEL = 4
    SUBLEVEL = 19
    EXTRAVERSION = -SMP
    (SuSE leave blank)

    equals uname -r, e.g.:

    2.4.19-SMP
     

  4. Prepare kernel.  Type:

    make oldconfig
    make dep
    make modules
    (ctrl-c after 30 seconds)

    NOTE:  For SuSE SLES8 ppc64 you must install all the *cross* RPMs except gcc-core, then type:

    chmod 755 make_ppc64.sh
    ./make_ppc64.sh oldconfig

    ./make_ppc64.sh dep
    ./make_ppc64.sh modules
    (ctrl-c after 30 seconds)
     

  5. Enjoy your prepared kernel tree.

Prepare Kernel Tree (for 2.6 kernels)

  1. Install the matching kernel source for the running kernel.
     

  2. Create .config

    For SuSE type:

    cd /usr/src/linux
    make mrproper
    gunzip -c /proc/config.gz >.config

    For Red Hat type:

    cd /usr/src
    ln -s linux-2.6* linux
    cd /usr/src/linux
    make mrproper
    cp configs/
    (matching config) .config
     

  3. Edit Makefile and validate that:

    VERSION = 2
    PATCHLEVEL = 6
    SUBLEVEL = 5
    EXTRAVERSION = -SMP
    (SuSE do nothing)

    equals uname -r, e.g.:

    2.4.19-SMP
     

  4. Prepare kernel.  Type:

    make oldconfig
    make prepare-all
    make all

     

  5. Enjoy your prepared kernel tree.

Build GM RPM

  1. Download from ftp://ftp.myri.com/pub/GM gm-1.5.2.1_Linux.tar.gz, gm-1.6.*.tar.gz, gm-2.0.*_Linux.tar.gz, or gm-2.1.*_Linux.tar.gz and save to /tmp.
     

  2. For GM <= 1.5.* type (RH support only):

    cd /tmp
    $XCATROOT/build/gm/gmmaker
    (gm tarball)

    E.g.:

    $XCATROOT/build/gm/gmmaker gm-1.5.2.1_Linux.tar.gz


    For GM 1.6.* type (RH and SuSE support only):

    cd /tmp
    $XCATROOT/build/gm-1.6/gmmaker
    (gm tarball)

    E.g.:

    $XCATROOT/build/gm-1.6/gmmaker gm-1.6.3_Linux.tar.gz


    For GM 2.0.* type (RH and SuSE support only):

    cd /tmp
    $XCATROOT/build/gm-2.0/gmmaker
    (gm tarball)

    E.g.:

    $XCATROOT/build/gm-2.0/gmmaker gm-2.0_Linux.tar.gz


    For GM 2.1.* type (RH and SuSE support only):

    cd /tmp
    $XCATROOT/build/gm-2.1/gmmaker
    (gm tarball)

    E.g.:

    $XCATROOT/build/gm-2.1/gmmaker gm-2.1_Linux.tar.gz
     

  3. After the RPM is built note the location of the RPM.

Define Myrinet Mapper

Define Myrinet IP addresses

  1. For each Myrinet node define a CLASS B address using the node name suffixed with -myri0 in /etc/hosts, e.g.:

    172.20.100.1    node001-myri0
    172.20.100.2    node002-myri0


    NOTE:  You can use a different subnet if defined in $XCATROOT/etc/site.tab as myrimask, e.g.:

    myrimask    255.255.255.0
     

  2. Type:

    makedns

Install GM RPM on each node

  1. For each node copy the RPM and install with rpm -i

    or

    Copy to $INSTALLDIR/post/kernel ($INSTALLDIR is defined in $XCATROOT/site.tab) and setup a $XCATROOT/etc/postscripts.tab rule to install and setup Myrinet on post install, e.g.:

    NODERANGE=myri {
        myrinet 1.6.3_Linux 2.4.19-SMP
    }

    or

    TABLE:noderes.tab:$NODERES:$noderes_gm=Y and OSVER=sles8 {
        myrinet 1.6.3_Linux 2.4.19-SMP
    }

    The myrinet post install script takes two arguments: GM Version and Kernel Version.
     

  2. Non Myrinet nodes that are used for development should also have the same RPM installed, however turn off the gm service:

    chkconfig -d gm (SuSE)

    or

    chkconfig --del gm
    (Red Hat)

Build Static Routes (GM Version < 2.0 only)

  1. From the management node type:

    makegmroutes noderange

    where noderange is the list of nodes that contain Myrinet adapters.  Read the noderange.1 man page for information on crafting a noderange.  You will be prompted to save the routes for reinstalls.  Default is 'y'.

Test

  1. Check routes:

    psh noderange $XCATROOT/sbin/gmroutecheck noderange

    No news is good news.
     

  2. Check IP, ssh to first Myrinet node and type:

    pping -i myri0 noderange

    Then type:

    ppping -i myri0 noderange

    No news is good news.
     

  3. Clear counters and run GM stress:

    psh noderange /opt/gm/bin/gm_counters -C
    pgmstress
    noderange 1024

    This will start the gm_stress test program on all nodes. You should now be able to look at the Myrinet switches and see all of the green lights blinking.

    After gm_stress has run awhile, you can look for problems by:

    psh noderange /opt/gm/bin/gm_debug | grep bad

    If any of the counters are a non-zero that node is worthy of further testing.

    NOTE: If you kill pgmstress you will need to run:

    psh noderange killall gm_stress
     

  4. Run mute check for bad cables, adapters, and ports.

Mute

Mute is a Myrinet tool to analyze hardware problems while gm_stress is running.  http://www.myri.com for details on obtaining, building, and running Mute.

Obtain m3-dist.tar.gz and mute-1.9.5.tar.gz, place in /tmp and then type on the same node you ran gmmaker:

/opt/xcat/build/gm-1.6/mutemaker m3-dist.tar.gz mute-1.9.5.tar.gz

OR

/opt/xcat/build/gm-2.0/mutemaker m3-dist.tar.gz mute-1.9.5.tar.gz

After build, create an EMPTY directory, then create the file mute.switches in the newly created directory and enter one line for each Myrinet switch IP, then run mute from that directory.

NOTE:

GM 2.0.x/2.1.x NOTES:

Mute Q&D:

Illustration: mute monitoring pgmstress.


Support

http://xcat.org


Egan Ford
egan@us.ibm.com
January  2005

Dan Cummings
dcumming@us.ibm.com
December  2002