xCAT Mini HOWTO (WIP)

This document is for xCAT 1.2.0.

x86 (i386, i486, i586, i686) supported distributions:

x86_64 (Opteron and EMT64) supported distributions:

IA64 (Itanium 1 and 2) supported distributions:

PPC64 (IBM JS20 only) supported distributions:

* Node install tested only; however, it should work as a management node.

This HOWTO is for xCAT experts.  You should be very familiar with the xCAT 1.1.0 Redbook.

NOTE:  The term noderange refers to xCAT's internal facility for performing an operation on a range of nodes; please read the noderange.1 man page for details.

  1. Install the management node.

    Install the complete OS (all packages), i.e. ALL PACKAGES.  Repeat: ALL PACKAGES.  Life is too short and disk space too cheap not to install all the packages.  The top issue with xCAT is missing packages.  xCAT is written as scripts, so it is difficult to list all the dependencies, and they change frequently.  I have been asked many times for a list of dependencies.  So, here they are:

    ALL PACKAGES

    HINT:  With RH package selection, select "Custom", then scroll down to the bottom and select "Everything".

    HINT:  SuSE does not have an "Everything" option, so you must manually select all package groups.  Even after selecting all package groups, you will not get all the packages.  The most commonly missing packages are expect and pdksh.  You can select expect and pdksh from the top right window pane during install.  HOWEVER, do not manually select all packages in the right window pane.  Confusing, huh?


    Please read the susemgtnode-HOWTO for more info on SuSE installs, and the xCAT 1.1.0 Redbook and the xCAT HOWTO for Red Hat installs.

    IANS:  Install all packages and updates and have large /install and /var file systems.

    NOTE:  A few words about Java:

    For some ASMA, RSA, RSA2, and Bladecenter functions, xCAT uses IBM's mpcli and mpcli2 utilities (included in the xcat-dist-ibm tarball).  Both utilities require Java.  The Java included with both tools only works with older RH x86 distributions.  SuSE includes a functioning Java for all four xCAT supported architectures (x86, x86_64, IA64, and PPC64), and it has been tested.  However, RH does not provide a functioning Java.  If you wish to install or use a different Java, just install it and create a link named $XCATROOT/java/$ARCH pointing to it, where $ARCH = x86, x86_64, ia64, or ppc64.  E.g.:

    Install IBM Java in /usr/ibm/java
    cd /opt/xcat/java
    ln -s /usr/ibm/java x86
    ls -l
    total 1
    lrwxrwxrwx    1 root     root           13 Jul 21 18:19 x86 -> /usr/ibm/java

    Some good Java builds for x86, x86_64, and PPC64:

    https://www6.software.ibm.com/dl/lxdk/lxdk-p

    Java for IA64?  If you find a good one, let me know.  SuSE includes it.  BTW, the x86 versions will run on IA64 natively; they are slow, but work OK for systems management.
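    The link name must be one of the four $ARCH values.  A minimal sketch, assuming uname -m reports i386 through i686, x86_64, ia64, or ppc64 on the management node and that the Java tree contains bin/java; xcat_java_arch and link_xcat_java are hypothetical helpers, not part of xCAT:

```shell
# Hypothetical helper (not part of xCAT): map "uname -m" output to the
# directory name expected under $XCATROOT/java.
xcat_java_arch() {
    case "$1" in
        i386|i486|i586|i686) echo x86 ;;
        x86_64)              echo x86_64 ;;
        ia64)                echo ia64 ;;
        ppc64)               echo ppc64 ;;
        *)                   return 1 ;;  # not an xCAT 1.2.0 architecture
    esac
}

# Hypothetical helper: link an installed Java tree (assumed to contain
# bin/java) to $XCATROOT/java/$ARCH for this machine.
link_xcat_java() {
    local java_home=$1 arch
    arch=$(xcat_java_arch "$(uname -m)") || { echo "unsupported arch: $(uname -m)" >&2; return 1; }
    [ -x "$java_home/bin/java" ] || { echo "no bin/java under $java_home" >&2; return 1; }
    # -n replaces an existing link instead of creating a new link inside it
    ln -sfn "$java_home" "${XCATROOT:-/opt/xcat}/java/$arch"
}
```

    E.g. link_xcat_java /usr/ibm/java on an i686 management node creates the same x86 link as the manual commands above.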
     
  2. Extract xCAT tarballs in /opt.

    cd /opt
    tar zxvpf /tmp/xcat-dist-core-1.2.0.tgz
    tar zxvpf /tmp/xcat-dist-oss-1.2.0.tgz

    tar zxvpf /tmp/xcat-dist-ibm-1.2.0.tgz
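    If a download was truncated, tar can fail partway and leave a partial tree in /opt.  A minimal sketch, assuming the tarball names shown above; extract_xcat is a hypothetical helper, not part of xCAT.  It lists every tarball with tar tzf before extracting any of them:

```shell
# Hypothetical wrapper (not part of xCAT): verify that every xCAT tarball
# is present and readable before unpacking any of them into DEST.
# Usage: extract_xcat [SRC_DIR] [DEST_DIR] [VERSION]
extract_xcat() {
    local src=${1:-/tmp} dest=${2:-/opt} ver=${3:-1.2.0} part f
    for part in core oss ibm; do
        f="$src/xcat-dist-$part-$ver.tgz"
        [ -r "$f" ] || { echo "missing $f" >&2; return 1; }
        tar tzf "$f" > /dev/null || { echo "corrupt $f" >&2; return 1; }
    done
    # All tarballs checked; now extract, preserving permissions (p).
    for part in core oss ibm; do
        tar zxpf "$src/xcat-dist-$part-$ver.tgz" -C "$dest" || return 1
    done
}
```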
     
  3. Set up xCAT.

    export XCATROOT=/opt/xcat
    cd $XCATROOT/sbin
    ./setupxcat

     
  4. Log out and log in again as root.
     
  5. Enable time services (xntpd) on the management node.

    mv -f /etc/ntp.conf /etc/ntp.conf.ORIG

    Create a new /etc/ntp.conf:

    server 127.127.1.0
    fudge 127.127.1.0 stratum 10
    driftfile /etc/ntp/drift


    Red Hat:  Set time, date, and time zone with setup:

    setup (or date)
    setclock (or hwclock -w)
    chkconfig --level 345 ntpd on
    service ntpd restart


    SuSE: Set time, date, and time zone with yast:

    yast (or date)
    clock -w
    chkconfig -a xntpd
    rcxntpd restart

    To test, type the following (NOTE: it can take a few minutes before xntpd is working):

    ntpdate -q localhost

    If working you should receive the following output:

    server 127.0.0.1, stratum 2, offset -0.000002, delay 0.02570
    22 Jan 08:04:24 ntpdate[14540]: adjust time server 127.0.0.1 offset -0.000002 sec

    If not working you will receive the following output (try again later or fix):

    no server suitable for synchronization found
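    Because xntpd can take a few minutes to start serving time, it can be convenient to poll rather than retype the test.  A minimal sketch, assuming ntpdate -q prints the two kinds of output shown above; ntp_ok and wait_for_ntp are hypothetical helper names:

```shell
# Hypothetical helper: read "ntpdate -q" output on stdin and succeed only
# if a server answered ("server ..., stratum N") and ntpdate did not
# report "no server suitable for synchronization found".
ntp_ok() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'no server suitable for synchronization found'; then
        return 1
    fi
    printf '%s\n' "$out" | grep -q '^server .*stratum'
}

# Poll localhost every 30 seconds, for up to 10 attempts (about 5 minutes).
wait_for_ntp() {
    local i=0
    while [ "$i" -lt 10 ]; do
        if ntpdate -q localhost 2>&1 | ntp_ok; then
            echo "xntpd is serving time"
            return 0
        fi
        sleep 30
        i=$((i + 1))
    done
    echo "xntpd still not ready; check /etc/ntp.conf" >&2
    return 1
}
```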

     
  6. Define the cluster.  Create the tables in /opt/xcat/etc/ (use /opt/xcat/samples/etc/* as templates).  Read the xCAT 1.1.0 Redbook for more information.  HINT:  Everything is a node.  Every node, switch, terminal server, node NIC, EVERYTHING is a node in xCAT.

    Required tables:

    site.tab
    nodehm.tab
    nodelist.tab
    nodepos.tab
    noderes.tab
    nodetype.tab
    passwd.tab
    postscripts.tab
    postdeps.tab
    snmptrapd.conf
    networks.tab
    mac.tab
    (loaded with non-collectable MACs, e.g. terminal servers, switches, RSAs, etc.)

    Required tables for clusters with terminal servers or SOL (Serial Over LAN):

    conserver.tab
    conserver.cf


    Required tables for clusters using Ethernet switches to collect MAC addresses (use the correct table for your switch):

    cisco.tab
    summit48i.tab
    blackdiamond.tab

    Required tables for IBM xSeries Management Processor:

    mpa.tab
    mp.tab


    Required table for APC Master Switch:

    apc.tab

    Required table for APC Master Switch Plus:

    apcp.tab

    Required table for xCAT flash support:

    nodemodel.tab

    Required table for EMP support:

    emp.tab

    Required table for Baytech support:

    baytech.tab

    Required table for xCAT GPFS support:

    gpfs.tab

    Table for IPMI support.  Required for systems whose IPMI IP address differs from the node address (e.g. e325):

    ipmi.tab
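    Before rerunning setupxcat, it can help to confirm that the always-required tables exist.  A sketch, assuming the tables live in $XCATROOT/etc; missing_xcat_tables is a hypothetical helper.  It checks only the first list above, because the remaining tables depend on your hardware.  Note that networks.tab may legitimately be absent until makedhcp generates it in step 12.

```shell
# Hypothetical helper: print each always-required table missing from an
# xCAT etc directory (default $XCATROOT/etc); returns 1 if any is missing.
missing_xcat_tables() {
    local dir=${1:-${XCATROOT:-/opt/xcat}/etc} rc=0 t
    for t in site.tab nodehm.tab nodelist.tab nodepos.tab noderes.tab \
             nodetype.tab passwd.tab postscripts.tab postdeps.tab \
             snmptrapd.conf networks.tab mac.tab; do
        if [ ! -e "$dir/$t" ]; then
            echo "$t"
            rc=1
        fi
    done
    return "$rc"
}
```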
     
  7. Rerun setupxcat.  NOTE:  site.tab must be properly set up for xcatd.  Please review samples/etc/site.tab.

    export XCATROOT=/opt/xcat
    cd $XCATROOT/sbin
    ./setupxcat

     
  8. Configure all nodes.  Please read the stage1-HOWTO for more information.

    IANS:

    Update all firmware to latest levels.
    Configure firmware/BIOS/CMOS to NEVER prompt or pause for anything.
    Configure network to boot before HD.
    Enable management processor.
    Redirect POST/BIOS output to the serial port if possible.

    NOTE:  For Bladecenter use rbootseq, e.g.:

    rbootseq noderange c,f,n,hd0
     
  9. Define /etc/hosts.  For each node, define an IP and a name; for each interface other than the primary interface (e.g. eth0), define an IP and a name with a -interface suffix.

    E.g., with eth0 as the primary interface:

    192.168.1.1    node001
    172.20.1.1     node001-eth1
    10.10.1.1      node001-eth2
    172.30.1.1     node001-myri0

    E.g., with eth1 as the primary interface:

    192.168.1.1    node001-eth0
    172.20.1.1     node001
    10.10.1.1      node001-eth2
    172.30.1.1     node001-myri0

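    NOTE:  Do NOT zero-pad IP addresses (e.g. 172.020.001.001); that's insane.  The C library's inet_aton() reads a component with a leading 0 as octal, so 172.020.001.001 becomes 172.16.1.1.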

    NOTE:  This naming convention for multiple NICs must be strictly adhered to.  Your node names can follow any naming convention, but the interface suffix must be as illustrated in the above examples.

    NOTE:  There is little value in adding additional entries for the fully qualified domain name.  More often than not it creates confusion and problems.  Let DNS do it for you.  EXCEPTION:  Each node's (the master's too) /etc/hosts file should have the FQDN in that node's own entry.
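    The naming rule is mechanical enough to script.  A minimal sketch: xcat_hosts_lines prints /etc/hosts lines for one node from IF=IP pairs, giving the bare node name to the primary interface and a -interface suffix to every other interface, and has_padded_octet flags zero-padded addresses.  Both are hypothetical helpers, not xCAT commands.

```shell
# Hypothetical helper.  Usage: xcat_hosts_lines NODE PRIMARY_IF IF=IP [IF=IP ...]
xcat_hosts_lines() {
    local node=$1 primary=$2 pair ifname ip
    shift 2
    for pair in "$@"; do
        ifname=${pair%%=*}
        ip=${pair#*=}
        if [ "$ifname" = "$primary" ]; then
            printf '%-15s %s\n' "$ip" "$node"          # primary: bare node name
        else
            printf '%-15s %s\n' "$ip" "$node-$ifname"  # others: -interface suffix
        fi
    done
}

# Hypothetical helper: succeed if any octet of the address has a leading zero.
has_padded_octet() {
    printf '.%s\n' "$1" | grep -Eq '\.0[0-9]'
}
```

    E.g. xcat_hosts_lines node001 eth0 eth0=192.168.1.1 eth1=172.20.1.1 eth2=10.10.1.1 myri0=172.30.1.1 prints the four entries of the first example above.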
     
  10. Build a DNS server (this is not optional):

    makedns
     
  11. Enter non-collectable MACs in $XCATROOT/etc/mac.tab.  (E.g. terminal servers, switches, RSAs, etc...)

    NOTE:  Some network devices (e.g. APC Master Switch) do not have the MAC address affixed to the unit.  Some (e.g. APC Master Switch) have the MAC printed on a piece of receipt paper and stuffed in the manual.  Hopefully you didn't install all the APCs and chuck the manuals in a pile somewhere.  The moral of this story is that before you rack anything, please verify that the MAC address is visible and will be visible when racked.  Very cool network devices (e.g. APC Master Switch and RSA) have a serial port; you can use it to get the MAC.

    NOTE:  Manual non-collectable MAC entries in mac.tab do not require an appended -eth0; it's optional.
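    MACs copied by hand from labels, receipt paper, or a serial console come in mixed formats.  A sketch that normalizes a MAC to the upper-case, colon-separated form shown in the getmacs output and appends a manual entry, assuming mac.tab holds one "name MAC" pair per line as in that output; normalize_mac and add_manual_mac are hypothetical helpers:

```shell
# Hypothetical helper: turn e.g. 00-c0-b7-12-34-56 into 00:C0:B7:12:34:56,
# or fail if the result is not six colon-separated hex pairs.
normalize_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr 'a-f-' 'A-F:')
    printf '%s\n' "$mac" | grep -Eq '^([0-9A-F]{2}:){5}[0-9A-F]{2}$' || return 1
    printf '%s\n' "$mac"
}

# Hypothetical helper: append "NAME MAC" to mac.tab (or to FILE if given).
# Usage: add_manual_mac NAME MAC [FILE]
add_manual_mac() {
    local mac
    mac=$(normalize_mac "$2") || { echo "invalid MAC: $2" >&2; return 1; }
    printf '%s %s\n' "$1" "$mac" >> "${3:-${XCATROOT:-/opt/xcat}/etc/mac.tab}"
}
```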
     
  12. Build a DHCP server:

    makedhcp --new --allmac

    NOTE:  Before you run makedhcp, the dhcpver field in $XCATROOT/etc/site.tab must be set to match the version of dhcpd installed: generally 2 for older Red Hat, and 3 for SuSE and newer Red Hat.  If it is incorrect, correct it and rerun makedhcp --new --allmac.

    NOTE:  $XCATROOT/etc/networks.tab must define each network that dhcpd is to support.  Let makedhcp build it for you the first time, then edit it and rerun makedhcp --new --allmac.
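    A sketch for cross-checking dhcpver against the installed dhcpd before running makedhcp.  It assumes site.tab stores the setting as a whitespace-separated "dhcpver N" line and that dhcpd was installed from an RPM whose package name you supply (e.g. dhcp on Red Hat; the name varies by distribution).  site_dhcpver and check_dhcpver are hypothetical helpers:

```shell
# Hypothetical helper: print the dhcpver value from site.tab, assuming a
# whitespace-separated "dhcpver N" line.
site_dhcpver() {
    awk '$1 == "dhcpver" { print $2; exit }' "${1:-${XCATROOT:-/opt/xcat}/etc/site.tab}"
}

# Hypothetical check: compare site.tab's dhcpver with the major version of
# an installed RPM package.  Usage: check_dhcpver PACKAGE [SITE_TAB]
check_dhcpver() {
    local want have
    want=$(site_dhcpver "$2")
    have=$(rpm -q --qf '%{VERSION}' "$1") || { echo "package $1 not installed" >&2; return 1; }
    have=${have%%.*}    # e.g. 3.0.1 -> 3
    if [ "$want" != "$have" ]; then
        echo "site.tab dhcpver=$want but $1 is version $have" >&2
        return 1
    fi
}
```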
     
  13. Configure all Ethernet switches; please block DHCP inbound and outbound on ports that are used to uplink the cluster to the real world.  Please read the xCAT 1.1.0 Redbook, the cisco2950-HOWTO, and the force10-HOWTO for more information.
     
  14. Configure all Terminal Servers.  Please read the terminalserver-HOWTO.
     
  15. Restart conserver (only if using terminal servers or SOL; Bladecenters without SOL do not use conserver):

    RH:

    service conserver restart

    SuSE:

    rcconserver restart
     
  16. Set up the stage boot image:

    For x86 and x86_64 type:

    cd /opt/xcat/stage
    ./mkstage


    For ia64 type:

    cd /opt/xcat/stage
    ./mkstage-ia64
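    If you script this step, the choice of script is a simple case on the node architecture.  A sketch covering only the architectures listed above; mkstage_script and run_mkstage are hypothetical wrappers, and you pass the architecture of the nodes you will boot:

```shell
# Hypothetical wrapper: choose the stage-image script for a node
# architecture.  Only the architectures documented above are handled.
mkstage_script() {
    case "$1" in
        i386|i486|i586|i686|x86|x86_64) echo mkstage ;;
        ia64)                           echo mkstage-ia64 ;;
        *)                              return 1 ;;
    esac
}

# Usage: run_mkstage x86_64
run_mkstage() {
    local script
    script=$(mkstage_script "$1") || { echo "no mkstage script for $1" >&2; return 1; }
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "${XCATROOT:-/opt/xcat}/stage" && "./$script")
}
```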

     
  17. Manually reboot each node, then collect MAC addresses:

    getmacs noderange

    E.g.:

    getmacs compute

    node1-eth0 00:07:E9:93:F8:DD
    node1-eth1 00:00:5A:9A:DB:7C
    node2-eth0 00:07:E9:93:F8:DD
    node2-eth1 00:00:5A:9A:DB:7C
    ...

    Auto merge mac.lst with /opt/xcat/etc/mac.tab(y/n)? y


    Each node name will be suffixed with the interface of the collected MAC.  Please do not alter it.

    NOTE:  Do not alter the mac.tab entries for collected MACs.  It is critical that the stored node names remain untouched.  If necessary, changing the MAC is OK.

    NOTE:  Multiple getmacs commands will corrupt mac.tab.  Only run one instance at a time.

    NOTE:  Some OSes order eth0 and eth1 differently than xCAT getmacs collects them.  You may need to swap them manually in mac.tab.  E.g. (this may hose other good non-switched entries; think before you do it :-):

    perl -pi -e 's/(nodeprefix.*)-eth0/$1-ethfoo/' mac.tab
    perl -pi -e 's/(nodeprefix.*)-eth1/$1-eth0/' mac.tab
    perl -pi -e 's/(nodeprefix.*)-ethfoo/$1-eth1/' mac.tab


    NOTE:  Currently only the serial-based (rcons) method of collecting MACs will collect multiple MACs per node.  A future version of xCAT will address this limitation.  EXCEPTION:  The Bladecenter mpcli2 and bcmm getmacs methods can collect both MAC addresses.

    NOTE:  For Bladecenter, please use the bcmm method in nodehm.tab.
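    A sketch of two mac.tab helpers, assuming one "name MAC" pair per line as in the getmacs output above and GNU sed (for sed -i).  swap_eth0_eth1 performs the same three-pass rename as the perl commands, but only on names that start with the given prefix, and keeps a .bak copy first.  check_mac_tab reports malformed MACs and any MAC recorded under two names, which usually points to a bad getmacs run.  Both are hypothetical helpers, not xCAT commands.  The prefix is used inside a regular expression, so keep it to plain letters and digits; as with the perl version, a prefix of node1 also matches node10.

```shell
# Hypothetical helper: swap -eth0/-eth1 suffixes on names starting with
# PREFIX, via a temporary -ethSWAP name.  Usage: swap_eth0_eth1 PREFIX [FILE]
swap_eth0_eth1() {
    local prefix=$1 file=${2:-${XCATROOT:-/opt/xcat}/etc/mac.tab}
    cp -p "$file" "$file.bak" || return 1
    sed -i \
        -e "s/^\(${prefix}[^[:space:]]*\)-eth0\([[:space:]]\)/\1-ethSWAP\2/" \
        -e "s/^\(${prefix}[^[:space:]]*\)-eth1\([[:space:]]\)/\1-eth0\2/" \
        -e "s/^\(${prefix}[^[:space:]]*\)-ethSWAP\([[:space:]]\)/\1-eth1\2/" \
        "$file"
}

# Hypothetical checker: report malformed MACs and MACs listed under more
# than one name; returns 1 if anything was reported.
check_mac_tab() {
    awk '
        BEGIN {
            hx = "[0-9A-F][0-9A-F]"
            pat = "^" hx ":" hx ":" hx ":" hx ":" hx ":" hx "$"
        }
        NF == 0 || $1 ~ /^#/ { next }    # skip blank and comment lines
        {
            mac = toupper($2)
            if (mac !~ pat) { print "bad MAC on line " NR ": " $0; bad = 1; next }
            if (mac in seen) { print "duplicate MAC " mac ": " seen[mac] " and " $1; bad = 1 }
            else seen[mac] = $1
        }
        END { exit bad }
    ' "${1:-${XCATROOT:-/opt/xcat}/etc/mac.tab}"
}
```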
     
  18. Build /etc/dhcpd.conf:

    makedhcp --allmac
     
  19. For all IBM xSeries nodes with IBM management processors and the IBM e325/e326 (read the managementprocessor-HOWTO for more info):  EXCEPTION:  Bladecenter (just use mpname noderange).

    nodeset noderange stage3
     
  20. Reboot each node manually after all MACs have been collected and the DHCP server has been restarted.
     
  21. Read the managementprocessor-HOWTO and bladecenter-NOTES for information on testing and troubleshooting all nodes' management processors, if applicable.
     
  22. Test systems management:

    rpower noderange stat

    and/or (if blinking lights entertain you; NOTE:  not all servers have a blinking light):

    rbeacon noderange on
     
  23. Copy CDs:

    copycds (follow prompts)
     
  24. Copy xCAT post installation files:

    cd /opt/xcat
    find post -print | cpio -dump /install

     
  25. Generate root SSH keys:

    gensshkeys root
     
  26. Add /install to /etc/exports and restart NFS:

    echo "/install *(ro,async,no_root_squash)" >>/etc/exports

    Red Hat:

    chkconfig --add nfs
    service nfs restart

    SuSE:

    chkconfig -a nfsserver
    rcnfsserver restart
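    Running the echo above twice leaves a duplicate line in /etc/exports.  A sketch of an idempotent version; add_install_export is a hypothetical helper:

```shell
# Hypothetical helper: add the /install export only if no /install line is
# already present.  Usage: add_install_export [EXPORTS_FILE]
add_install_export() {
    local exports=${1:-/etc/exports}
    local line='/install *(ro,async,no_root_squash)'
    # -s keeps grep quiet if the file does not exist yet
    grep -qs '^/install[[:space:]]' "$exports" || printf '%s\n' "$line" >> "$exports"
}
```

    On an NFS server that is already running, exportfs -ra re-reads /etc/exports without a full restart.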

     
  27. Create Myrinet RPM.  (You may need to install a node first.)  Read myrinet-HOWTO.
     
  28. Edit (*.tmpls) to taste.  Read the nodeinstall-HOWTO and systemimager-HOWTO for details.
     
  29. Got disk?  Install nodes.  Use rinstall or winstall.  Only install 32 at a time or use staging.  Read the man pages on rinstall and winstall, e.g.:

    winstall -t 8 node001-node032

    NOTE:  For a diskless option, read the diskless-HOWTO.
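    To follow the 32-at-a-time rule, it helps to precompute the noderanges.  A sketch, assuming node names are a prefix plus a fixed-width, zero-padded number (node001, node002, ...); node_batches is a hypothetical helper:

```shell
# Hypothetical helper: print noderanges of at most BATCH nodes.
# Pass FIRST and LAST as plain integers (1, not 001).
# Usage: node_batches PREFIX WIDTH FIRST LAST [BATCH]
node_batches() {
    local prefix=$1 width=$2 start=$3 last=$4 batch=${5:-32} end
    while [ "$start" -le "$last" ]; do
        end=$((start + batch - 1))
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        printf "%s%0${width}d-%s%0${width}d\n" "$prefix" "$start" "$prefix" "$end"
        start=$((end + 1))
    done
}
```

    E.g. node_batches node 3 1 128 prints node001-node032, node033-node064, node065-node096, and node097-node128; run winstall -t 8 on each range in turn, letting one batch finish before starting the next.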
     
  30. Collect SSH host keys after install:

    makesshgkh noderange
     
  31. Test psh and verify that the dates match:

    psh noderange date;date
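    Comparing date strings by eye gets tedious on large clusters.  A sketch, assuming psh prefixes each output line with the node name and a colon, so that psh noderange date +%s produces lines like "node001: 1106400000"; check_skew is a hypothetical helper:

```shell
# Hypothetical checker: read "node: epoch-seconds" lines on stdin and
# report nodes whose clock differs from REF (default: this host's clock)
# by more than MAX seconds (default 2).  Returns 1 if any node is off.
# Usage: check_skew [MAX] [REF]
check_skew() {
    local max=${1:-2} ref=${2:-$(date +%s)}
    awk -F': *' -v max="$max" -v ref="$ref" '
        NF >= 2 {
            d = $2 - ref
            if (d < 0) d = -d
            if (d > max) { print $1 " off by " d "s"; bad = 1 }
        }
        END { exit bad }'
}
```

    E.g. psh noderange date +%s | check_skew 2.  A tolerance of a couple of seconds allows for psh not reaching every node at the same instant.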
     
  32. Build GM routes (version < 2.0 only):

    makegmroutes noderange

    If you have multiple different Myrinet networks, consider using xCAT's post install directory sync facility and placing the routes in opt/gm/routes.
     
  33. Install Torque and Maui on a user node.  Place torque-1.1.0p0.tar.gz and maui-3.2.6p9.tar.gz in /tmp:

    cd /tmp
    /opt/xcat/build/torque/torquemaker torque-1.1.0p0.tar.gz scp
    /opt/xcat/build/maui/mauimaker maui-3.2.6p9.tar.gz
    genpbs noderange
    . /etc/profile.d/pbs.sh
    showq
    (you should see all your nodes)
    pbstop
    (you should see all your nodes)
     
  34. Add cluster users on the usermaster as defined in $XCATROOT/etc/site.tab, then push them out to the rest of the cluster:

    addclusteruser
    Enter username: bob
    Enter group: users
    Enter UID (return for next): 501
    Enter absolute home directory root: /home
    Enter password (blank for random): B0vHw0bL
    cd /etc
    cp passwd passwd.CYA
    cp group group.CYA
    prsync -craz passwd group noderange,-$(hostname -s):/etc

    NOTE:  prsyncing the passwd and group files may be unsafe.  Use pushuser as an alternative.

    E.g.:

    pushuser noderange bob
     
  35. Test Torque/Maui.  Login as a user and type:

    bob@head01:~> qsub -l nodes=2,walltime=1:00:00 -I
    qsub: waiting for job 0.head01.foobar.org to start
    qsub: job 0.head01.foobar.org ready

    ----------------------------------------
    Begin PBS Prologue Thu Dec 19 14:17:53 MST 2002
    Job ID: 0.head01.foobar.org
    Username: bob
    Group: users
    Nodes: node10 node9
    End PBS Prologue Thu Dec 19 14:17:54 MST 2002
    ----------------------------------------


    Note the Nodes: line.  Try to ssh from node to node and back to the user node that started qsub:

    bob@node10:~> ssh node9
    bob@node9:~> exit
    logout
    Connection to node9 closed.
    bob@node10:~> ssh head01
    bob@head01:~> exit
    logout
    Connection to head01 closed.
    bob@node10:~> exit
    logout

    qsub: job 0.head01.foobar.org completed

    Now try to ssh back to the nodes that were assigned; you should be denied:

    bob@head01:~> ssh node9
    14653: Connection close by 199.88.179.209
     
  36. Install compilers and libraries.  Read the xCAT HPC Benchmark HOWTO for details.
     
  37. Install MPICH-GM on user nodes for application development.  Read myrinet-HOWTO.
     
  38. Get HPL benchmark results and submit them to IBM and top500.org.  Read the xCAT HPC Benchmark HOWTO for details.
     
  39. Enjoy your cluster.  Do some work.

Support

http://xcat.org


Egan Ford
egan@us.ibm.com
January  2005