vi BAS5 for Xeon - Maintenance Guide
3.1.2 Fabric Diagnostics.................................................................................................. 3-2
3.1.3 Debugging Tools.................................................................................................... 3-2
3.1.4 High-Level Diagnostic Tools ..................................................................................... 3-2
3.1.5 CLI Diagnostic Tools ............................................................................................... 3-3
3.1.6 Event Notification Mechanism.................................................................................. 3-4
3.2 Troubleshooting InfiniBand Stacks...................................................................................... 3-5
3.2.1 smpquery .............................................................................................................. 3-5
3.2.2 perfquery .............................................................................................................. 3-7
3.2.3 ibnetdiscover and ibchecknet................................................................................... 3-9
3.2.4 ibcheckwidth and ibcheckportwidth........................................................................ 3-10
3.2.5 More Information ................................................................................................. 3-10
3.3 Node Deployment Troubleshooting.................................................................................. 3-11
3.3.1 ksis deployment accounting ................................................................................... 3-11
3.3.2 Possible Deployment Problems ............................................................................... 3-11
3.4 Storage Troubleshooting................................................................................................. 3-13
3.4.1 Management Tools Troubleshooting ....................................................................... 3-13
3.5 Lustre Troubleshooting.................................................................................................... 3-16
3.5.1 Hung Nodes........................................................................................................ 3-16
3.5.2 Suspected File System Bug .................................................................................... 3-16
3.5.3 Cannot re-install a Lustre File System if the status is CRITICAL..................................... 3-16
3.5.4 Cannot create file from client................................................................................. 3-17
3.5.5 No such device.................................................................................................... 3-17
3.6 Lustre File System High Availability Troubleshooting .......................................................... 3-18
3.6.1 On the Management Node ................................................................................... 3-18
3.6.2 On the Nodes of an I/O Pair ................................................................................ 3-20
3.7 SLURM Troubleshooting.................................................................................................. 3-23
3.7.1 SLURM does not start............................................................................................ 3-23
3.7.2 SLURM is not responding ...................................................................................... 3-23
3.7.3 Jobs are not getting scheduled............................................................................... 3-24
3.7.4 Nodes are getting set to a DOWN state................................................................. 3-24
3.7.5 Networking and Configuration Problems................................................................. 3-25
3.7.6 More Information ................................................................................................. 3-26
3.8 FLEXlm License Manager Troubleshooting......................................................................... 3-27
3.8.1 Entering License File Data...................................................................................... 3-27
3.8.2 Using the lmdiag utility ......................................................................................... 3-27
3.8.3 Using INTEL_LMD_DEBUG Environment Variable ..................................................... 3-27
Chapter 4. Updating the firmware for the InfiniBand switches................................4-1
4.1 Checking which Firmware Version is running...................................................................... 4-1
4.2 Configuring FTP for the firmware upgrade.......................................................................... 4-2
4.2.1 Installing the FTP Server .......................................................................................... 4-2
4.2.2 Configuring the FTP server options for the InfiniBand switch ........................................ 4-3
4.3 Upgrading the firmware ................................................................................................... 4-4
Chapter 5. Updating the firmware for the MegaRAID card....................................5-1