Cluster Administration using HPE SGI Management Suite H6LM8S (131443)

The HPE SGI Management Suite cluster administration course provides knowledge and practice in basic cluster administration areas, including cluster software installation, cluster configuration, administration commands, software repository and image management, provisioning, application installation, monitoring with Ganglia and Nagios, and cluster troubleshooting.


Audience

  • Administrators of HPE SGI Management Suite on HPE SGI 8600 clusters or of SGI Management Center 3 on SGI ICE clusters
  • Experienced Linux System Administrators
  • Experienced Linux users who must maintain their own systems


Prerequisites

The following Linux system administration skills are prerequisites for this course:

  • Editing text with the vi editor
  • Recognizing regular expression syntax
  • Accessing documentation with man and info file viewers
  • Monitoring, managing and maintaining log files
  • Entering common commands at the bash command line; creating and interpreting basic bash shell scripts
  • Installing and configuring standard software components, services, and security features
  • Configuring basic communication protocols that support networked communications
  • Creating and modifying crontabs
  • Monitoring resource usage with basic monitoring tools
  • Installing and configuring a Linux distribution on a server
  • Creating, modifying, and deleting user accounts and group accounts
  • Partitioning disks, managing filesystems and logical volumes
  • Using RPM package management
  • Installing and using virtualized systems
  • Understanding basic hardware and hardware troubleshooting


Course objectives

Upon completion of this course, the student will be able to:

  • Use the ipmitool command to prepare for cluster admin node imaging
  • Set up Serial over LAN for console access and power control
  • Troubleshoot startup problems
  • Configure a cluster using the SGI Management Center 3 (SMC3)
  • Image compute nodes
  • Run InfiniBand commands
  • Set up user accounts
  • Run MPI applications across the cluster
  • Monitor a running cluster with Ganglia and Nagios
  • Add and remove compute nodes
  • Install and set up a batch scheduler
  • Submit batch jobs with a batch scheduler
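
As a rough illustration of the ipmitool and Serial over LAN objectives, the sketch below assembles ipmitool command lines without running them. The BMC hostname `admin-bmc`, the user `ADMIN`, and the helper `ipmitool_argv` are hypothetical; the exact commands used in the course labs may differ.

```python
from typing import List


def ipmitool_argv(bmc_host: str, user: str, subcommand: List[str]) -> List[str]:
    """Build an ipmitool argument list for an IPMI-over-LAN session."""
    # -I lanplus selects the IPMI v2.0 (RMCP+) interface, which Serial over LAN needs.
    # -E tells ipmitool to read the password from the IPMI_PASSWORD environment
    # variable, so it does not appear on the command line.
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *subcommand]


# Hypothetical BMC address and account, for illustration only.
power_status = ipmitool_argv("admin-bmc", "ADMIN", ["chassis", "power", "status"])
power_cycle = ipmitool_argv("admin-bmc", "ADMIN", ["chassis", "power", "cycle"])
sol_console = ipmitool_argv("admin-bmc", "ADMIN", ["sol", "activate"])

for argv in (power_status, power_cycle, sol_console):
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run` on a host that has ipmitool installed and network access to the BMC.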


Benefits to you

  • Learn the best practices for configuring, using, monitoring, and troubleshooting your cluster
  • Learn how to customize your cluster to meet your needs
  • Develop a thorough understanding of all commands available to administer your cluster
  • Gain practical experience on our lab systems and avoid learning through trial and error on your production systems


Course Modules

Module 1: Overview

  • Identify flat and hierarchical cluster topologies
  • Explain the function of admin, rack leader, compute (service), and ice-compute node roles
  • Describe the network VLAN layout
  • Recognize the interface naming conventions


Module 2: Installation

  • Install the admin node
  • Install HPE SGI Management Suite software
  • Copy distribution and HPE Performance Software – Message Passing Interface RPMs to the repository on the admin node
  • Specify the cluster domain name
  • Add patches or updates
  • Set up Network Time Protocol (NTP)
  • Build the database and the rack leader, compute (service), and ice-compute images


Module 3: Discovery

  • Use the discover command to add lead and compute nodes to the cluster database
  • Use the discover command to image the lead and compute nodes
  • Use the discover command to monitor the automated addition of ice-compute nodes to the cluster
  • Review the structure of the discover configfile
  • Reset the cluster database


Module 4: Data Networks

  • List data network interconnects
  • Identify key InfiniBand (IB) features
  • Identify IB fabric components and functions
  • Configure basic OpenSM software
  • Run basic IB diagnostics


Module 5: Monitoring

  • Use the Ganglia web interface to monitor the cluster
  • Monitor the cluster with common utilities


Module 6: Customize the Cluster

  • Maintain the repository and rebuild images with custom RPM lists
  • Configure cluster services
  • Use cimage to manage ice-compute node images
  • Use cinstallman to manage node images


Module 7: Cluster User Environment

  • Use the pdsh commands
  • Use the module command
  • Compile and run test programs using the MPI environment
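
pdsh accepts bracketed host ranges such as `n[1-3,7]` in its `-w` option. The sketch below shows, in simplified form, how such an expression expands into individual hostnames. The function `expand_hostlist` and the node names are hypothetical, and it handles only a single bracket group, which is a subset of what pdsh itself supports.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand one bracketed range expression, e.g. 'n[1-3,7]', into hostnames."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as a single hostname.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low          # a lone number is a range of one
        width = len(low)            # preserve zero padding such as '00'
        for n in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


# Hypothetical node names, for illustration only.
print(expand_hostlist("n[1-3,7]"))
print(expand_hostlist("r1i0n[00-02]"))
```

Padding width is taken from the lower bound of each range, so `[00-02]` yields two-digit suffixes while `[1-3]` does not.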


Module 8: Post-install Scripts

  • Review the post-installation scripts feature for compute and lead nodes
  • Review the per-host customization scripts feature for ice-compute nodes
  • Use post-install scripts to append to a file and to create a file on compute nodes


Module 9: Maintenance

  • Identify if a node has failed
  • Get failure information
  • Disable the node
  • Re-enable the node
  • Review cadmin options
  • Monitor BMC/CMC/ECC environmental events
  • Update the cluster


Module 10: Troubleshooting

  • Use system_info_gather and dbdump for system inventory
  • Review cluster log files
  • Obtain a traceback with nodetrace
  • Review lead node XFS project quotas