HPE Performance Cluster Management Administration (H8PE9S)

The HPE Performance Cluster Manager (HPCM) administration course provides knowledge and practice installing HPCM, managing data networks, provisioning servers, creating and modifying server images, working with software repositories and image version control, automating post installation tasks, configuring services, reviewing security features, and troubleshooting. 


Audience

  • Attend this class if you need to learn to install, configure and administer clusters managed with the HPE Performance Cluster Manager (HPCM)
  • Experienced Linux system administrators


Prerequisites

  • H8PE8S: HPE Performance Cluster Management Foundations
  • The following Linux system administration skills are prerequisites for this course:
    • Edit text with the vi editor
    • Recognize regular expression syntax
    • Access documentation with man and info file viewers
    • Monitor, manage and maintain log files
    • Enter common commands at the bash command line; create and interpret basic bash shell scripts
    • Install and configure standard software components, services and security features
    • Configure basic communication protocols that support networked communications
    • Create and modify crontabs
    • Monitor resource usage; be familiar with basic monitoring tools
    • Install and configure a Linux distribution on a server
    • Create, modify, and delete user accounts and group accounts
    • Partition disks, manage filesystems and logical volumes
    • Use RPM package management
    • Install and use virtualized systems
    • Understand basic hardware and hardware troubleshooting


Course objectives

At the conclusion of this course, you should be able to:

  • Install HPCM
  • Add servers to the cluster
  • Manage data networks
  • Provision nodes
  • Create and modify images and software repositories
  • Use image version control
  • Automate post installation tasks
  • Configure shared filesystem, user accounts, applications and updates
  • Troubleshoot cluster services
  • Review cluster security features


Detailed course outline

Module 1: Install Cluster

  • Describe HPCM features
  • Define operating system slots
  • Build cluster from ground up
  • Provision node with GUI
  • Provision node with command line
  • Add nodes to the cluster
  • Explore auto installation tools


Module 2: Discover

  • Discover nodes
  • Interpret cluster configuration files
  • Review cluster services


Module 3: Data Networks

  • Describe technologies
  • Describe InfiniBand configuration
  • Describe Intel Omni-Path configuration
  • Describe software components
  • Use diagnostic commands


Module 4: Manage Images

  • Manage software repositories
  • List software repositories
  • Add software repositories
  • Remove software repositories
  • Create repository groups
  • Customize an image by using RPM lists
  • Create a compute node image
  • Create an ICE-compute node image
  • Manage image version control
  • Check an image into version control
  • Compare differences between two versions of an image
  • List the versions of an image
  • Deploy a specific version of an image
  • Push an ICE-compute image to a rack
  • Use parallel tools and built-in functionality to check differences between nodes
  • Enable hyperthreading
  • Disable hyperthreading
  • Configure array services
  • Install batch scheduler server on a compute node
  • Install batch scheduler client on a compute node and on an ICE compute node
  • Configure HPCM connectors to job schedulers
  • Capture an image from a node (golden)
  • Add RPMs to, remove RPMs from, and version control compute images
  • Add and remove RPMs from running compute nodes
  • Clone an ICE-compute image
  • Clean up old images on the lead node
  • Add RPMs to ICE compute image
  • Compare when and when not to use tmpfs root
  • Determine which nodes use tmpfs root
  • Configure nodes to use tmpfs root
  • List tmpfs quota difference (rack leader quotas do not apply when ICE-compute nodes are in tmpfs)
  • Set tmpfs mode
  • Set disk mode
  • Show which mode a node has booted with
  • Show which mode a node is scheduled to boot into
  • Perform a clone operating system slot operation


Module 5: Automate Post Installation Tasks

  • Review conf.d scripts
  • Exclude a conf.d script
  • Use pre_reconf.sh
  • Use reconfig.sh
  • Develop post install and per-host customization scripts


Module 6: Configure Shared Filesystem, User Accounts, Applications, and Updates

  • NFS-export a filesystem on a compute node
  • Mount an NFS filesystem and create a user on an ICE compute node
  • Manage user accounts
  • Synchronize UIDs and GIDs, LDAP, etc.
  • Run an application on compute and ICE compute nodes
  • Display BIOS settings
  • Upgrade firmware
  • Update kernel
  • Update distribution
  • Update HPCM


Module 7: Troubleshoot Cluster

  • Back up cluster configuration
  • Back up managed network switch configuration
  • Use the central log repository
  • Investigate log files
  • Gather system information
  • Interrogate iLOs and BMCs
  • Confirm resources
  • Create pdsh groups
  • Investigate bond devices
  • Inspect VLAN devices
  • Capture a node crash dump
  • Transfer an image from another slot or another system and confirm that the image can be used
  • Inject faults


Module 8: Review Cluster Security

  • Describe system administrator configurable security tasks
  • Describe what makes cluster security different from standalone security (how would change X break the cluster)
  • List ports used for each node role and for which interfaces
  • List components with passwords
    • Admin node
    • Flat compute nodes
    • Rack leader nodes
    • ICE compute nodes
    • BMCs
    • CMCs
    • Ethernet network switches
    • InfiniBand and Omni-Path switches
    • IB/OPA switch BMCs
    • Storage controllers
  • List components that can have passwords applied