Difference between revisions of "Monitoring Server"

From New IAC Wiki

Revision as of 00:16, 30 December 2009

Computers to Monitor

  • Brems
    • Slave nodes
    • Slurm queue
  • Inca
  • Webserver
  • Wiki
  • Seattle
  • Backup server
  • File server


Non-computer things to monitor

  • Cluster room temperature
  • Clean room temperature probes

Things to Monitor

  • RAID status (number of drives up)
  • Hard drive space (df)
  • Memory usage
  • Load average
  • Temperature (CPU, case, etc.) via lm-sensors
  • CPU utilization
  • Fan speed/failure
  • ITRC
    • Number of connections (netstat?)
    • Network I/O
    • Individual process times
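
Two of the items under "Things to Monitor", hard drive space and load average, can be read with the Python standard library alone. The following is a minimal sketch, assuming a Linux or other Unix host with Python 3 installed. The function names and the threshold values (90% disk use, a 1-minute load of 4.0) are hypothetical placeholders and not settings taken from this page.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of used space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_average():
    """Return the 1, 5 and 15 minute load averages (Unix only)."""
    return os.getloadavg()


def check_host(path="/", disk_limit=90.0, load_limit=4.0):
    """Compare current readings with the limits and return warning strings.

    The default limits are illustrative assumptions, not recommendations.
    """
    warnings = []

    pct = disk_usage_percent(path)
    if pct >= disk_limit:
        warnings.append("disk %s is %.1f%% full" % (path, pct))

    one_minute = load_average()[0]
    if one_minute >= load_limit:
        warnings.append("1-minute load average is %.2f" % one_minute)

    return warnings


if __name__ == "__main__":
    for message in check_host():
        print(message)
```

A script like this could be run from cron on each machine in the list above. RAID status, temperatures and fan speeds need hardware-specific tools and are not covered by this sketch.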