Monitoring Server

From New IAC Wiki
Revision as of 02:44, 25 May 2010 by Buckminst (talk | contribs)

Monitored Systems and Services

Systems and Services Currently Monitored
System Name | CPU Usage | Current Load | # Users | DRBD | Disk Space | Heartbeat | LDAP | Memory | # Network Connections | Network I/O | RAID | SMART | SSH | # Processes | HTTP | MySQL | Samba | System Temperature | PSU | PING | Other
Alan's Desktop | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes | No
brems.iac.isu.edu | Yes | Yes | Yes | No | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No | Averaging Test, Cluster Room Temperatures

Computers to Monitor

  • Brems
    • Slave nodes
    • Slurm queue
  • Inca
  • Webserver
  • Wiki
  • Seattle
  • Backup server
  • File server


Non-computer things to monitor

  • Cluster room temperature
  • Clean room temperature probes

Things to Monitor

  • RAID status (number of drives up)
  • Hard drive space (df)
  • Memory usage
  • Load average
  • Temperature (CPU, case, etc.) via lm-sensors
  • CPU utilization
  • Fan speed/failure
  • ITRC
    • Number of connections (netstat?)
    • Network I/O
    • Individual process times
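
Several items on the list above (drive space, memory usage, load average, RAID status) can be read on a Linux host without extra tools. The following is a minimal sketch only, not the checks actually deployed on the monitoring server. All function names are hypothetical, and it assumes a Linux kernel that reports MemAvailable in /proc/meminfo and software RAID that reports through /proc/mdstat; hardware RAID controllers would need their vendor tools instead.

```python
import os
import re
import shutil


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use (compare df)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_average():
    """1-, 5- and 15-minute load averages (Unix only)."""
    return os.getloadavg()


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def memory_used_percent(meminfo):
    """Memory in use, based on MemTotal and MemAvailable (assumed present)."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def raid_drives_up(mdstat_text):
    """Map each md device to (drives up, drives expected) from /proc/mdstat text."""
    status = {}
    device = None
    for line in mdstat_text.splitlines():
        name = re.match(r"(md\d+)\s*:", line)
        if name:
            device = name.group(1)
            continue
        # The status line carries "[expected/up]", e.g. "[2/1]" for a degraded mirror.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and device:
            status[device] = (int(counts.group(2)), int(counts.group(1)))
            device = None
    return status
```

On a monitored host, parse_meminfo and raid_drives_up would be fed the contents of /proc/meminfo and /proc/mdstat respectively. Keeping the parsing separate from the file reads makes it possible to test the functions against saved sample output from another machine.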