Difference between revisions of "Monitoring Server"

From New IAC Wiki

Revision as of 02:44, 25 May 2010

Monitored Systems and Services

{| border="1"
|+ Systems and Services Currently Monitored
! System Name !! CPU Usage !! Current Load !! # Users !! DRBD !! Disk Space !! Heartbeat !! LDAP !! Memory !! # Network Connections !! Network I/O !! RAID !! SMART !! SSH !! # Processes !! HTTP !! MySQL !! Samba !! System Temperature !! PSU !! PING !! Other
|-
! Alan's Desktop
| No || No || No || No || No || No || No || No || No || No || No || No || No || No || No || No || No || No || No || Yes || No
|-
! brems.iac.isu.edu
| Yes || Yes || Yes || No || Yes || No || No || Yes || Yes || Yes || Yes || Yes || Yes || Yes || No || No || No || No || No || No || Averaging Test, Cluster Room Temperatures
|}
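
The coverage table above can also be kept as a small data structure, which makes it easy to list the checks a host is still missing. The following is only a sketch: the names `CHECKS`, `MONITORED`, and `unmonitored` are hypothetical and not part of any monitoring tool, and the "Other" entry for brems (Averaging Test, Cluster Room Temperatures) is simply treated as monitored.

```python
# Column order taken from the table "Systems and Services Currently Monitored".
CHECKS = [
    "CPU Usage", "Current Load", "# Users", "DRBD", "Disk Space",
    "Heartbeat", "LDAP", "Memory", "# Network Connections", "Network I/O",
    "RAID", "SMART", "SSH", "# Processes", "HTTP", "MySQL", "Samba",
    "System Temperature", "PSU", "PING", "Other",
]

# Checks marked "Yes" (or given a value) for each system in the table.
MONITORED = {
    "Alan's Desktop": {"PING"},
    "brems.iac.isu.edu": {
        "CPU Usage", "Current Load", "# Users", "Disk Space", "Memory",
        "# Network Connections", "Network I/O", "RAID", "SMART", "SSH",
        "# Processes", "Other",
    },
}


def unmonitored(host):
    """Return the checks not yet enabled for a host, in table column order."""
    enabled = MONITORED[host]
    return [check for check in CHECKS if check not in enabled]
```

Calling `unmonitored("brems.iac.isu.edu")` returns the columns marked "No" for that host, which can serve as a to-do list when adding checks.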

Computers to Monitor

  • Brems
    • Slave nodes
    • Slurm queue
  • Inca
  • Webserver
  • Wiki
  • Seattle
  • Backup server
  • File server


Non-computer things to monitor

  • Cluster room temperature
  • Clean room temperature probes

Things to Monitor

  • RAID status (number of drives up)
  • Hard drive space (df)
  • Memory usage
  • Load average
  • Temperature (CPU, case, etc.) via lm-sensors
  • CPU utilization
  • Fan speed/failure
  • ITRC
    • Number of connections (netstat?)
    • Network I/O
    • Individual process times
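
Several of the items above can be read directly on a Linux host. The sketch below is one possible way to do it, not the method used on this server. It assumes Linux software RAID (`/proc/mdstat`), the `/proc/meminfo` format, and `netstat -tan` style output. All function names and the warning/critical thresholds are hypothetical, and the OK/WARNING/CRITICAL labels are just a common convention.

```python
import os
import re
import shutil


def classify(value, warn, crit):
    """Label a percentage-style reading against two thresholds."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


def disk_used_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(meminfo):
    """Used memory as a percentage of MemTotal."""
    total = meminfo["MemTotal"]
    # Older kernels lack MemAvailable; approximate it from other fields.
    fallback = (meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0)
                + meminfo.get("Cached", 0))
    available = meminfo.get("MemAvailable", fallback)
    return 100.0 * (total - available) / total


def raid_drives_up(mdstat_text):
    """Map each md device to (drives up, drives expected) from /proc/mdstat."""
    status = {}
    device = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            device = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if device and counts:
            # mdstat prints [expected/active], e.g. [2/1] for a degraded mirror.
            status[device] = (int(counts.group(2)), int(counts.group(1)))
            device = None
    return status


def count_established(netstat_text):
    """Count TCP connections in the ESTABLISHED state in `netstat -tan` output."""
    return sum(
        1 for line in netstat_text.splitlines()
        if line.startswith("tcp") and line.split()[-1] == "ESTABLISHED"
    )


def load_average():
    """Return the 1, 5, and 15 minute load averages (Unix only)."""
    return os.getloadavg()
```

For example, `classify(disk_used_percent("/"), 80, 90)` labels root filesystem usage, and `raid_drives_up(open("/proc/mdstat").read())` reports drive counts per array on hosts that use md RAID. Temperature and fan readings would still need lm-sensors or a similar tool.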