Monitoring Server
Revision as of 00:16, 30 December 2009
Computers to Monitor
- Brems
  - Slave nodes
  - Slurm queue
- Inca
- Webserver
- Wiki
- Seattle
- Backup server
- File server
Non-computer things to monitor
- Cluster Room Temp
- CleanRoom temp probes
Things to Monitor
- RAID status (number of drives up)
- Hard drive space (df)
- memory usage
- load average
- temp (CPU, case, etc.) via lm_sensors
- CPU utilization
- fan speed/failure
- ITRC
- number of connections (netstat?)
- Network I/O
- Individual process times
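A few of the items above (hard drive space, memory usage, load average) can be read with nothing but the Python standard library. The following is a minimal sketch, not the monitoring setup actually used here: it assumes a Linux host with /proc/meminfo, and the function names (disk_usage_percent, parse_meminfo, memory_used_percent, collect) are hypothetical, chosen only for illustration.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of used space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Estimate memory in use; falls back to MemFree on older kernels."""
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return 100.0 * (total - available) / total


def collect(path="/"):
    """Gather one snapshot of the metrics (assumes Linux: reads /proc/meminfo)."""
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    load1, load5, load15 = os.getloadavg()
    return {
        "disk_used_percent": disk_usage_percent(path),
        "memory_used_percent": memory_used_percent(meminfo),
        "load_average": (load1, load5, load15),
    }


if __name__ == "__main__":
    for name, value in collect().items():
        print(name, value)
```

Run on each machine (for example from cron), a snapshot like this could be sent to whatever central collector is eventually chosen. RAID status, fan speed, and temperatures need hardware-specific tools and are not covered by this sketch.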