Monitoring Server
Computers to Monitor
- Brems
- Slave nodes
- Slurm queue
- Inca
- Webserver
- Wiki
- Seattle
- Backup server
- File server
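A basic first check for each machine in this list is whether it answers on the network at all. The sketch below assumes each host exposes a TCP service (SSH on port 22, HTTP on port 80). The hostnames and ports in `CHECKS` are hypothetical placeholders, not the cluster's real names. It only tests reachability and does not collect any of the metrics listed further down.

```python
import socket


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


# Hypothetical hostnames and ports; substitute the real ones.
CHECKS = {
    "brems": 22,
    "webserver": 80,
    "wiki": 80,
    "backup": 22,
    "fileserver": 22,
}

if __name__ == "__main__":
    for host, port in CHECKS.items():
        state = "up" if is_reachable(host, port) else "DOWN"
        print(f"{host}:{port} {state}")
```

Running this periodically (for example from cron) and alerting on any `DOWN` line gives a crude up/down monitor before a full monitoring system is in place.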
Non-computer things to monitor
- Cluster room temperature
- Clean room temperature probes
Things to Monitor
- RAID status (number of drives up)
- Hard drive space (df)
- memory usage
- load average
- Temperature (CPU, case, etc.) via lm-sensors
- CPU utilization
- fan speed/failure
- ITRC
- number of connections (netstat?)
- Network I/O
- Individual process times
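Several items in this list (disk space, memory usage, load average) can be read directly on a Linux node without extra tools. The sketch below assumes a Linux host with `/proc/meminfo` and uses only the Python standard library. It treats `MemAvailable` as free memory, which is an assumption about how "memory usage" should be defined. The function names are hypothetical. Temperatures, fan speeds, RAID status and network I/O would need other sources (for example lm-sensors output) and are not covered.

```python
import os
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use (similar to df)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_average() -> tuple:
    """1-, 5- and 15-minute load averages (Unix only)."""
    return os.getloadavg()


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value_in_kB} mapping."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(meminfo: dict) -> float:
    """Share of RAM in use, counting MemAvailable (or MemFree) as free."""
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return 100.0 * (total - available) / total


def collect() -> dict:
    """Gather one snapshot of the local node's basic metrics."""
    with open("/proc/meminfo") as fh:
        meminfo = parse_meminfo(fh.read())
    one, five, fifteen = load_average()
    return {
        "disk_used_pct": disk_usage_percent("/"),
        "mem_used_pct": memory_used_percent(meminfo),
        "load_1m": one,
        "load_5m": five,
        "load_15m": fifteen,
    }


if __name__ == "__main__":
    for key, value in collect().items():
        print(f"{key}: {value:.2f}")
```

A monitoring server could run a script like this on each node and compare the values against thresholds, such as flagging a node whose disk is more than 90% full.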