Linux Watchdog Daemon - Overview

Introduction

A watchdog, in computing terms, is a mechanism, usually hardware-based, that monitors a complex system for “normal” behaviour and, if the system stops behaving normally, performs a system reset in the hope of recovering normal operation.
It is intended to help maintain a system's availability and, at the very least, to ensure that the administrator can log in remotely to diagnose and fix non-persistent faults. Obviously it won't stop a hardware fault from breaking a system, nor is it any good against a persistent software problem, but for a system that is generally well behaved (particularly one located at a remote site or otherwise essential for operations) it improves overall availability.

If your application cannot tolerate a short outage, a watchdog alone is not going to solve the problem. You need to look at other high-availability solutions for hardware (e.g. RAID for disk error protection) and software (clustering and application mirroring) that will provide an acceptable degree of overall system availability.

With the Linux operating system there are two parts to the watchdog:

• The hardware timer and its kernel driver, which can force a hard reset; and
• The user-space daemon, which refreshes the timer and provides a wider range of health monitoring and recovery options.

Both can function independently, but they are designed to operate together for maximum protection.
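The division of labour between the two parts can be illustrated with a minimal C sketch of the user-space side. It assumes the kernel driver exposes the timer as a character device (conventionally /dev/watchdog), that any write to the device refreshes the countdown, and that the driver supports the "magic close" character 'V' for disarming. The function names wd_open, wd_refresh and wd_disarm are hypothetical and are not taken from the watchdog daemon's own source.

```c
/* Sketch of the user-space side of a Linux watchdog: open the device,
 * refresh it periodically, and disarm it on a clean exit.
 * wd_open, wd_refresh and wd_disarm are hypothetical names used only
 * for illustration. */
#include <fcntl.h>
#include <unistd.h>

/* Open the watchdog device (conventionally "/dev/watchdog").
 * With most drivers, opening the device starts the countdown.
 * Returns a file descriptor, or -1 on failure. */
int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Refresh the timer: a write to the device restarts the countdown.
 * A NUL byte is written so that it cannot be mistaken for the
 * magic-close character. Returns 0 on success, -1 on failure. */
int wd_refresh(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Disarm before closing by writing the magic-close character 'V'.
 * Assumption: the driver supports magic close. Drivers without it,
 * or kernels built with CONFIG_WATCHDOG_NOWAYOUT, keep the timer
 * running regardless. Returns 0 on success, -1 on failure. */
int wd_disarm(int fd)
{
    int rc = write(fd, "V", 1) == 1 ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A daemon built on these helpers would call wd_refresh in a loop, at an interval comfortably shorter than the hardware timeout, and only while its own health checks pass. If the daemon hangs or deliberately stops refreshing, the hardware timer expires and forces the reset.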