First, a bit of background:
NRPE: A part of the Nagios (pronounced nah-gee-ohs) system/network monitor that runs as a daemon on a remote host and lets you pull data from that host, such as its current load, memory usage, and disk usage.
Daemontools: A package for starting, stopping, and supervising daemons (sendmail, Apache, etc.). Its main advantage over typical init.d scripts is that a separate process runs the daemon and restarts it whenever it exits. So if you have a daemon that crashes regularly, it won't stay down. It also lets you log output from the daemons, has a simpler architecture for adding new daemons, and other things. But anyway...
I've been wanting to run NRPE on one of our servers at work. Problem is, we don't have root, which means we can't start the NRPE daemon from /etc/init.d, nor can we put it in inetd.conf. So I needed a crontab entry that checks whether the service is running and starts it if it's not.
Enter svok, one of the Daemontools utilities. It tells me whether a service is already being supervised, so a script can start the service when it's not. Run that script from a crontab once per minute and, in the event of a reboot, there will be less than a minute of NRPE downtime. Sounds simple, right? Nope.
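The once-a-minute check could be sketched roughly like this. It relies on svok's exit status (0 when supervise is running in the given directory, non-zero otherwise). The function name ensure_supervised, the SVOK and SUPERVISE override variables, and the paths in the usage line are all made up for illustration, so treat this as a sketch under those assumptions and not as the exact script in use.

```shell
#!/bin/sh
# Sketch of a cron-driven check: start supervise for a service directory
# unless svok reports that it is already running there.
# SVOK and SUPERVISE are hypothetical overrides, handy for testing.
SVOK=${SVOK:-svok}
SUPERVISE=${SUPERVISE:-supervise}

ensure_supervised() {
    dir=$1
    # svok exits 0 only when supervise is running in $dir.
    if "$SVOK" "$dir" 2>/dev/null; then
        return 0
    fi
    # Not running: start supervise in the background, detached from
    # cron's output so the cron job can finish right away.
    "$SUPERVISE" "$dir" >/dev/null 2>&1 &
}

# Only act when a service directory was passed on the command line.
if [ -n "${1:-}" ]; then
    ensure_supervised "$1"
fi
```

A crontab line such as "* * * * * $HOME/bin/ensure-nrpe.sh $HOME/service/nrpe" (both paths hypothetical) would then run the check every minute. The service directory needs a ./run script, since that is what supervise executes.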
Turns out that there is no way to run the NRPE daemon in the foreground. When you run it in daemon mode, it forks into the background and the foreground copy terminates. That kinda makes it hard for Daemontools to run it from a script that waits for it to finish. Enter fghack, another utility which, as the name implies, is a bit of a hack. You run it with the daemon's command line as its arguments. For example:
fghack nrpe --daemon -c nrpe.cfg
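fghack then opens up a bunch of extra file descriptors (to a pipe) and hands them to nrpe. The foreground nrpe process still terminates, but the forked copy inherits those descriptors, and as far as I can tell fghack simply sits in the foreground reading from the pipe until every copy has been closed, which normally means until the daemon exits.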
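The pipe trick is easy to see with nothing but the shell. This is only an illustration of the mechanism, not fghack itself: fake_daemon is a made-up stand-in for nrpe, and plain stdout stands in for the many extra descriptors fghack opens (a real daemon usually closes stdout, which is presumably why fghack opens so many).

```shell
#!/bin/sh
# fake_daemon is a hypothetical stand-in for a self-backgrounding daemon.
fake_daemon() {
    sleep 2 &      # the "daemon": inherits the write end of the pipe
    return 0       # the foreground copy finishes immediately
}

start=$(date +%s)
# cat only sees end-of-file once every holder of the write end is gone,
# so the pipeline stays in the foreground until the background sleep exits.
fake_daemon | cat
held=$(( $(date +%s) - start ))
echo "pipeline stayed in the foreground for ${held}s"
```

Even though fake_daemon returns right away, the pipeline should not finish until the background sleep does, which is the same behavior the run script gets from fghack.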
Now I ran into another interesting problem, just tonight. Let's say I updated NRPE's configuration file and wanted to restart the daemon. Sending a termination signal (such as SIGTERM, signal 15) to the script does not cause the nrpe daemon to terminate, because the forked daemon is at the head of its own process group and never sees the signal. (See for yourself with ps -axf.) Worse yet, when the shell script runs something with fghack in the foreground, it seems unable to trap signals! Or rather, signals can be trapped, but since the script is waiting (forever) on nrpe, the signal handler will not execute until fghack/nrpe ends and the script resumes execution. Needless to say, this makes it rather difficult to control the nrpe daemon with the svc utility.
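The delayed trap can be reproduced without nrpe at all. In this small demo, "sleep 3" is a hypothetical stand-in for the foreground fghack/nrpe command, and the variable names are made up for the example.

```shell
#!/bin/sh
# Demo: a trap does not run while the shell waits on a foreground command.
# `sleep 3` is a hypothetical stand-in for `fghack nrpe ...`.
script='
trap "exit 0" TERM
sleep 3     # foreground child; the shell waits here
:           # keeps sleep from being the last command
'
start=$(date +%s)
sh -c "$script" &
pid=$!
sleep 1
kill -TERM "$pid"          # the signal arrives about 1s in
wait "$pid"
status=$?
delay=$(( $(date +%s) - start ))
echo "script exited with status $status after ${delay}s"
```

The TERM is sent after about one second, but the script should only exit once the three-second sleep has finished. With a daemon that never exits, that moment never comes.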
So, I found another trick for that. Allow me to illustrate with some pseudo code:
trap "killall nrpe; exit 0" 1 2 3 4 etc...
fghack nrpe --daemon -c nrpe.cfg &
while true
do
    sleep 1
done
What the above code does is first run the fghack program in the background. That seems contrary to its name, but it still functions the same. Then the script goes into a loop where it does nothing but sleep for 1 second at a time. Now, when a signal is received, it waits at most 1 second (for the current sleep to finish) before the script processes it. The handler then kills the nrpe daemon, which in turn causes fghack to exit, and ends the script, at which point supervise restarts it.
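Here is a self-contained way to check the "at most 1 second" claim, with "sleep 30" as a hypothetical stand-in for the backgrounded fghack/nrpe and a kill of that one child in place of killall. It is a sketch of the pattern, not the real run script.

```shell
#!/bin/sh
# Same shape as the run script: background the long-running command,
# then idle in short sleeps so a trapped signal is handled promptly.
# `sleep 30` is a hypothetical stand-in for `fghack nrpe ...`.
wrapper='
sleep 30 &
child=$!
trap "kill $child 2>/dev/null; exit 0" TERM
while true; do
    sleep 1
done
'
start=$(date +%s)
sh -c "$wrapper" &
wrapper_pid=$!
sleep 1
kill -TERM "$wrapper_pid"   # roughly what a TERM from svc would look like
wait "$wrapper_pid"
wrapper_status=$?
took=$(( $(date +%s) - start ))
echo "wrapper exited with status $wrapper_status after ${took}s"
```

The wrapper should exit within a couple of seconds instead of hanging for the full 30. With the real service, running svc -t on the service directory sends that TERM, and supervise then starts the run script again.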
So in summary, I now have a daemon that will be started up automatically when the system restarts, and can be restarted via standard utilities. I'd say that's pretty good for a daemon that was never meant to be run like that. And it makes a nice end to a busy week. :-)
File handle shenanigans.
Date: 2003-10-18 12:12 pm (UTC)
Your restart script will then spawn another instance of the daemon, and so on.
Of course, you have a patch now, so it's a moot point }:>.