giza: Giza White Mage (Default)
[personal profile] giza
First, a bit of background:

NRPE: A part of the Nagios (pronounced nah-gee-ohs) system/network monitor that is run as a daemon on a remote host, and allows you to pull data from the host such as its current load, memory usage, and disk usage.

Daemontools: This is a package for starting and stopping various daemons (sendmail, Apache, etc.). The main advantage that Daemontools has over typical init.d scripts is that a separate process runs the daemon and restarts it when it exits. So if you have a daemon that crashes regularly, it won't stay down. It also lets you log events from the daemons, has a simpler architecture for adding new daemons, and other things. But anyway...

I've been wanting to run NRPE on one of our servers at work. Problem is, we don't have root, which means we can't start up the NRPE daemon from /etc/init.d, nor can we put it in inetd.conf. So, I needed a way to have a cron job check whether a service is running, and start it if it's not.

Enter svok, one of the Daemontools utilities. I could use it to check whether a service is already being supervised, and start the service if it's not. Run a script that does that from cron once per minute, and in the event of a reboot, NRPE will be down for less than a minute. Sounds simple, right? Nope.
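A rough sketch of what such a cron check could look like. The function name and the service directory passed on the command line are made up for illustration. The only things taken from Daemontools are svok (exits 0 when supervise is already running for the directory) and supervise itself.

```shell
#!/bin/sh
# Sketch of a once-a-minute cron check. The service directory is whatever
# you pass as the first argument; nothing here is specific to NRPE.

# ensure_supervised DIR
# svok exits 0 when supervise is already running for DIR, non-zero otherwise.
ensure_supervised() {
    dir=$1
    if svok "$dir"; then
        return 0
    fi
    # Not supervised (for example, right after a reboot): start supervise
    # in the background, detached from cron's output pipes.
    supervise "$dir" >/dev/null 2>&1 &
}

# Called as: check-service.sh /path/to/service/dir
if [ $# -gt 0 ]; then
    ensure_supervised "$1"
fi
```

The matching crontab entry would be something like "* * * * * /home/you/bin/check-service.sh /home/you/service/nrpe", with both paths being placeholders.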

Turns out that there is no way to run the NRPE daemon in the foreground. When you run it in daemon mode, it forks into the background and the foreground copy terminates. Kinda makes it hard for Daemontools to run it from a script that waits for it to finish. Enter fghack, another utility which, as the name implies, is a bit of a hack. You run it with the daemon you want to run as an argument. For example:
fghack nrpe --daemon -c nrpe.cfg

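fghack then runs nrpe with a bunch of extra file descriptors that all write to a pipe, and sits there reading from that pipe. The parent nrpe process still terminates, but the forked daemon inherits those descriptors, so the pipe stays open and fghack itself stays in the foreground until the daemon goes away.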
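The pipe behaviour behind this can be seen without fghack or nrpe at all. In the sketch below, fake_daemon is a made-up stand-in for a program that backgrounds itself, and cat plays the part of the reader. It is a simplification: here the child only holds stdout, whereas fghack hands out many extra descriptors because daemons often close the standard ones.

```shell
#!/bin/sh
# fake_daemon mimics a daemonizing program: a child keeps running in the
# background while the parent returns immediately.
fake_daemon() {
    sleep 2 &            # the child inherits stdout, i.e. the pipe's write end
    echo "parent exiting"
}

start=$(date +%s)
# cat reads until every copy of the pipe's write end has been closed,
# so it keeps waiting for as long as the background child is alive.
fake_daemon | cat
elapsed=$(( $(date +%s) - start ))
echo "reader returned after ${elapsed}s"
```

The parent side of the pipeline is gone almost instantly, yet the reader only returns once the lingering child exits.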

Now I ran into another interesting problem, just tonight. Let's say I updated NRPE's configuration file and wanted to restart the daemon. Sending a termination signal (such as SIGTERM, signal 15) to the script does not cause the nrpe daemon to terminate. The reason is that the forked program is at the beginning of its own process group (see for yourself with ps -axf). Worse yet, when running something with fghack, the shell script cannot act on signals right away! Signals can be trapped, but since the script is waiting (forever) on nrpe, the signal handler will not execute until fghack/nrpe ends and the script resumes execution. Needless to say, this makes it rather difficult to control the nrpe daemon with the svc utility.
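The delayed handler is easy to reproduce with a stand-in. In this sketch "sleep 3" takes the place of the blocking fghack/nrpe call, and the outer script plays the part of svc by sending SIGTERM after one second. The temporary file is just scaffolding for the demo. On common shells such as bash and dash, the inner script only exits once the sleep has finished, not when the signal arrives.

```shell
#!/bin/sh
# Demo only: "sleep 3" stands in for the blocking "fghack nrpe ..." call.
demo=$(mktemp)
cat > "$demo" <<'EOF'
trap 'exit 0' TERM
sleep 3
EOF

start=$(date +%s)
sh "$demo" &
pid=$!

sleep 1
kill -TERM "$pid"      # the signal "svc -t" would send to the run script
wait "$pid"            # returns only once the script has really exited

blocked_for=$(( $(date +%s) - start ))
echo "script exited ${blocked_for}s after start"
rm -f "$demo"
```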

So, I found another trick for that. Allow me to illustrate with some pseudocode:
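# On a signal, stop nrpe and exit so that supervise restarts the script.
trap "killall nrpe; exit 0" HUP INT QUIT TERM

# fghack stays alive for as long as the nrpe daemon does.
fghack nrpe --daemon -c nrpe.cfg &

# Sleep in short steps so a pending signal is handled within about a second.
while true
do
   sleep 1
done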

What the above code does is first run the fghack program in the background. That seems contrary to its name, but it still functions the same. Then the script goes into a loop where it does nothing but sleep for 1 second at a time. Now when a signal is received, it waits at most 1 second before the script processes it. The handler then kills the nrpe daemon, which in turn causes fghack to exit, and the script ends, at which point it will be restarted by supervise.
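For what it's worth, here is a possible variation rather than the script I actually run: the shell's wait builtin is interrupted as soon as a trapped signal arrives, so it can replace the sleep loop, and it also returns if the background job exits on its own. In this sketch "sleep 30" stands in for the fghack line and a plain kill stands in for killall nrpe. The temporary file and variable names are only scaffolding for the demo.

```shell
#!/bin/sh
# Demo only: the inner script is the interesting part.
demo=$(mktemp)
cat > "$demo" <<'EOF'
sleep 30 &                          # stand-in for: fghack nrpe --daemon -c nrpe.cfg &
child=$!
trap 'kill "$child"; exit 0' HUP INT TERM   # the real script would run: killall nrpe
wait "$child"                       # returns at once when a trapped signal arrives
EOF

start=$(date +%s)
sh "$demo" &
pid=$!

sleep 1
kill -TERM "$pid"      # simulate a restart request
wait "$pid"
status=$?

elapsed=$(( $(date +%s) - start ))
echo "script exited with status ${status} after ${elapsed}s"
rm -f "$demo"
```

The script stops shortly after the signal instead of after the full 30 seconds. With a real daemon, ending the script when the background job dies would also let supervise notice a crash, which the endless sleep loop would hide.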

So in summary, I now have a daemon that will be started up automatically when the system restarts, and can be restarted via standard utilities. I'd say that's pretty good for a daemon that was never meant to be run like that. And it makes a nice end for a busy week. :-)

(no subject)

Date: 2003-10-18 07:15 am (UTC)
From: [identity profile] balinares.livejournal.com
Wurk... Wouldn't it have been way easier to patch NRPE to give it a 'don't fork to background' option...?

Now of course, that would have been way less hackish and geeky, alright. That counts, too. :D

(no subject)

Date: 2003-10-18 08:38 am (UTC)
From: [identity profile] balinares.livejournal.com
There, done. Here's the patch. It allows you to start NRPE as a daemon with the option -f (or --foreground), which makes it a daemon that will remain in the foreground.


--- nrpe-2.0/src/nrpe.c.bak     2003-10-18 16:37:47.000000000 +0200
+++ nrpe-2.0/src/nrpe.c 2003-10-18 17:24:02.000000000 +0200
@@ -73,6 +73,7 @@
 int     show_version=FALSE;
 int     use_inetd=TRUE;
 int     debug=FALSE;
+int     stay_in_foreground=FALSE;

 #ifdef HAVE_SSL
 SSL_METHOD *meth;
@@ -124,9 +125,10 @@
                printf("\n");
                printf("Options:\n");
                printf(" <config_file> = Name of config file to use\n");
-               printf(" <mode>        = One of the following two operating modes:\n");
+               printf(" <mode>        = One of the following three operating modes:\n");
                printf("   -i          =    Run as a service under inetd or xinetd\n");
                printf("   -d          =    Run as a standalone daemon\n");
+               printf("   -f          =    Run as a foreground process\n");
                printf("\n");
                printf("Notes:\n");
                printf("This program is designed to process requests from the check_nrpe\n");
@@ -217,6 +219,19 @@
                handle_connection(0);
                }

+       /* if we're running in the foreground... */
+       else if (stay_in_foreground==TRUE){
+
+               /* drop privileges */
+               drop_privileges(nrpe_user,nrpe_group);
+
+               /* make sure we're not root */
+               check_privileges();
+
+               /* wait for connections */
+               wait_for_connections();
+               }
+
        /* else daemonize and start listening for requests... */
        else if(fork()==0){

@@ -1446,6 +1461,7 @@
                {"no-ssl", no_argument, 0, 'n'},
                {"help", no_argument, 0, 'h'},
                {"license", no_argument, 0, 'l'},
+               {"foreground", no_argument, 0, 'f'},
                {0, 0, 0, 0}
                 };
 #endif
@@ -1454,7 +1470,7 @@
        if(argc<2)
                return ERROR;

-       snprintf(optchars,MAX_INPUT_BUFFER,"c:nidhl");
+       snprintf(optchars,MAX_INPUT_BUFFER,"c:nidhlf");

        while(1){
 #ifdef HAVE_GETOPT_H
@@ -1493,6 +1509,11 @@
                case 'n':
                        use_ssl=FALSE;
                        break;
+               case 'f':
+                       use_inetd=FALSE;
+                       stay_in_foreground=TRUE;
+                       have_mode=TRUE;
+                       break;
                default:
                        return ERROR;
                        break;
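
With the patch applied, the Daemontools run script would no longer need fghack. A minimal sketch of setting up such a service directory follows. The directory name is made up, nrpe and nrpe.cfg are assumed to be reachable as written, and -f is the option added by the patch above.

```shell
#!/bin/sh
# Create a service directory whose run script keeps nrpe in the foreground.
svc_dir="${SVC_DIR:-./service/nrpe}"    # hypothetical location
mkdir -p "$svc_dir"

cat > "$svc_dir/run" <<'EOF'
#!/bin/sh
# Send stderr to stdout, then replace the shell with nrpe so that
# supervise signals the daemon directly.
exec 2>&1
exec nrpe -c nrpe.cfg -f
EOF
chmod +x "$svc_dir/run"
```

Because the run script execs nrpe, a signal from svc should reach the daemon itself, with no trap or sleep loop needed.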

File handle shenanigans.

Date: 2003-10-18 12:12 pm (UTC)
From: [identity profile] cjthomas.livejournal.com
Beware of file handles that time out after a day or so of hanging open. I ran into something like that on an AFS share on a Solaris system that interfered with long simulation runs, and if something like that exists in your environment, it'll cause your fghack call to mysteriously terminate at some unspecified future time.

Your restart script will then spawn another instance of the daemon, and so on.

Of course, you have a patch now, so it's a moot point }:>.

Douglas Muth
