[OPEN-ILS-DEV] C nits: osrf_system.c

Mike Rylander mrylander at gmail.com
Mon May 28 23:03:08 EDT 2007


On 5/18/07, Bill Erickson <billserickson at gmail.com> wrote:
>
>
> On 5/12/07, Scott McKellar <mck9 at swbell.net> wrote:
> > in ILS/OpenSRF/src/libstack/osrf_system.c, we install a
> signal handler
> > __osrfSystemSignalHandler() to catch SIGCHLD signals from the death of
> > child processes.
> >
> > The handler calls waitpid() in a small loop to release any defunct
> > child processes from zombie status, and issues a warning message for
> > each one.
> >
> > What bothers me is that the signal handler does not reinstall itself
> > before returning.  Under standard ANSI signal handling: when the
> > signal is received, it restores the default handling for that signal
> > (which in the case of SIGCHLD is to ignore the signal).  The usual
> > practice is for the signal handler to do whatever it does and then
> > call signal() to reinstall itself.
> >
> > However we don't do that.  We return to a sleep loop, and only after
> > 10000 seconds do we reinstall the signal handler.
>
>
> My understanding of how that works is something like this:
>
> *top of main loop*
> - register signal handler
> - sleep
> - SIGCHLD interrupts sleep
> - signal handler reaps all dead children with  wait() loop
> *back to top of main loop*
> - register signal handler
> - sleep
> -...
>
> > Suppose that one child process dies, and then a second child dies a
> > minute later.  We catch the first SIGCHLD signal and issue a message
> > for the first child.  However by the time second child dies we don't
> > have a signal handler any more, so we miss that signal.
> >
> > Eventually, if a third child process dies after 10000 more seconds,
> > we will catch it because we will have reinstalled the signal handler
> > at the top of the sleep loop.  At that point, the waitpid() loop
> > will notice the deaths of both the second and the third children.
> > However the second child will have been a zombie in the meanwhile
> > for almost three hours.
> >
> > If all the child processes die within a few minutes, we'll only catch
> > the first one.  In general we'll notice all the child deaths only if
> > the last one dies more than 10000 seconds after the one before.
> >
> > The simplest fix is to add the following line to the end of the
> > signal handler:
> >
> >     signal(SIGCHLD, __osrfSystemSignalHandler);
> >
> > In addition we can remove the other call to signal() from the inside
> > of the sleep loop:
> >
> >         daemonize();
> >         signal(SIGCHLD, __osrfSystemSignalHandler);
> >         while(1) {
> >                 sleep(10000);
> >         }
> >
> > Actually I don't think we need this signal handler at all, nor any
> > call to sleep().  We can just call wait() in a loop (warning:
> > uncompiled and untested):
> >
> >         daemonize();
> >         while(1) {
> >             int status;
> >             pid_t pid;
> >
> >             pid = wait( &status );
> >             osrfLogWarning( OSRF_LOG_MARK, "We lost child %d", pid);
> >         }
> >
> > This approach avoids the problem of calling non-reentrant functions
> > from within a signal handler.  It also plugs another hole: the
> > possibility that a child process will die after the fork() but before
> > the signal().
> >
> > Some version of these changes will enable us to respond to the death
> > of child processes promptly and reliably, instead of haphazardly.
> >
> > I could submit a patch, but I'm not in a position to test it very
> > readily.  In any case it has been a while since I've written this
> > kind of code, and I'm more than a little rusty.
>
> Yep, I agree this is the superior approach.  We should be able to work the
> code in (from your next email, actually) without a patch.

It is now worked in, and includes a call to strerror() (string.h is
included by utils.h).  Seems to work fine -- thanks again Scott.

--miker

>
> Thanks, Scott!
>
>
>
> --
> Bill Erickson
> PINES Systems Developer
> Georgia Public Library Service
> billserickson at gmail.com
> http://open-ils.org


-- 
Mike Rylander


More information about the Open-ils-dev mailing list