[OPEN-ILS-DEV] Update to router

Sat Dec 19 17:04:09 EST 2009

I just added a little bulletproofing to the router code.  Before I
describe the change, I must explain a bit about how the router works.

1. In osrf_ctl.sh, we invoke the router as the executable opensrf_router.

2. The router spawns two child processes.

3. Each of the child processes spawns a grandchild and then immediately
exits.

4. Each grandchild turns itself into a daemon and hangs around to route
things.

In the old code, the parent process would exit immediately after
spawning its children.  The osrf_ctl.sh script runs a ps to capture the
process IDs of the running routers.  However, when the parent exits,
the grandchildren might not be running yet.  As a result, the script
inserts a sleep between opensrf_router and ps, so that the
grandchildren have time to get spawned before ps goes looking for them.

That sleep is no longer necessary.

Now the parent router process waits for all of its immediate children
to terminate before exiting.  (It does *not* wait for the grandchildren
to terminate; that would be a long wait.)  As a result, the
grandchildren should be running by the time the parent exits.

If a child process terminates abnormally -- i.e. it exits with a
non-zero exit status, or it is terminated by a signal -- the
parent issues a warning message to that effect.
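
As an illustration of the wait-and-warn logic, here is a minimal sketch
built on waitpid() and the standard status macros.  The function names
and the wording of the warnings are my own assumptions for the example,
not copied from the router source.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one child; warn on standard error if it ended abnormally.
 * Returns 0 for a clean exit, 1 for an abnormal one, and -1 if
 * waitpid() itself failed.  (Hypothetical helper.) */
int reap_child(pid_t pid) {
    int status = 0;
    pid_t rc;

    do {
        rc = waitpid(pid, &status, 0);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted */

    if (rc < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code == 0)
            return 0;                     /* normal termination */
        fprintf(stderr, "Warning: child %ld exited with status %d\n",
                (long) pid, code);
        return 1;
    }
    if (WIFSIGNALED(status))
        fprintf(stderr, "Warning: child %ld terminated by signal %d\n",
                (long) pid, WTERMSIG(status));
    return 1;
}

/* Wait for every immediate child in pids[]; return how many of them
 * did not exit cleanly.  Grandchildren are not waited for. */
int wait_for_children(const pid_t *pids, size_t count) {
    int problems = 0;
    for (size_t i = 0; i < count; i++)
        if (reap_child(pids[i]) != 0)
            problems++;
    return problems;
}
```

Since each child exits only after it has forked its grandchild, a parent
that returns from wait_for_children() knows that both fork calls for the
grandchildren have already happened.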

That message, if issued, goes to standard error, not to a log file.
The reason is that each child process opens its own separate log file,
as defined in the configuration file.  The parent has no log file
defined for it, and never opens one.

If you run osrf_ctl.sh from the command line, these messages, if
issued, will appear immediately after the "Starting OpenSRF Router"
message issued by the shell script to standard output.  If you run
osrf_ctl.sh from another layer of scripting, you may want to redirect
standard error so as to capture these messages if they occur.

Scott McKellar