[OPEN-ILS-DEV] Debugging OpenSRF installation

Scott McKellar mck9 at swbell.net
Mon Jun 22 01:44:55 EDT 2009


Dan:

Your intuition was correct.

I'm oversimplifying a bit here, but in essence when there is more
than one <service> element (the usual case), they are loaded into
an array.  But when there's only one, it's stored as a string.
Not an array of one string, but just -- a string.

The code that looks at this list was expecting to see an array.
When it saw a string, it misbehaved.
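The Python sketch below shows the same pitfall in miniature. It is not the OpenSRF C code in osrf_prefork.c, and the helpers naive_children and as_list are hypothetical. A naive XML-to-data conversion turns a lone <service> into a bare string, so iterating over the result walks the string's characters instead of a list of service names. Normalizing to a list before iterating avoids this:

```python
import xml.etree.ElementTree as ET

def naive_children(parent, tag):
    """Hypothetical naive conversion: several matching children become a
    list, but a single child becomes a bare string (the problem case)."""
    values = [child.text for child in parent.findall(tag)]
    if len(values) == 1:
        return values[0]  # one element: a string, not a list of one
    return values

def as_list(value):
    """Normalize either shape so callers can always iterate over a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)

one = ET.fromstring("<services><service>opensrf.math</service></services>")
two = ET.fromstring(
    "<services><service>opensrf.math</service>"
    "<service>opensrf.blah</service></services>"
)

# Iterating the naive single-service result yields characters, not names.
print([s for s in naive_children(one, "service")][:3])  # ['o', 'p', 'e']
print(as_list(naive_children(one, "service")))          # ['opensrf.math']
print(as_list(naive_children(two, "service")))          # ['opensrf.math', 'opensrf.blah']
```

Normalizing at the point of use means every caller sees a list, whether the config names one <service> or many.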

I committed a patch to osrf_prefork.c that appears to fix this
problem, so far as I can tell from my testing.  If you want to test
it yourself, that should be the only file you need to update, if the
rest of your OSRF installation is reasonably up to date.
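Until the patched osrf_prefork.c is in place, it may help to check whether a given opensrf_core.xml hits the single-<service> case at all, since that is the only case where Dan's workaround of listing a second <service> matters. Below is a minimal Python sketch. It assumes the bootstrap layout <config><opensrf><routers><router> used in the opensrf_core.xml quoted below, and the function single_service_routers is hypothetical. It only inspects the XML and does not reproduce how OpenSRF itself parses the file:

```python
import xml.etree.ElementTree as ET

def single_service_routers(xml_text):
    """Return (name, domain, service) for each bootstrap router that
    lists exactly one <service>, the case that triggered the bug."""
    root = ET.fromstring(xml_text)
    hits = []
    # Only the bootstrap <opensrf><routers> section carries <services>;
    # the router-process section under <config><routers> does not.
    for router in root.findall("./opensrf/routers/router"):
        services = router.findall("./services/service")
        if len(services) == 1:
            hits.append((
                router.findtext("name"),
                router.findtext("domain"),
                services[0].text,
            ))
    return hits

SAMPLE = """<config><opensrf><routers>
  <router>
    <name>router</name>
    <domain>public.localhost</domain>
    <services><service>opensrf.math</service></services>
  </router>
  <router>
    <name>router</name>
    <domain>private.localhost</domain>
  </router>
</routers></opensrf></config>"""

for name, domain, service in single_service_routers(SAMPLE):
    print(f"{name}@{domain}: only <service>{service}</service> listed")
```

To check a real installation, pass it the text of your own opensrf_core.xml, for example single_service_routers(open(path).read()).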

Thanks for your investigation.  Thanks also to Victoria for finding
and reporting the bug in the first place.

Scott McKellar

--- On Thu, 6/18/09, Dan Wells <dbw2 at calvin.edu> wrote:

> From: Dan Wells <dbw2 at calvin.edu>
> Subject: Re: [OPEN-ILS-DEV] Debugging OpenSRF installation
> To: "Evergreen Development Discussion List" <open-ils-dev at list.georgialibraries.org>
> Date: Thursday, June 18, 2009, 10:26 AM
> Hello all,
> 
> Well, I went back to square one and was able to reproduce
> this buggy behavior, so Victoria is not alone!
> 
> Furthermore, I think I have a lead for the developers to
> follow in fixing this.  It seems that the following
> default config in opensrf_core.xml is not being parsed as
> valid:
> 
> <opensrf>
>     <routers>
>       <router>
>         <name>router</name>
>         <domain>public.localhost</domain>
>         <services>
>           <service>opensrf.math</service>
>         </services>
> ...
> 
> Since the math service was working fine on my full install,
> I noticed that the only difference in this section was that
> many more services were attached to the public.localhost
> router.  As a basic test, I added another fake service
> line, as follows:
> 
> ...
>         <services>
>           <service>opensrf.math</service>
>           <service>opensrf.blah</service>
>         </services>
> ...
> 
> Bingo!  opensrf.math now tested fine when logging into
> public.localhost.  I also tested a case of simply
> doubling the opensrf.math line:
> 
> ...
>         <services>
>           <service>opensrf.math</service>
>           <service>opensrf.math</service>
>         </services>
> ...
> 
> and that worked as well.
> 
> So, it seems that there is a bug in parsing
> opensrf_core.xml when only a single service is listed for a
> router.  That service does not get attached
> properly.  Adding another service allows the first
> service to work, but it is unknown if the second service is
> affected (that is, it may be the case that the last service
> listed is failing, though I somehow doubt that).
> 
> Victoria, try doubling the service line as I have, restart
> the stack, and see if the public math interface works for
> you.
> 
> Good luck,
> DW
> 
> 
> >>> Victoria Bush <vbush at ilstu.edu> 6/16/2009 10:59 AM >>>
> 
> On Jun 15, 2009, at 5:06 PM, Dan Wells wrote:
> 
> > Hello Victoria,
> >
> > I kinda hate to suggest this, since I thought this issue was fixed,
> > but have you tried starting the components separately?  That is:
> >
> > osrf_ctl.sh -l -a stop_all
> >
> > (wait a few minutes, kill as needed)
> >
> > osrf_ctl.sh -l -a start_router
> >
> > (wait for activity to stop/slow)
> >
> > osrf_ctl.sh -l -a start_perl
> >
> > (wait for activity to stop/slow)
> >
> > osrf_ctl.sh -l -a start_c
> >
> > Since many pieces in this system operate independently, starting
> > them in a more controlled fashion has been suggested in the past for
> > quirky race-condition-type problems.
> >
> > Good luck,
> > DW
> >
> >
> 
> Okay, several things are causing confusion on my part because the
> documented behavior at
> http://evergreen-ils.org/dokuwiki/doku.php?id=troubleshooting:checking_for_errors
> is not what I'm seeing. First of all, the default opensrf_core.xml
> example file that I used to create my file did not create separate log
> files, private.router.log and public.router.log. (Of course, on the
> troubleshooting page two paragraphs above this mention of two router
> log files, it mentions the single router.log that is instead created.)
> So I changed my xml file to create two separate log files.
> 
> In addition, the opensrf_core.xml example file claims that a log file
> gateway.log will be created, but I've never seen one. What seems to be
> happening is that the private.localhost router comes up fine, but the
> public.localhost one doesn't--or if it does come up, it's in some
> weird state that doesn't do anything.
> 
> Retracing my steps:
> 
> 1. I stopped everything and killed all leftover processes. I also
>    moved all the log files into a subdirectory to hide them for now.
> 2. I *only* started the router:
>     osrf_ctl.sh -l -a start_router
> 3. The *only* log file that is created is private.router.log:
> 
> > router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:95:] Router connecting as: server: private.localhost port: 5222 user: router resource: router
> > router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:117:] Router adding trusted server: private.localhost
> > router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:129:] Router adding trusted client: private.localhost
> >
> 
> I see no file called public.router.log, but there are two processes
> running:
> 
> $ ps -eaf | grep OpenSRF
> opensrf  30368     1  0 09:25 ?        00:00:00 OpenSRF Router
> opensrf  30369     1  0 09:25 ?        00:00:00 OpenSRF Router
> opensrf  30385 29763  0 09:28 pts/1    00:00:00 grep OpenSRF
> 
> 4. If I stop the router now:
>     osrf_ctl.sh -l -a stop_router
> 
> NOW I see a public.router.log file, and it says:
> 
> > router 2009-06-16 09:51:55 [WARN:30368:osrf_router_main.c:11:] Received signal [2], cleaning up...
> >
> 
> So while the public router comes up, something's not right. But I have
> no idea how to diagnose this further.
> 
> The only changes in my opensrf_core.xml file since I last posted it
> were to change the names of the log files, as indicated above. So the
> differences between this core file and the example one included in
> OpenSRF 1.0.6 are just the passwords and the log files.
> 
> > $ diff opensrf_core.xml.example opensrf_core.xml
> > 38c38
> > <     <passwd>password</passwd>
> > ---
> > >     <passwd>*****</passwd>
> > 104c104
> > <     <passwd>password</passwd>
> > ---
> > >     <passwd>*****</passwd>
> > 128c128
> > <                 <password>password</password>
> > ---
> > >                 <password>*****</password>
> > 133c133
> > <             <logfile>/openils/var/log/router.log</logfile>
> > ---
> > >             <logfile>/openils/var/log/public.router.log</logfile>
> > 150c150
> > <                 <password>password</password>
> > ---
> > >                 <password>*****</password>
> > 155c155
> > <             <logfile>/openils/var/log/router.log</logfile>
> > ---
> > >             <logfile>/openils/var/log/private.router.log</logfile>
> 
> 
> Here's my slightly updated opensrf_core.xml file.
> 
> 
> > <?xml version="1.0"?>
> > <!--
> > vim:et:ts=2:sw=2:
> > -->
> > <config>
> >
> >   <!-- bootstrap config for OpenSRF apps -->
> >   <opensrf>
> >
> >     <routers>
> >
> >       <!-- define the list of routers our services will register with -->
> >
> >       <router>
> >
> >         <!-- This is the public router.  On this router, we only register applications
> >              which should be accessible to everyone on the opensrf network -->
> >         <name>router</name>
> >         <domain>public.localhost</domain>
> >         <services>
> >           <service>opensrf.math</service>
> >         </services>
> >       </router>
> >
> >       <router>
> >         <!-- This is the private router.  All applications must register with
> >              this router, so no explicit <services> section is required -->
> >         <name>router</name>
> >         <domain>private.localhost</domain>
> >       </router>
> >     </routers>
> >
> >
> >     <!-- Jabber login settings
> >          Our domain should match that of the private router -->
> >     <domain>private.localhost</domain>
> >     <username>opensrf</username>
> >     <passwd>privctltsrf</passwd>
> >     <port>5222</port>
> >     <!-- name of the router used on our private domain.
> >          this should match one of the <name> of the private router above -->
> >     <router_name>router</router_name>
> >
> >     <!-- log file settings ====================================== -->
> >     <!-- log to a local file -->
> >     <logfile>/openils/var/log/osrfsys.log</logfile>
> >
> >     <!-- Log to syslog. You can use this same layout for
> >          defining the logging of all services in this file -->
> >     <!--
> >     <logfile>syslog</logfile>
> >     <syslog>local2</syslog>
> >     <actlog>local1</actlog>
> >     -->
> >
> >     <!-- 0 None, 1 Error, 2 Warning, 3 Info, 4 debug, 5 Internal (Nasty) -->
> >     <loglevel>3</loglevel>
> >
> >     <!-- config file for the services -->
> >     <settings_config>/openils/conf/opensrf.xml</settings_config>
> >
> >   </opensrf>
> >
> >   <!-- Update this if you use ChopChop -->
> >   <chopchop>
> >     <!-- Our jabber server -->
> >     <domain>private.localhost</domain>
> >     <port>5222</port>
> >     <!-- used when multiple servers need to communicate -->
> >     <s2sport>5269</s2sport>
> >     <secret>secret</secret>
> >     <listen_address>10.0.0.3</listen_address>
> >     <loglevel>3</loglevel>
> >     <logfile>/openils/var/log/osrfsys.log</logfile>
> >   </chopchop>
> >
> >   <!-- The section between <gateway>...</gateway> is a standard OpenSRF C stack config file -->
> >   <gateway>
> >
> >     <!--
> >     we consider ourselves to be the "originating" client for requests,
> >     which means we define the log XID string for log traces
> >     -->
> >     <client>true</client>
> >
> >     <!-- the router's name on the network -->
> >     <router_name>router</router_name>
> >
> >     <!--
> >     These are the services that the gateway will serve.
> >     Any other requests will receive an HTTP_NOT_FOUND (404)
> >     DO NOT put any services here that you don't want the internet to have access to
> >     This section will be soon deprecated for multi-domain mode...
> >     -->
> >     <services>
> >       <service>opensrf.math</service>
> >     </services>
> >
> >     <!-- jabber login info -->
> >
> >     <!-- The gateway connects to the public domain -->
> >     <domain>public.localhost</domain>
> >     <username>opensrf</username>
> >     <passwd>pubctltsrf</passwd>
> >     <port>5222</port>
> >     <logfile>/openils/var/log/gateway.log</logfile>
> >     <loglevel>3</loglevel>
> >
> >   </gateway>
> >
> >   <!-- ==================================================================== -->
> >
> >     <routers>
> >         <router> <!-- public router -->
> >             <trusted_domains>
> >                 <!-- allow private services to register with this router
> >                      and public clients to send requests to this router. -->
> >                 <server>private.localhost</server>
> >                 <!-- also allow private clients to send to the router so it can receive error messages -->
> >                 <client>private.localhost</client>
> >                 <client>public.localhost</client>
> >             </trusted_domains>
> >             <transport>
> >                 <server>public.localhost</server>
> >                 <port>5222</port>
> >                 <unixpath>/openils/var/sock/unix_sock</unixpath>
> >                 <username>router</username>
> >                 <password>pubctltroute</password>
> >                 <resource>router</resource>
> >                 <connect_timeout>10</connect_timeout>
> >                 <max_reconnect_attempts>5</max_reconnect_attempts>
> >             </transport>
> >             <logfile>/openils/var/log/public.router.log</logfile>
> >             <!--
> >             <logfile>syslog</logfile>
> >             <syslog>local2</syslog>
> >             -->
> >             <loglevel>2</loglevel>
> >         </router>
> >         <router> <!-- private router -->
> >             <trusted_domains>
> >                 <server>private.localhost</server>
> >                 <!-- only clients on the private domain can send requests to this router -->
> >                 <client>private.localhost</client>
> >             </trusted_domains>
> >             <transport>
> >                 <server>private.localhost</server>
> >                 <port>5222</port>
> >                 <username>router</username>
> >                 <password>privctltroute</password>
> >                 <resource>router</resource>
> >                 <connect_timeout>10</connect_timeout>
> >                 <max_reconnect_attempts>5</max_reconnect_attempts>
> >             </transport>
> >             <logfile>/openils/var/log/private.router.log</logfile>
> >             <!--
> >             <logfile>syslog</logfile>
> >             <syslog>local2</syslog>
> >             -->
> >             <loglevel>4</loglevel>
> >         </router>
> >     </routers>
> >
> >   <!-- ==================================================================== -->
> >
> > </config>
> >
> 
> 
> 
> 
> --
> Victoria Bush
> Opscan Evaluation Manager
> Center for Teaching, Learning & Technology
> vbush at ilstu.edu 
> 
> 
> 
> 

