[OPEN-ILS-DEV] Debugging OpenSRF installation

Dan Wells dbw2 at calvin.edu
Thu Jun 18 11:26:21 EDT 2009


Hello all,

Well, I went back to square one and was able to reproduce this buggy behavior, so Victoria is not alone!

Furthermore, I think I have a lead for the developers to follow in fixing this.  It seems that the following default config in opensrf_core.xml is not being parsed as valid:

<opensrf>
    <routers>
      <router>
        <name>router</name>
        <domain>public.localhost</domain>
        <services>
            <service>opensrf.math</service>
        </services>
...

The math service was working fine on my full install, and the only difference I could see in this section was that many more services were attached to the public.localhost router there.  As a basic test, I added a fake second service line, as follows:

...
        <services>
            <service>opensrf.math</service>
            <service>opensrf.blah</service>
        </services>
...

Bingo!  opensrf.math now tested fine when logging into public.localhost.  I also tested a case of simply doubling the opensrf.math line:

...
        <services>
            <service>opensrf.math</service>
            <service>opensrf.math</service>
        </services>
...

and that worked as well.

So, it seems that there is a bug in parsing opensrf_core.xml when only a single service is listed for a router.  That service does not get attached properly.  Adding another service allows the first service to work, but it is unknown if the second service is affected (that is, it may be the case that the last service listed is failing, though I somehow doubt that).
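
A minimal sketch for spotting this condition in a config file, assuming the layout shown above (a <config> root with the service lists under <opensrf><routers>). The function name single_service_routers is hypothetical and not part of OpenSRF; it only reports routers that list exactly one service and says nothing about why the parse fails.

```python
import xml.etree.ElementTree as ET


def single_service_routers(xml_text):
    """Return (domain, service) pairs for routers that list exactly one service."""
    root = ET.fromstring(xml_text)
    hits = []
    # Only the bootstrap section (<config><opensrf><routers>) carries <services>.
    for router in root.findall("./opensrf/routers/router"):
        services = [
            s.text.strip()
            for s in router.findall("./services/service")
            if s.text and s.text.strip()
        ]
        if len(services) == 1:
            domain = router.findtext("domain", default="?").strip()
            hits.append((domain, services[0]))
    return hits
```

To use it, read your own opensrf_core.xml (its location depends on the install) and pass the contents to single_service_routers(); any router it returns is a candidate for the doubled-line workaround.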

Victoria, try doubling the service line as I have, restart the stack, and see if the public math interface works for you.
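
For the restart, the staged order from my earlier message (quoted below) can be sketched roughly as follows. The osrf_ctl.sh actions are the ones from that message; the fixed pause is an assumption standing in for "wait for activity to stop/slow", and the function names are hypothetical.

```python
import subprocess
import time

# Actions taken from the osrf_ctl.sh invocations quoted in this thread.
STAGES = ["stop_all", "start_router", "start_perl", "start_c"]


def staged_commands(script="osrf_ctl.sh"):
    """Build one command line per stage, in order."""
    return [[script, "-l", "-a", action] for action in STAGES]


def restart_stack(pause_seconds=30, run=subprocess.run, sleep=time.sleep):
    """Run each stage in turn, pausing so each component can settle.

    pause_seconds is an assumed value; watch the logs and adjust it.
    """
    for cmd in staged_commands():
        run(cmd)
        sleep(pause_seconds)
```

Calling restart_stack() as the opensrf user, with osrf_ctl.sh on the PATH, runs the four stages in order. Leftover processes after stop_all still need to be checked and killed by hand.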

Good luck,
DW


>>> Victoria Bush <vbush at ilstu.edu> 6/16/2009 10:59 AM >>>

On Jun 15, 2009, at 5:06 PM, Dan Wells wrote:

> Hello Victoria,
>
> I kinda hate to suggest this, since I thought this issue was fixed,  
> but have you tried starting the components separately?  That is:
>
> osrf_ctl.sh -l -a stop_all
>
> (wait a few minutes, kill as needed)
>
> osrf_ctl.sh -l -a start_router
>
> (wait for activity to stop/slow)
>
> osrf_ctl.sh -l -a start_perl
>
> (wait for activity to stop/slow)
>
> osrf_ctl.sh -l -a start_c
>
> Since many pieces in this system operate independently, starting  
> them in a more controlled fashion has been suggested in the past for  
> quirky race-condition-type problems.
>
> Good luck,
> DW
>
>

Okay, several things are causing confusion on my part because the  
documented behavior at
http://evergreen-ils.org/dokuwiki/doku.php?id=troubleshooting:checking_for_errors 

is not what I'm seeing. First of all, the default opensrf_core.xml  
example file that I used to create my file did not create separate log  
files, private.router.log and public.router.log. (Of course, on the  
troubleshooting page two paragraphs above this mention of two router  
log files, it mentions the single router.log that is instead created.)  
So I changed my xml file to create two separate log files.

In addition, the opensrf_core.xml example file claims that a log file  
gateway.log will be created, but I've never seen one. What seems to be  
happening is that the private.localhost router comes up fine, but the  
public.localhost one doesn't--or if it does come up, it's in some  
weird state that doesn't do anything.

Retracing my steps:

1. I stopped everything and killed all leftover processes. I also  
moved all the log files into a subdirectory to hide them for now.
2. I *only* started the router:
	osrf_ctl.sh -l -a start_router
3. The *only* log file that is created is private.router.log:

> router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:95:] Router connecting as: server: private.localhost port: 5222 user: router resource: router
> router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:117:] Router adding trusted server: private.localhost
> router 2009-06-16 09:25:26 [INFO:30364:osrf_router_main.c:129:] Router adding trusted client: private.localhost
>

I see no file called public.router.log, but there are two processes  
running:

$ ps -eaf | grep OpenSRF
opensrf  30368     1  0 09:25 ?        00:00:00 OpenSRF Router
opensrf  30369     1  0 09:25 ?        00:00:00 OpenSRF Router
opensrf  30385 29763  0 09:28 pts/1    00:00:00 grep OpenSRF

4. If I stop the router now:
	osrf_ctl.sh -l -a stop_router

NOW I see a public.router.log file, and it says:

> router 2009-06-16 09:51:55 [WARN:30368:osrf_router_main.c:11:] Received signal [2], cleaning up...
>

So while the public router comes up, something's not right. But I have  
no idea how to diagnose this further.

The only change to my opensrf_core.xml file since I last posted it
was renaming the log files, as indicated above. So the
differences between this core file and the example one included in
OpenSRF 1.0.6 are just the passwords and the log files.

> $ diff opensrf_core.xml.example opensrf_core.xml
> 38c38
> <     <passwd>password</passwd>
> ---
> >     <passwd>*****</passwd>
> 104c104
> <     <passwd>password</passwd>
> ---
> >     <passwd>*****</passwd>
> 128c128
> <                 <password>password</password>
> ---
> >                 <password>*****</password>
> 133c133
> <             <logfile>/openils/var/log/router.log</logfile>
> ---
> >             <logfile>/openils/var/log/public.router.log</logfile>
> 150c150
> <                 <password>password</password>
> ---
> >                 <password>*****</password>
> 155c155
> <             <logfile>/openils/var/log/router.log</logfile>
> ---
> >             <logfile>/openils/var/log/private.router.log</logfile>


Here's my slightly updated opensrf_core.xml file.


> <?xml version="1.0"?>
> <!--
> vim:et:ts=2:sw=2:
> -->
> <config>
>
>   <!-- bootstrap config for OpenSRF apps -->
>   <opensrf>
>
>     <routers>
>
>       <!-- define the list of routers our services will register with -->
>
>       <router>
>
>         <!-- This is the public router.  On this router, we only register applications
>              which should be accessible to everyone on the opensrf network -->
>         <name>router</name>
>         <domain>public.localhost</domain>
>         <services>
>             <service>opensrf.math</service>
>         </services>
>       </router>
>
>       <router>
>         <!-- This is the private router.  All applications must register with
>             this router, so no explicit <services> section is required -->
>         <name>router</name>
>         <domain>private.localhost</domain>
>       </router>
>     </routers>
>
>
>     <!-- Jabber login settings
>         Our domain should match that of the private router -->
>     <domain>private.localhost</domain>
>     <username>opensrf</username>
>     <passwd>privctltsrf</passwd>
>     <port>5222</port>
>     <!-- name of the router used on our private domain.
>         this should match one of the <name> of the private router above -->
>     <router_name>router</router_name>
>
>     <!-- log file settings ======================================  -->
>     <!-- log to a local file -->
>     <logfile>/openils/var/log/osrfsys.log</logfile>
>
>     <!-- Log to syslog. You can use this same layout for
>         defining the logging of all services in this file -->
>     <!--
>     <logfile>syslog</logfile>
>     <syslog>local2</syslog>
>     <actlog>local1</actlog>
>     -->
>
>     <!-- 0 None, 1 Error, 2 Warning, 3 Info, 4 debug, 5 Internal (Nasty) -->
>     <loglevel>3</loglevel>
>
>     <!-- config file for the services -->
>     <settings_config>/openils/conf/opensrf.xml</settings_config>
>
>   </opensrf>
>
>   <!-- Update this if you use ChopChop -->
>   <chopchop>
>     <!-- Our jabber server -->
>     <domain>private.localhost</domain>
>     <port>5222</port>
>     <!-- used when multiple servers need to communicate -->
>     <s2sport>5269</s2sport>
>     <secret>secret</secret>
>     <listen_address>10.0.0.3</listen_address>
>     <loglevel>3</loglevel>
>     <logfile>/openils/var/log/osrfsys.log</logfile>
>   </chopchop>
>
>   <!-- The section between <gateway>...</gateway> is a standard OpenSRF C stack config file -->
>   <gateway>
>
>     <!--
>     we consider ourselves to be the "originating" client for requests,
>     which means we define the log XID string for log traces
>     -->
>     <client>true</client>
>
>     <!--  the routers's name on the network -->
>     <router_name>router</router_name>
>
>     <!--
>     These are the services that the gateway will serve.
>     Any other requests will receive an HTTP_NOT_FOUND (404)
>     DO NOT put any services here that you don't want the internet to have access to
>     This section will be soon deprecated for multi-domain mode...
>     -->
>     <services>
>       <service>opensrf.math</service>
>     </services>
>
>     <!-- jabber login info -->
>
>     <!-- The gateway connects to the public domain -->
>     <domain>public.localhost</domain>
>     <username>opensrf</username>
>     <passwd>pubctltsrf</passwd>
>     <port>5222</port>
>     <logfile>/openils/var/log/gateway.log</logfile>
>     <loglevel>3</loglevel>
>
>   </gateway>
>
>   <!-- ==================================================================== -->
>
>     <routers>
>         <router> <!-- public router -->
>             <trusted_domains>
>                 <!-- allow private services to register with this router
>                      and public clients to send requests to this router. -->
>                 <server>private.localhost</server>
>                 <!-- also allow private clients to send to the router so it can receive error messages -->
>                 <client>private.localhost</client>
>                 <client>public.localhost</client>
>             </trusted_domains>
>             <transport>
>                 <server>public.localhost</server>
>                 <port>5222</port>
>                 <unixpath>/openils/var/sock/unix_sock</unixpath>
>                 <username>router</username>
>                 <password>pubctltroute</password>
>                 <resource>router</resource>
>                 <connect_timeout>10</connect_timeout>
>                 <max_reconnect_attempts>5</max_reconnect_attempts>
>             </transport>
>             <logfile>/openils/var/log/public.router.log</logfile>
>             <!--
>             <logfile>syslog</logfile>
>             <syslog>local2</syslog>
>             -->
>             <loglevel>2</loglevel>
>         </router>
>         <router> <!-- private router -->
>             <trusted_domains>
>                 <server>private.localhost</server>
>                 <!-- only clients on the private domain can send requests to this router -->
>                 <client>private.localhost</client>
>             </trusted_domains>
>             <transport>
>                 <server>private.localhost</server>
>                 <port>5222</port>
>                 <username>router</username>
>                 <password>privctltroute</password>
>                 <resource>router</resource>
>                 <connect_timeout>10</connect_timeout>
>                 <max_reconnect_attempts>5</max_reconnect_attempts>
>             </transport>
>             <logfile>/openils/var/log/private.router.log</logfile>
>             <!--
>             <logfile>syslog</logfile>
>             <syslog>local2</syslog>
>             -->
>             <loglevel>4</loglevel>
>         </router>
>     </routers>
>
>   <!-- ==================================================================== -->
>
> </config>
>




--
Victoria Bush
Opscan Evaluation Manager
Center for Teaching, Learning & Technology
vbush at ilstu.edu 




