[OPEN-ILS-DEV] Debugging OpenSRF installation

Victoria Bush vbush at ilstu.edu
Fri Jun 12 16:35:10 EDT 2009


On Jun 12, 2009, at 1:39 PM, Dan Wells wrote:

> Hello Victoria,
>
> Though I think it is designed to test a full Evergreen install,  
> running the settings-tester.pl script may help identify problems  
> with just OpenSRF as well.  To quote the wiki:
>
> "As the opensrf user, run the settings-tester.pl script to see if it  
> finds any system configuration problems. The script is found at
> Open-ILS/src/support-scripts/settings-tester.pl in the Evergreen source
> tree."
>
> Feel free to send the output back to this list if it doesn't mean  
> much to you.

Dan, thanks so much for your reply. I think there might be something  
more going on than just a simple configuration error, but I defer to  
the greater wisdom of the list.

I completely wiped the system and started over, just to eliminate  
anything I might have screwed up. (Ah, the joys of a new machine.) So  
I'm running Ubuntu 8.04 (with all available updates installed) and I  
installed OpenSRF 1.0.6. I went meticulously through the installation  
instructions again. I then reran the code in the "testing connections to
OpenSRF" section of the installation page. This time I could request
the math service and get "4" back on the private.localhost connection,
but I could *not* get a result when I tried the same request via the
public.localhost connection.

Let me repeat to be clear: testing worked fine on private.localhost,  
but not on public.localhost.

I did run settings-tester.pl (assumed the trunk version was the latest  
and greatest, dated 3 months ago). While I got the errors I expected  
as Evergreen is not yet installed (lots of "please install <library>"  
statements), it did test the ejabberd stuff:

> Checking Jabber connection for user opensrf, domain private.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user opensrf, domain public.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user router, domain public.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user router, domain private.localhost
> * Jabber successfully connected
>



So the Jabber connections look fine. I also see lots of OpenSRF processes:

> opensrf   3753     1  2 Jun11 ?        00:37:07 OpenSRF Router
> opensrf   3754     1  2 Jun11 ?        00:38:21 OpenSRF Router
> opensrf  14159     1  0 14:32 ?        00:00:00 OpenSRF Router
> opensrf  14165     1  0 14:32 ?        00:00:00 OpenSRF Router
> opensrf  14170     1  0 14:32 ?        00:00:00 OpenSRF controller  
> [opensrf.settings]
> opensrf  14172 14170  0 14:32 ?        00:00:00 OpenSRF master  
> [opensrf.settings]
> opensrf  14173 14170  0 14:32 ?        00:00:00 OpenSRF listener  
> [opensrf.settings]
> opensrf  14174 14172  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.settings]
> opensrf  14175 14172  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.settings]
> opensrf  14176 14172  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.settings]
> opensrf  14177 14172  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.settings]
> opensrf  14178 14172  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.settings]
> opensrf  14179     1  0 14:32 ?        00:00:00 OpenSRF controller  
> [opensrf.persist]
> opensrf  14181 14179  0 14:32 ?        00:00:00 OpenSRF master  
> [opensrf.persist]
> opensrf  14183     1  0 14:32 ?        00:00:00 OpenSRF System-C
> opensrf  14184 14183  0 14:32 ?        00:00:00 OpenSRF Listener  
> [opensrf.math]
> opensrf  14185 14184  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.math]
> opensrf  14188 14179  0 14:32 ?        00:00:00 OpenSRF listener  
> [opensrf.persist]
> opensrf  14189 14184  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.math]
> opensrf  14190 14183  0 14:32 ?        00:00:00 OpenSRF Listener  
> [opensrf.dbmath]
> opensrf  14191 14190  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.dbmath]
> opensrf  14192 14184  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.math]
> opensrf  14194 14190  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.dbmath]
> opensrf  14195 14184  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.math]
> opensrf  14197 14190  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.dbmath]
> opensrf  14198 14184  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.math]
> opensrf  14200 14190  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.dbmath]
> opensrf  14201 14190  0 14:32 ?        00:00:00 OpenSRF Drone  
> [opensrf.dbmath]
> opensrf  14204 14181  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.persist]
> opensrf  14205 14181  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.persist]
> opensrf  14206 14181  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.persist]
> opensrf  14207 14181  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.persist]
> opensrf  14208 14181  0 14:32 ?        00:00:00 OpenSRF drone  
> [opensrf.persist]
> vxbush   14399  6589  0 15:27 pts/0    00:00:00 grep OpenSRF
>
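For anyone who wants to compare their own process list against the one above, here is a minimal Python sketch that tallies OpenSRF processes by role and service. It assumes unwrapped `ps -ef` lines (my mail client wrapped the ones above), and the names `PATTERN`, `SAMPLE` and `summarize` are made up for illustration; they are not part of OpenSRF.

```python
import re
from collections import Counter

# Matches the command column of a `ps -ef` line, for example
# "... 00:00:00 OpenSRF drone [opensrf.settings]" or "... OpenSRF Router".
PATTERN = re.compile(
    r"OpenSRF\s+(?P<role>[\w-]+)(?:\s+\[(?P<service>[\w.]+)\])?\s*$"
)

# A few lines from the listing above, joined back onto single lines.
SAMPLE = [
    "opensrf   3753     1  2 Jun11 ?        00:37:07 OpenSRF Router",
    "opensrf  14159     1  0 14:32 ?        00:00:00 OpenSRF Router",
    "opensrf  14174 14172  0 14:32 ?        00:00:00 OpenSRF drone [opensrf.settings]",
    "opensrf  14185 14184  0 14:32 ?        00:00:00 OpenSRF Drone [opensrf.math]",
    "vxbush   14399  6589  0 15:27 pts/0    00:00:00 grep OpenSRF",
]


def summarize(ps_lines):
    """Count OpenSRF processes by (role, service); role is lower-cased."""
    counts = Counter()
    for line in ps_lines:
        if "grep" in line:
            continue  # skip the grep process itself
        match = PATTERN.search(line)
        if match:
            key = (match.group("role").lower(), match.group("service") or "-")
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (role, service), n in sorted(summarize(SAMPLE).items()):
        print(f"{n:3d}  {role:12s} {service}")
```

Feeding it the full listing would give one count per role and service, which is easier to scan than the raw output.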



And ejabberd says it's running:

> $ sudo ejabberdctl status
> Node ejabberd at localhost is started. Status: started
> ejabberd is running
>

After raising the log level to 4 in my .srfsh.xml file (with the log
going to /tmp/srfsh.log) and trying to connect via public.localhost
again, I see this:


> srfsh 2009-06-12 15:08:52 [INFO:14305:osrf_system.c:415:]  
> Bootstrapping system with domain public.localhost, port 5222, and  
> unixpath (none)
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:282:]  
> opensrf.math session is stateless
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:293:]  
> Building a new client session with id [opensrf.math]  
> [1244837338.011557.124483733814305]
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:500:]  
> AppSession connecting to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:456:] App  
> Session [opensrf.math] [1244837338.011557.124483733814305] resetting  
> remote id to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:639:]  
> AppSession in queue_wait with timeout 0
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:456:] App  
> Session [opensrf.math] [1244837338.011557.124483733814305] resetting  
> remote id to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [INFO:14305:osrf_app_session.c:608:]  
> [opensrf.math] sent 83 bytes of data to router at public.localhost/ 
> opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:611:] Sent:
> [{"__c":"osrfMessage","__p":{"threadTrace":"0","locale":"en-US","type":"CONNECT"}}]
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:639:]  
> AppSession in queue_wait with timeout 5
> srfsh 2009-06-12 15:08:58 [INFO:14305:transport_session.c:436:]  
> Received <error> message with type cancel and code 503
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:24:] Received  
> message from transport code from router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:51:] Transport  
> handler received new message from router at public.localhost/ 
> opensrf.math to opensrf at public.localhost/ 
> _evergreen_1244837332.075423_14305 with body
>
> [{"__c":"osrfMessage","__p":{"threadTrace":"0","locale":"en-US","type":"CONNECT"}}]
>
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:84:] We received  
> 1 messages from router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:95:]  !!!  
> Received Jabber layer error message
> srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:105:]  * Jabber  
> Error is for top level remote  id [router at public.localhost/ 
> opensrf.math], no one to send my message to!  Cutting request short...
> srfsh 2009-06-12 15:08:58 [INFO:14305:osrf_stack.c:116:] Message  
> processing duration 0.000164
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:119:] after msg  
> delete
> srfsh 2009-06-12 15:08:58 [ERR :14305:osrf_app_session.c:516:]  
> cannot communicate with opensrf.math
> srfsh 2009-06-12 15:08:58 [WARN:14305:srfsh.c:576:] Unable to  
> connect to remote service opensrf.math
>
> srfsh 2009-06-12 15:09:00 [DEBG:14305:socket_bundle.c:394:] removing  
> socket 3
>
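Since most of that log is DEBG noise, a small filter helps pull out the interesting lines. The following is only a sketch in Python, assuming each log entry sits on a single line in the format shown above (the wrapping here comes from my mail client). `LOG_LINE` and `problems` are hypothetical names, not anything shipped with OpenSRF.

```python
import re

# One srfsh log entry, e.g.
# "srfsh 2009-06-12 15:08:58 [ERR :14305:osrf_app_session.c:516:] cannot ..."
# The level may be padded with a space before the colon ("ERR :").
LOG_LINE = re.compile(
    r"^(?P<prog>\S+) (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\s*:(?P<pid>\d+):(?P<file>[^:]+):(?P<line>\d+):\]"
    r"\s*(?P<msg>.*)$"
)


def problems(lines, levels=("WARN", "ERR")):
    """Return (level, source file, message) for entries worth a closer look.

    An entry qualifies if its level is in `levels` or if its message
    mentions a Jabber <error> stanza, which srfsh logs at INFO level.
    """
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # blank lines, JSON bodies, wrapped fragments
        if match.group("level") in levels or "<error>" in match.group("msg"):
            found.append(
                (match.group("level"), match.group("file"), match.group("msg"))
            )
    return found


if __name__ == "__main__":
    sample = [
        "srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:282:] opensrf.math session is stateless",
        "srfsh 2009-06-12 15:08:58 [INFO:14305:transport_session.c:436:] Received <error> message with type cancel and code 503",
        "srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:95:]  !!! Received Jabber layer error message",
        "srfsh 2009-06-12 15:08:58 [ERR :14305:osrf_app_session.c:516:] cannot communicate with opensrf.math",
    ]
    for level, source, msg in problems(sample):
        print(f"{level:4s} {source}: {msg}")
```

To run it against the real file, something like `problems(open("/tmp/srfsh.log"))` should work, since the function only needs an iterable of lines.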


Now why would I get an error about the top-level connection, when
testing via settings-tester.pl showed that the connections were
successfully made?

>
> If that doesn't help, I would say your problem is probably with  
> ejabberd and probably a very small mistake.  Try running (as root):
>
> ejabberdctl status
>
> and see what that reports.  If it says ejabberd is "not running" try  
> steps 5-7 again from the page you mentioned, including the sub-step  
> in #5, then do the math test again.  If that doesn't work, move on  
> to carefully double-check your work in steps 9-10.  If you are  
> wondering, I am pretty sure it doesn't hurt anything to run the  
> 'register' commands a second time if you feel you may have missed one.
>
> Good luck,
> DW

I did some Google spelunking and found an IRC chat log in which someone
else was having problems with code 503 errors:
http://www.open-ils.org/irc_logs/openils-evergreen/2009-03/%23openils-evergreen.16-Mon-2009.log

From looking over the responses and configuration files posted, the
supposed solution was changing
	{access, max_user_sessions, [{10, all}]}.
to
	{access, max_user_sessions, [{1000, all}]}.
(a change that is now shown as an option in the installation
instructions). However, I don't have that line; instead I have
	{max_user_sessions, 1000}.
as requested by the installation instructions.
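
In case it helps anyone check which form of the setting their own ejabberd configuration uses, here is a rough Python sketch. It assumes each setting sits on a single line and that `%` only ever starts an Erlang comment; `session_limits` and the two pattern names are invented for this example.

```python
import re

# {max_user_sessions, 1000}.
SIMPLE = re.compile(r"\{\s*max_user_sessions\s*,\s*(\d+)\s*\}\s*\.")
# {access, max_user_sessions, [{1000, all}]}.
ACCESS = re.compile(
    r"\{\s*access\s*,\s*max_user_sessions\s*,"
    r"\s*\[\s*\{\s*(\d+)\s*,\s*all\s*\}\s*\]\s*\}\s*\."
)


def session_limits(cfg_text):
    """Return a list of (form, limit) for each max_user_sessions setting."""
    found = []
    for raw in cfg_text.splitlines():
        line = raw.split("%", 1)[0]  # drop anything after an Erlang comment
        for form, pattern in (("simple", SIMPLE), ("access", ACCESS)):
            match = pattern.search(line)
            if match:
                found.append((form, int(match.group(1))))
    return found


if __name__ == "__main__":
    sample = (
        "%% {access, max_user_sessions, [{10, all}]}.\n"
        "{max_user_sessions, 1000}.\n"
    )
    print(session_limits(sample))
```

Passing in the contents of the real configuration file would show at a glance whether either form is present and what limit it sets.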

I'm open to any suggestions.


--
Victoria Bush
Opscan Evaluation Manager
Center for Teaching, Learning & Technology
vbush at ilstu.edu




