[OPEN-ILS-DEV] Debugging OpenSRF installation
Victoria Bush
vbush at ilstu.edu
Fri Jun 12 16:35:10 EDT 2009
On Jun 12, 2009, at 1:39 PM, Dan Wells wrote:
> Hello Victoria,
>
> Though I think it is designed to test a full Evergreen install,
> running the settings-tester.pl script may help identify problems
> with just OpenSRF as well. To quote the wiki:
>
> "As the opensrf user, run the settings-tester.pl script to see if it
> finds any system configuration problems. The script is found at
> Open-ILS/src/support-scripts/settings-tester.pl in the Evergreen
> source tree."
>
> Feel free to send the output back to this list if it doesn't mean
> much to you.
Dan, thanks so much for your reply. I think there might be something
more going on than just a simple configuration error, but I defer to
the greater wisdom of the list.
I completely wiped the system and started over, just to eliminate
anything I might have screwed up. (Ah, the joys of a new machine.) So
I'm running Ubuntu 8.04 (with all available updates installed) and I
installed OpenSRF 1.0.6. I went meticulously through the installation
instructions again. I then reran the code under "testing connections to
OpenSRF" on the installation page. This time I could request the math
service and get "4" back on the private.localhost connection, but I
could *not* reach the service when I tried via the public.localhost
connection.
Let me repeat to be clear: testing worked fine on private.localhost,
but not on public.localhost.
I did run settings-tester.pl (assumed the trunk version was the latest
and greatest, dated 3 months ago). While I got the errors I expected
as Evergreen is not yet installed (lots of "please install <library>"
statements), it did test the ejabberd stuff:
> Checking Jabber connection for user opensrf, domain private.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user opensrf, domain public.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user router, domain public.localhost
> * Jabber successfully connected
>
> Checking Jabber connection for user router, domain private.localhost
> * Jabber successfully connected
>
And it looks fine. I see lots of OpenSRF processes:
> opensrf 3753 1 2 Jun11 ? 00:37:07 OpenSRF Router
> opensrf 3754 1 2 Jun11 ? 00:38:21 OpenSRF Router
> opensrf 14159 1 0 14:32 ? 00:00:00 OpenSRF Router
> opensrf 14165 1 0 14:32 ? 00:00:00 OpenSRF Router
> opensrf 14170 1 0 14:32 ? 00:00:00 OpenSRF controller [opensrf.settings]
> opensrf 14172 14170 0 14:32 ? 00:00:00 OpenSRF master [opensrf.settings]
> opensrf 14173 14170 0 14:32 ? 00:00:00 OpenSRF listener [opensrf.settings]
> opensrf 14174 14172 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.settings]
> opensrf 14175 14172 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.settings]
> opensrf 14176 14172 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.settings]
> opensrf 14177 14172 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.settings]
> opensrf 14178 14172 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.settings]
> opensrf 14179 1 0 14:32 ? 00:00:00 OpenSRF controller [opensrf.persist]
> opensrf 14181 14179 0 14:32 ? 00:00:00 OpenSRF master [opensrf.persist]
> opensrf 14183 1 0 14:32 ? 00:00:00 OpenSRF System-C
> opensrf 14184 14183 0 14:32 ? 00:00:00 OpenSRF Listener [opensrf.math]
> opensrf 14185 14184 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.math]
> opensrf 14188 14179 0 14:32 ? 00:00:00 OpenSRF listener [opensrf.persist]
> opensrf 14189 14184 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.math]
> opensrf 14190 14183 0 14:32 ? 00:00:00 OpenSRF Listener [opensrf.dbmath]
> opensrf 14191 14190 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.dbmath]
> opensrf 14192 14184 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.math]
> opensrf 14194 14190 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.dbmath]
> opensrf 14195 14184 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.math]
> opensrf 14197 14190 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.dbmath]
> opensrf 14198 14184 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.math]
> opensrf 14200 14190 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.dbmath]
> opensrf 14201 14190 0 14:32 ? 00:00:00 OpenSRF Drone [opensrf.dbmath]
> opensrf 14204 14181 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.persist]
> opensrf 14205 14181 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.persist]
> opensrf 14206 14181 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.persist]
> opensrf 14207 14181 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.persist]
> opensrf 14208 14181 0 14:32 ? 00:00:00 OpenSRF drone [opensrf.persist]
> vxbush 14399 6589 0 15:27 pts/0 00:00:00 grep OpenSRF
>
And ejabberd says it's running:
> $ sudo ejabberdctl status
> Node ejabberd at localhost is started. Status: started
> ejabberd is running
>
Upping the logging to level 4 for the log file /tmp/srfsh.log in
my .srfsh.xml file and trying to connect via public.localhost again, I
see this:
> srfsh 2009-06-12 15:08:52 [INFO:14305:osrf_system.c:415:]
> Bootstrapping system with domain public.localhost, port 5222, and
> unixpath (none)
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:282:]
> opensrf.math session is stateless
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:293:]
> Building a new client session with id [opensrf.math]
> [1244837338.011557.124483733814305]
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:500:]
> AppSession connecting to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:456:] App
> Session [opensrf.math] [1244837338.011557.124483733814305] resetting
> remote id to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:639:]
> AppSession in queue_wait with timeout 0
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:456:] App
> Session [opensrf.math] [1244837338.011557.124483733814305] resetting
> remote id to router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [INFO:14305:osrf_app_session.c:608:]
> [opensrf.math] sent 83 bytes of data to router at public.localhost/
> opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:611:] Sent:
> [{"__c":"osrfMessage","__p":{"threadTrace":"0","locale":"en-US","type":"CONNECT"}}]
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_app_session.c:639:]
> AppSession in queue_wait with timeout 5
> srfsh 2009-06-12 15:08:58 [INFO:14305:transport_session.c:436:]
> Received <error> message with type cancel and code 503
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:24:] Received
> message from transport code from router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:51:] Transport
> handler received new message from router at public.localhost/
> opensrf.math to opensrf at public.localhost/
> _evergreen_1244837332.075423_14305 with body
>
> [{"__c":"osrfMessage","__p":{"threadTrace":"0","locale":"en-US","type":"CONNECT"}}]
>
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:84:] We received
> 1 messages from router at public.localhost/opensrf.math
> srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:95:] !!!
> Received Jabber layer error message
> srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:105:] * Jabber
> Error is for top level remote id [router at public.localhost/
> opensrf.math], no one to send my message to! Cutting request short...
> srfsh 2009-06-12 15:08:58 [INFO:14305:osrf_stack.c:116:] Message
> processing duration 0.000164
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:119:] after msg
> delete
> srfsh 2009-06-12 15:08:58 [ERR :14305:osrf_app_session.c:516:]
> cannot communicate with opensrf.math
> srfsh 2009-06-12 15:08:58 [WARN:14305:srfsh.c:576:] Unable to
> connect to remote service opensrf.math
>
> srfsh 2009-06-12 15:09:00 [DEBG:14305:socket_bundle.c:394:] removing
> socket 3
>
Now why would I get an error about the top level remote id, when
testing via settings-tester.pl showed that the connections were
successfully made?
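For anyone wanting to pull just the interesting entries out of a long srfsh log like the one above, here is a minimal Python sketch. It is not part of OpenSRF. The names parse_entries and problems are hypothetical, and the header pattern is only an assumption based on the log lines quoted in this message (including the mail-quote "> " prefix and the wrapped continuation lines).

```python
import re

# Header format assumed from the quoted log, e.g.
# "srfsh 2009-06-12 15:08:58 [WARN:14305:osrf_stack.c:95:] !!!"
HEADER = re.compile(
    r"^srfsh (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(\w+)\s*:(\d+):([\w.]+):(\d+):\]\s*(.*)$"
)


def parse_entries(text):
    """Join wrapped lines into (time, level, source, message) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):      # drop the mail quote prefix
            line = line[1:].strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            time, level, _pid, src, lineno, msg = match.groups()
            entries.append([time, level, f"{src}:{lineno}", msg])
        elif entries:
            # Continuation of the previous (wrapped) entry.
            entries[-1][3] = (entries[-1][3] + " " + line).strip()
    return [tuple(entry) for entry in entries]


def problems(entries):
    """Keep WARN/ERR entries plus any entry mentioning 'code 503'."""
    return [e for e in entries
            if e[1] in ("WARN", "ERR") or "code 503" in e[3]]


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
> srfsh 2009-06-12 15:08:58 [INFO:14305:transport_session.c:436:]
> Received <error> message with type cancel and code 503
> srfsh 2009-06-12 15:08:58 [DEBG:14305:osrf_stack.c:119:] after msg
> delete
> srfsh 2009-06-12 15:08:58 [ERR :14305:osrf_app_session.c:516:]
> cannot communicate with opensrf.math
"""

for time, level, source, message in problems(parse_entries(SAMPLE)):
    print(time, level, source, message)
```

Run against the full /tmp/srfsh.log, this would narrow the output to the 503 error, the two Jabber-layer warnings, and the final "cannot communicate" lines.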
>
> If that doesn't help, I would say your problem is probably with
> ejabberd and probably a very small mistake. Try running (as root):
>
> ejabberdctl status
>
> and see what that reports. If it says ejabberd is "not running" try
> steps 5-7 again from the page you mentioned, including the sub-step
> in #5, then do the math test again. If that doesn't work, move on
> to carefully double-check your work in steps 9-10. If you are
> wondering, I am pretty sure it doesn't hurt anything to run the
> 'register' commands a second time if you feel you may have missed one.
>
> Good luck,
> DW
I did some Google spelunking and found that someone else had been
having problems with code 503 errors, according to the IRC chat log at
http://www.open-ils.org/irc_logs/openils-evergreen/2009-03/%23openils-evergreen.16-Mon-2009.log
From looking over the responses and configuration files posted, the
supposed solution was changing
{access, max_user_sessions, [{10, all}]}.
to
{access, max_user_sessions, [{1000, all}]}.
(which is now shown as an option in the installation
instructions). However, I don't have that line, and I do have
{max_user_sessions, 1000}.
as requested by the installation instructions.
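To double-check which form of the session limit a given ejabberd configuration actually contains, something like the following Python sketch could be used. It is only an illustration: session_limits is a hypothetical helper, it knows only the two forms quoted above, and it treats everything after a "%" as an Erlang comment, which is a simplification.

```python
import re

# Access-rule form: {access, max_user_sessions, [{10, all}]}.
ACCESS_RULE = re.compile(
    r"\{\s*access\s*,\s*max_user_sessions\s*,"
    r"\s*\[\s*\{\s*(\d+)\s*,\s*all\s*\}\s*\]\s*\}\s*\."
)
# Plain option form: {max_user_sessions, 1000}.
PLAIN_OPTION = re.compile(r"\{\s*max_user_sessions\s*,\s*(\d+)\s*\}\s*\.")


def session_limits(cfg_text):
    """Return a list of (form, limit) for uncommented settings."""
    found = []
    for line in cfg_text.splitlines():
        # Simplification: cut the line at the first '%' (Erlang comment).
        code = line.split("%", 1)[0]
        for form, pattern in (("access", ACCESS_RULE),
                              ("option", PLAIN_OPTION)):
            for match in pattern.finditer(code):
                found.append((form, int(match.group(1))))
    return found


# Sample text only; a real check would read the installed config file.
SAMPLE_CFG = """\
%% {access, max_user_sessions, [{10, all}]}.
{max_user_sessions, 1000}.
"""

print(session_limits(SAMPLE_CFG))
```

In practice the text would come from the installed ejabberd configuration file, whose location depends on how ejabberd was packaged.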
I'm open to any suggestions.
--
Victoria Bush
Opscan Evaluation Manager
Center for Teaching, Learning & Technology
vbush at ilstu.edu