[OPEN-ILS-DEV] Startup problem with OpenSRF settings listener

Bill Erickson billserickson at gmail.com
Mon Dec 18 01:12:18 EST 2006


I don't have a quick solution right now, but I'll try to shed some light
where I can to get the conversation going.

On 12/17/06, Eric Lesage <lesagee at iro.umontreal.ca> wrote:
>
> Hello,
>
> I hope this is the right mailing list for this problem; otherwise, my
> apologies.


You're in the right place :)

> I'm having a problem starting the opensrf settings listener. All the other
> listeners defined in opensrf.xml (let's start simple) seem to be up except
> that one.
>
> What is also surprising is that when the system is started, there is no
> status line pair indicating the [UnixServer/Listener] is being started
> (yet there is a <appname>opensrf.settings</appname> line in my
> opensrf.xml). Is the settings server special somehow?


The settings server is special.  It's the first service to start and behaves
differently since it does not need to communicate over the network to gather
settings information.  I'm not too surprised the settings server behaves
differently if you are having problems at system startup.

> Instead, the settings listener consumes 100% CPU and does not
> register as an XMPP resource within the "ils" account (it is, however,
> registered with the router).
>
> The "stateless" property does not seem to be a factor (although, I must
> admit I don't really understand what it does).


Right.  This setting should not be a factor at this point.

> The opensrf.settings_unix.log says:
>
> --begin--
> 2006/12/17-23:12:39 OpenSRF::UnixServer (type OpenSRF) starting!
> pid(32114)
> Binding to UNIX socket file
> /usr/local/openils/var/sock/opensrf.settings_unix.sock using SOCK_STREAM
> Group Not Defined.  Defaulting to EGID '1509 1509'
> User Not Defined.  Defaulting to EUID '1503'
> Setting up serialization via flock
> Beginning prefork (5 processes)
> Starting "5" children
> 2006/12/17-23:12:40 CONNECT UNIX Socket:
> "/usr/local/openils/var/sock/opensrf.settings_unix.sock"
> MessageWrapper received bad XML : error = :1: namespace error : Namespace
> prefix stream on error is not defined
> <stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text
> xml:l
>               ^ at
> /usr/local/lib/site_perl/OpenSRF/Transport/SlimJabber/MessageWrapper.pm line
> 17.
>
> XML = <stream:error><conflict
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang=''
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new
> connection</text></stream:error>
> :1: namespace error : Namespace prefix stream on error is not defined
> <stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text
> xml:l
>               ^ at
> /usr/local/lib/site_perl/OpenSRF/Transport/SlimJabber/MessageWrapper.pm line
> 17.
>
> Starting "2" children
> Processing diff (-1), Waiting diff (0)
> Killing "1" children
> Starting "1" children
> 2006/12/17-23:13:33 Server closing!
> --end--


There are a couple of things going on here.  From the Jabber XML, it appears
the settings server connection to Ejabberd is being overwritten by a new
connection over and over.  At first glance, I'm not sure what the root cause
of this is, but it would explain the 100% CPU utilization and repeating log
lines.

The error you see in the logs is caused by the fact that the custom Perl
Jabber parsing code is not equipped to handle these "conflict" error
messages.  (XMPP qualifies some error messages with the "stream" namespace
prefix, which is not defined in our Perl XML message parser.)  All this
really means is that we are not accustomed to getting this error from the
Jabber server, so we haven't added the appropriate namespace support to the
Perl code.
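
To illustrate the namespace problem outside of OpenSRF, here is a minimal
sketch in Python (not the Perl MessageWrapper code).  It assumes the standard
XMPP binding of the "stream" prefix to http://etherx.jabber.org/streams; the
helper names are hypothetical.  The fragment is the XML from the log above
with the line wrapping removed.

```python
import xml.etree.ElementTree as ET

# Namespace that XMPP binds to the "stream" prefix on the enclosing
# <stream:stream> element (assumed here; it is not shown in the log).
STREAM_NS = "http://etherx.jabber.org/streams"
# Namespace of the error conditions, taken from the logged XML.
CONDITION_NS = "urn:ietf:params:xml:ns:xmpp-streams"

# The fragment from the log, with its line wrapping removed.
FRAGMENT = (
    "<stream:error>"
    "<conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/>"
    "<text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>"
    "Replaced by new connection</text>"
    "</stream:error>"
)


def parses_standalone(fragment):
    """Return True if the fragment parses on its own as a document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # The "stream" prefix is never declared inside the fragment.
        return False


def stream_error_condition(fragment):
    """Hypothetical helper: return the stream error condition, or None."""
    # Wrap the fragment in a throwaway root that declares the prefix.
    wrapped = "<wrapper xmlns:stream='{}'>{}</wrapper>".format(
        STREAM_NS, fragment)
    error = ET.fromstring(wrapped)[0]
    if error.tag != "{%s}error" % STREAM_NS:
        return None
    for child in error:
        # Tags look like "{namespace}name" once parsed.
        ns, _, name = child.tag[1:].partition("}")
        if ns == CONDITION_NS and name != "text":
            return name
    return None


print(parses_standalone(FRAGMENT))       # False
print(stream_error_condition(FRAGMENT))  # conflict
```

The fragment fails on its own because nothing inside it declares the prefix;
declaring it on a wrapper element is one way (not necessarily the right fix
for our Perl code) to make the same bytes parse.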

I'm curious... are there any error log lines like this?  "Inbound process
lost its jabber connection.  Attempting to reconnect..."

Does it appear that any new processes are being spawned over and over or is
it just the one high-CPU process?

> There are matching log entries in osrfsys.log.
>
> Running with INTL logging, the listener keeps repeating:
> [2006-12-17 23:13:33] -e [INTL:32121:::]  Got [] from the socket


This would indicate reading from a closed socket, presumably chopped off by
Ejabberd.
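
As a small illustration of why an empty read points at a closed socket, here
is a sketch in Python (again, not the OpenSRF code).  On a stream socket, a
read that returns no data means the peer has closed the connection; a loop
that keeps reading without checking for that will spin.  The function name
and the sample payload are made up for the example.

```python
import socket


def read_until_closed(sock, max_reads=1000):
    """Read from sock until the peer closes it or max_reads is reached.

    Returns (chunks, closed): the data received and whether an empty read
    (end of stream) was seen.
    """
    chunks = []
    for _ in range(max_reads):
        data = sock.recv(4096)
        if not data:
            # An empty result means the other side closed the connection;
            # reading again would only return empty results forever.
            return chunks, True
        chunks.append(data)
    return chunks, False


# Stand-in for the Jabber connection: a connected pair of local sockets.
ours, theirs = socket.socketpair()
theirs.sendall(b"<presence/>")
theirs.close()  # the "server" drops the connection

chunks, closed = read_until_closed(ours)
ours.close()
print(b"".join(chunks), closed)
```

Stopping (or reconnecting) on the first empty read is what keeps a loop like
this from burning CPU on a dead connection.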

> I'm attempting to run evergreen 1.0.1 on a Gentoo/amd64 system using
> ejabberd (which, otherwise, seems to work). The fact that the settings
> listener has problems will cause opensrf-c to crash later.
>
> Has anyone else seen this problem?


I've never run into this problem.  It might be helpful to send your config
files (as attachments, scrubbed of passwords, etc.), since the configuration
process is a common source of heartache.

I also notice (from the logs above) that your install prefix is
"/usr/local/openils/".  There is no reason that should cause a problem, but
to my knowledge no one has ever successfully installed Evergreen with an
install prefix other than "/openils/".  Multiple config files reference the
install prefix, so those all need to be updated by hand (until we have a
better install kit, of course).
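
One way to hunt for leftover references to the default prefix is a quick
scan of the config directory.  The following is only a sketch in Python: the
function name, the sample file, and its contents are made up for the
example, and you would point it at wherever your config files actually live.

```python
import os
import re
import tempfile


def find_prefix_references(config_dir, prefix="/openils"):
    """Hypothetical helper: list lines that use `prefix` as a root path.

    A path such as /usr/local/openils/... is not reported, because there
    the prefix is preceded by other path characters.
    """
    pattern = re.compile(
        r"(?<![\w./-])" + re.escape(prefix) + r"(?![\w.-])")
    hits = []
    for dirpath, _dirs, files in os.walk(config_dir):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits


# Demo on a throwaway directory with made-up config lines.
with tempfile.TemporaryDirectory() as demo_dir:
    sample = os.path.join(demo_dir, "sample.conf")
    with open(sample, "w", encoding="utf-8") as fh:
        fh.write("socket = /openils/var/sock/example.sock\n")
        fh.write("log = /usr/local/openils/var/log/example.log\n")
    demo_hits = find_prefix_references(demo_dir)

for path, lineno, text in demo_hits:
    print(lineno, text)
```

In the demo only the first line is reported, since the second already uses
the /usr/local/openils prefix.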

> Thanks,
>
> --
> Eric Lesage
>

Thanks for the details.  I hope this helps...

-bill


-- 
Bill Erickson
PINES Systems Developer
Georgia Public Library Service
billserickson at gmail.com
http://open-ils.org