[OPEN-ILS-DEV] Startup problem with OpenSRF settings listener

Bill Erickson billserickson at gmail.com
Tue Dec 19 11:45:22 EST 2006


On 12/18/06, Eric Lesage <lesagee at iro.umontreal.ca> wrote:
>
> On Mon, 18 Dec 2006, Bill Erickson wrote:
>
> > I don't have a quick solution right now, but I'll try to shed some light
> > where I can to get the conversation going.
>
> Hi,
>
> Thanks for the amazingly quick reply.
>
> > The settings server is special.  It's the first service to start and
> behaves
> > differently since it does not need to communicate over the network to
> gather
> > settings information.  I'm not too surprised the settings server behaves
>
> > differently if you are having problems at system startup.
>
> Thanks for the clarification.
>
> > There are a couple of things going on here.  From the Jabber XML, it
> appears
> > the settings server connection to Ejabberd is being overwritten by a new
>
> > connection over and over.  At first glance, I'm not sure what the root
> cause
> > of this is, but it would explain the 100% CPU utilization and repeating
> log
> > lines.
>
> Interesting. The ejabberd logs show that the sessions are replaced only
> once, but this occurs only for the settings listener and for a bunch of
> drones. However, for some drones that have their session replaced, a new
> drone is spawned. So I guess the drone/listener connection gets killed
> there, because for each drone whose session got "replaced", there is no
> "closed connection" log entry in the jabber server log.
>
> (This doesn't tell us why the session gets replaced in the first place.)


Right, exactly.

> The error you see in the logs is caused by the fact that the custom Perl
> > Jabber parsing code is not equipped to handle these "conflict" error
> > message.  (XMPP qualifies some error messages with the "stream"
> namespace,
> > which is not defined in our Perl XML message parser).  All this really
> means
> > is that we are not accustomed to getting this error from the Jabber
> server,
> > so we haven't added the appropriate namespace support to the Perl code.
>
> OK.
>
> > I'm curious... are there any error log lines like this?  :  "Inbound
> process
> > lost its jabber connection.  Attempting to reconnect..."
>
> None.
>
> > Does it appear that any new processes are being spawned over and over or
> is
> > it just the one high-CPU process?
>
> This is just one process caught in an infinite loop (the one where it
> gets to print the message I had pasted below).



Here are, for reference, the last log lines relevant to that process
> before it gets into that loop (note: this is not the same instance as in
> my previous mail, i.e., the process IDs will not match):
>
> [2006-12-17 23:41:54] -e [INFO:32455:::] timed_read() read 195 bytes of
> data
> [2006-12-17 23:41:54] -e [INTL:32455:::]  Got [<stream:error><conflict
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang=''
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new
> connection</text></stream:error></stream:stream>] from the socket
> [2006-12-17 23:41:54] -e [INTL:32455:::]  First read Buffer
> [2006-12-17 23:41:54] -e [INTL:32455:::] Using tag: stream:error
> [2006-12-17 23:41:54] -e [INTL:32455:::] completed read with
> <stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text
> xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new
> connection</text></stream:error>
> [2006-12-17 23:41:54] -e [INTL:32455:::] XML Read: <stream:error><conflict
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang=''
> xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new
> connection</text></stream:error>
> [2006-12-17 23:41:54] -e [INTL:32455:::] Call to Client->timed_read(
> FOREVER ) successfully returned data
> [2006-12-17 23:41:54] -e [DEBG:32455:::] Inbound listener received 179
> bytes of data
> [2006-12-17 23:41:54] -e [DEBG:32455:::] Inbound listener calling
> process()
> [2006-12-17 23:41:54] -e [INTL:32455:::] Temp Buffer Contained:
> [2006-12-17 23:41:54] -e [INTL:32455:::] Using tag: /stream:stream
> [2006-12-17 23:41:54] -e [INTL:32455:::] Temp Buffer After first attempt:
> [2006-12-17 23:41:54] -e [INTL:32455:::] Calling timed_read with timetout
> 0
> [2006-12-17 23:41:54] -e [INTL:32455:::]  Got [</stream:stream>] from the
> socket
> [2006-12-17 23:41:54] -e [INTL:32455:::]  First read Buffer
> [2006-12-17 23:41:54] -e [INTL:32455:::] Using tag:
> [2006-12-17 23:41:54] -e [INTL:32455:::]  Got [] from the socket
> [2006-12-17 23:41:54] -e [INTL:32455:::]  Got [] from the socket
> ...
>
>
> > There are matching log entries in osrfsys.log.
> >>
> >> Running with INTL logging, the listener keeps repeating:
> >> [2006-12-17 23:13:33] -e [INTL:32121:::]  Got [] from the socket
> >
> >
> > This would indicate reading from a closed socket, presumably chopped off
> by
> > Ejabberd.
>
> Well so I tried wildfire instead (it wasn't on the wiki a month ago when I
>
> started playing with it), and that works: no drone or listener seems to
> get dropped (altough, I haven't found the connection logs). I wonder if
> people who had tried ejabberd on gentoo got something similar?
>
> On the other hand, no class gets registered on the router anymore (BTW, an
> option/setting to keep from daemonzing would be useful). It seems like the
> router is not receiving any message from the XMPP server(!).


Comment out the call to daemonize() in line 117 of
OpenSRF/src/router/osrf_router_main.c and rebuild.  Then run the router by
hand in a different terminal before launching the other services.  I agree a
flag would be good, though. :)

Also, after exactly 30 minutes of sitting idle, the logs fill up again
> (not exactly the same way). This is probably due to an auto-disconnect
> somewhere, altough I suppose the software should react smoothly to that.
>
> Well, on to chopchop:
>
> Good news... altough Gaim can't login/register, it seems that everything
> else can use chopchop well enough that opensrf-c doesn't crash for lack of
> configuration values. :-)



That's good (and interesting).  Chopchop implements only very small portion
of the XMPP spec.  When a connection is replaced on chopchop, it does not
send an error message to the original connection, it just quietly replaces
it.  (Although, it should log a message in the chopchop log.)  So it's
likely you are still having the same problem, but the ILS code handles the
jabber server responses more gracefully (well, since there aren't any in
this case).


> I've never run into this problem.  It might be helpful to send your config
> > files (as attachments, scrubbed of passwords, etc.), since the
> configuration
> > process is a common source of heartache.
>
> If you (still) think it would help, I can send those off-list. Or if you
> need more logs, just ask.


I do think seeing the configs could help.  You can send them directly to me
if you like.  I have to warn you, though, I'm leaving in the morning for a
much needed vacation (10 days + recovery time ;)).

> I also notice (from the logs above) that your install prefix is
> > "/usr/local/openils/".  While there is no reason that should cause a
> > problem, to my knowledge, no one has ever successfully installed
> Evergreen
> > with an install prefix other than "/openils/".  Multiple config files
> > reference the install prefix, so those all need to be updated by hand
> (until
> > we have a better install kit, of course).
>
> For what it's worth, I see no change in behaviour with a symlink from
> /openils to /usr/local/openils.


Excellent!

By the way, I've made myself a local portage tree to gather all
> dependencies that are not in the gentoo tree (I don't like using CPAN over
> portage). The app-misc/evergreen package just lists all dependencies
> (without attempting to actually install anything yet) documented on the
> wiki. For the record, on my system, it (indirectly) pulls off about 85
> packages (excluding postgresql and apache and their dependencies).


Very cool.  We always talked about creating an ebuild for EG on Gentoo, but
after moving most of our machines to Debian (for ease of administering the
armada of servers), it fell off the radar.  If you get the ebuild working
and installing, I'm sure others would be interested in trying it out.

I'm sure the root cause of your problem has nothing to do with the jabber
server.  The servers are just behaving a little differently in this
situation.  Any logs/configs you can send off-list will be useful.  Of
course, I'll make sure this email thread is updated with any pertinent info
for everyone's benefit.

I'm about to sign off for several days, though.   Best of luck, and keep us
posted on any progress.

-bill



-- 
Bill Erickson
PINES Systems Developer
Georgia Public Library Service
billserickson at gmail.com
http://open-ils.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20061219/21dfe6bf/attachment-0001.html


More information about the Open-ils-dev mailing list