[OPEN-ILS-DEV] Startup problem with OpenSRF settings listener

Eric Lesage lesagee at iro.umontreal.ca
Mon Dec 18 10:21:23 EST 2006


On Mon, 18 Dec 2006, Bill Erickson wrote:

> I don't have a quick solution right now, but I'll try to shed some light
> where I can to get the conversation going.

Hi,

Thanks for the amazingly quick reply.

> The settings server is special.  It's the first service to start and behaves
> differently since it does not need to communicate over the network to gather
> settings information.  I'm not too surprised the settings server behaves
> differently if you are having problems at system startup.

Thanks for the clarification.

> There are a couple of things going on here.  From the Jabber XML, it appears
> the settings server connection to Ejabberd is being overwritten by a new
> connection over and over.  At first glance, I'm not sure what the root cause
> of this is, but it would explain the 100% CPU utilization and repeating log
> lines.

Interesting. The ejabberd logs show that the sessions are replaced only 
once, but this occurs only for the settings listener and for a bunch of 
drones. However, for some drones that have their session replaced, a new 
drone is spawned. So I guess the drone/listener connection gets killed 
there, because for each drone whose session got "replaced", there is no 
"closed connection" log entry in the jabber server log.

(This doesn't tell us why the session gets replaced in the first place.)

> The error you see in the logs is caused by the fact that the custom Perl
> Jabber parsing code is not equipped to handle these "conflict" error
> message.  (XMPP qualifies some error messages with the "stream" namespace,
> which is not defined in our Perl XML message parser).  All this really means
> is that we are not accustomed to getting this error from the Jabber server,
> so we haven't added the appropriate namespace support to the Perl code.

OK.

> I'm curious... are there any error log lines like this?  :  "Inbound process
> lost its jabber connection.  Attempting to reconnect..."

None.

> Does it appear that any new processes are being spawned over and over or is
> it just the one high-CPU process?

This is just one process caught in an infinite loop (the one where it 
gets to print the message I had pasted below).

Here are, for reference, the last log lines relevant to that process 
before it gets into that loop (note: this is not the same instance as in 
my previous mail, i.e., the process IDs will not match):

[2006-12-17 23:41:54] -e [INFO:32455:::] timed_read() read 195 bytes of data
[2006-12-17 23:41:54] -e [INTL:32455:::]  Got [<stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new connection</text></stream:error></stream:stream>] from the socket
[2006-12-17 23:41:54] -e [INTL:32455:::]  First read Buffer
[2006-12-17 23:41:54] -e [INTL:32455:::] Using tag: stream:error
[2006-12-17 23:41:54] -e [INTL:32455:::] completed read with <stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new connection</text></stream:error>
[2006-12-17 23:41:54] -e [INTL:32455:::] XML Read: <stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new connection</text></stream:error>
[2006-12-17 23:41:54] -e [INTL:32455:::] Call to Client->timed_read( FOREVER ) successfully returned data
[2006-12-17 23:41:54] -e [DEBG:32455:::] Inbound listener received 179 bytes of data
[2006-12-17 23:41:54] -e [DEBG:32455:::] Inbound listener calling process()
[2006-12-17 23:41:54] -e [INTL:32455:::] Temp Buffer Contained:
[2006-12-17 23:41:54] -e [INTL:32455:::] Using tag: /stream:stream
[2006-12-17 23:41:54] -e [INTL:32455:::] Temp Buffer After first attempt:
[2006-12-17 23:41:54] -e [INTL:32455:::] Calling timed_read with timetout 0
[2006-12-17 23:41:54] -e [INTL:32455:::]  Got [</stream:stream>] from the socket
[2006-12-17 23:41:54] -e [INTL:32455:::]  First read Buffer
[2006-12-17 23:41:54] -e [INTL:32455:::] Using tag:
[2006-12-17 23:41:54] -e [INTL:32455:::]  Got [] from the socket
[2006-12-17 23:41:54] -e [INTL:32455:::]  Got [] from the socket
...


> There are matching log entries in osrfsys.log.
>> 
>> Running with INTL logging, the listener keeps repeating:
>> [2006-12-17 23:13:33] -e [INTL:32121:::]  Got [] from the socket
>
>
> This would indicate reading from a closed socket, presumably chopped off by
> Ejabberd.

Well so I tried wildfire instead (it wasn't on the wiki a month ago when I 
started playing with it), and that works: no drone or listener seems to 
get dropped (altough, I haven't found the connection logs). I wonder if 
people who had tried ejabberd on gentoo got something similar?

On the other hand, no class gets registered on the router anymore (BTW, an 
option/setting to keep from daemonzing would be useful). It seems like the 
router is not receiving any message from the XMPP server(!).

Also, after exactly 30 minutes of sitting idle, the logs fill up again 
(not exactly the same way). This is probably due to an auto-disconnect 
somewhere, altough I suppose the software should react smoothly to that.

Well, on to chopchop:

Good news... altough Gaim can't login/register, it seems that everything 
else can use chopchop well enough that opensrf-c doesn't crash for lack of 
configuration values. :-)

> I've never run into this problem.  It might be helpful to send your config
> files (as attachments, scrubbed of passwords, etc.), since the configuration
> process is a common source of heartache.

If you (still) think it would help, I can send those off-list. Or if you 
need more logs, just ask.

> I also notice (from the logs above) that your install prefix is
> "/usr/local/openils/".  While there is no reason that should cause a
> problem, to my knowledge, no one has ever successfully installed Evergreen
> with an install prefix other than "/openils/".  Multiple config files
> reference the install prefix, so those all need to be updated by hand (until
> we have a better install kit, of course).

For what it's worth, I see no change in behaviour with a symlink from 
/openils to /usr/local/openils.


By the way, I've made myself a local portage tree to gather all 
dependencies that are not in the gentoo tree (I don't like using CPAN over 
portage). The app-misc/evergreen package just lists all dependencies 
(without attempting to actually install anything yet) documented on the 
wiki. For the record, on my system, it (indirectly) pulls off about 85 
packages (excluding postgresql and apache and their dependencies).

Regards,

-- 
Eric Lesage


More information about the Open-ils-dev mailing list