[OPEN-ILS-DEV] Crashing problem with 1.2.3.1

Garry Dunn garry at trellisconsulting.ca
Fri Nov 21 16:00:40 EST 2008


Dan Scott wrote:
> 2008/11/21 Garry Dunn <garry at trellisconsulting.ca>:
>> To all,
>>
>> We (the Innisfil Public Library) are in need of some help.  We've been
>> running Evergreen (for real!) for about 40 days and continue to have
>> unpredictable crashes (on average once or twice a day).  It's done it from
>> day one (and did it once on the test system before going live), but it's
>> difficult to reproduce.  I run a third test system (much smaller in terms of
>> memory/CPU) but I can't make it crash.  I only have 1 or 2 staff clients
>> connected at a time so it's much lower volume of activity.  All three
>> systems (1 live and 2 test) are loaded with the same data.  Here are some
>> details:
>>
>> 1) to fix the problem, we simply issue the stop_all command, then run the 3
>> start commands (start_router/start_perl/start_c) and everything is fine for
>> a while.  We never have to touch PostgreSQL or Apache.
>> 2) the system is running on 1 server
>> (memcached/ejabberd/Apache/OpenSRF/PostgreSQL/...).  Debian Etch OS with
>> Evergreen 1.2.3.1.  PostgreSQL 8.1.  4 GB of RAM on a new Dell PowerEdge server
>> (lots of hard drive space on RAID).  The server is not running anything
>> else--just Evergreen.
>> 3) It tends to happen when staff is dealing with a patron who has a lot of
>> holds/books out/fines, although that's not a guarantee.  Once the system is
>> restarted, staff can go back to the problem patron and do what they'd like
>> and it will be fine.
>>
>> We've got lots of logs captured from when it happens and I can provide
>> snippets of those if interested.  The only thing I see in the logs is a
>> message similar to this, approaching each failure:
>>
>> In osrfsys.log:
>>
>> [2008-11-12 12:15:21] open-ils.circ [ERR
>> :17229:CStoreEditor.pm:86:12265090521716244] editor[1|11434]
>> error starting database transaction
>> [2008-11-12 12:15:21] open-ils.circ [ERR
>> :17229:CStoreEditor.pm:269:12265090521716244] CStoreEditor lost it's
>> connection!!
>>
>> The logs show quite a few error messages like this leading up to the
>> complete crash (sometimes hours before the actual crash).
>>
>> We've been thinking it's a performance issue so we've played a bit with
>> Dan's tweaks for PostgreSQL (found here:
>> http://www.coffeecode.net/archives/156-Tuning-PostgreSQL-for-Evergreen-on-a-test-server.html).
>>  It doesn't seem to make a difference (but we've only tried a couple of
>> different combinations).
>>
>> If anyone can provide some guidance about how to further examine/resolve the
>> problem, we'd greatly appreciate it.
> 
> Hi Garry:
> 
> Wow, that sounds nasty. I should note that for a production system we
> would normally recommend running PostgreSQL 8.2 (it's pretty
> straightforward to configure and compile it from source), but let's
> not jump yet. I haven't seen this kind of an issue before, but I also
> haven't been running a system in production like you. That said, here
> are a few stabs in the dark:
> 
> Can you monitor the processes to see if there is any strange
> accumulation over time (like dozens and dozens of postgres backend
> processes)? And/or monitor system load in general to see if anything
> strange is going on.
> 
> Do you have logging turned on at the PostgreSQL level? The PostgreSQL
> logs might have some interesting information to poke through.
> 
> It also might be worthwhile running a recent version of
> settings-tester.pl to give us a sanity check on the versions of all of
> the prerequisite Perl modules.
> 

I should also note that the system isn't particularly large (86000 bib 
records, 12000 patrons, 19000 items in circulation, 5500 holds).  The 
patrons that _seem_ to cause the problems might have 60 or 70 holds on 
items.
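
Since the CStoreEditor errors show up in osrfsys.log well before each
crash, one check would be to tally ERR entries per hour and see whether
they build up ahead of a failure.  The following is only a rough sketch:
it assumes each log entry sits on a single line that starts with the
"[date time] service [LEVEL" prefix seen in the snippet above, and the
function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed entry prefix, based on the quoted snippet, e.g.
# "[2008-11-12 12:15:21] open-ils.circ [ERR :17229:CStoreEditor.pm:86:...]"
ENTRY_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\] (\S+) \[(\w+)"
)


def errors_per_hour(lines):
    """Count ERR-level entries, keyed by (date, hour, service)."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # continuation line or unrecognised format
        date, hour, service, level = match.groups()
        if level == "ERR":
            counts[(date, hour, service)] += 1
    return counts
```

Passing it an open file handle for the log (wherever osrfsys.log lives on
a given install) and printing the sorted items should give a per-hour view
that can be lined up against the times staff reported a crash.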

I can't remember why we didn't go with PostgreSQL 8.2.  I have a vague 
recollection of having problems with the installation but I'll revisit 
that on our test systems.

We'll also try to do some checks on what is happening the next time it 
crashes.  We usually just restart and get back to business, but that 
will have to change!

Thanks for the quick reply.  BTW: I look forward to hooking up at OLA. 
I'm eager to find out more about what's under the hood.

