[OPEN-ILS-DEV] Crashing problem with 1.2.3.1

Dan Scott denials at gmail.com
Fri Nov 21 15:00:05 EST 2008


2008/11/21 Garry Dunn <garry at trellisconsulting.ca>:
> To all,
>
> We (the Innisfil Public Library) are in need of some help.  We've been
> running Evergreen (for real!) for about 40 days and continue to have
> unpredictable crashes (on average once or twice a day).  It has happened since
> day one (and happened once on the test system before going live), but it's
> difficult to reproduce.  I run a third test system (much smaller in terms of
> memory/CPU), but I can't make it crash.  I only have 1 or 2 staff clients
> connected at a time, so it sees a much lower volume of activity.  All three
> systems (1 live and 2 test) are loaded with the same data.  Here are some
> details:
>
> 1) To fix the problem, we simply issue the stop_all command, then run the 3
> start commands (start_router/start_perl/start_c), and everything is fine for
> a while.  We never have to touch PostgreSQL or Apache.
> 2) the system is running on 1 server
> (memcache/ejabberd/apache/opensrf/postgres/...).  Debian Etch OS with
> Evergreen 1.2.3.1.  Postgres 8.1.  4G of RAM on a new Dell PowerEdge server
> (lots of hard drive space on RAID).  The server is not running anything
> else--just Evergreen.
> 3) It tends to happen when staff is dealing with a patron who has a lot of
> holds/books out/fines, although that's not a guarantee.  Once the system is
> restarted, staff can go back to the problem patron and do what they'd like
> and it will be fine.
>
> We've captured lots of logs from when it happens, and I can provide
> snippets if you're interested.  The only thing I see in the logs is a
> message similar to this one, leading up to each failure:
>
> In osrfsys.log:
>
> [2008-11-12 12:15:21] open-ils.circ [ERR :17229:CStoreEditor.pm:86:12265090521716244] editor[1|11434] error starting database transaction
> [2008-11-12 12:15:21] open-ils.circ [ERR :17229:CStoreEditor.pm:269:12265090521716244] CStoreEditor lost it's connection!!
>
> The logs show quite a few error messages like this leading up to the
> complete crash (sometimes hours before the actual crash).
>
> We've been thinking it's a performance issue, so we've experimented a bit with
> Dan's tweaks for PostgreSQL (found here:
> http://www.coffeecode.net/archives/156-Tuning-PostgreSQL-for-Evergreen-on-a-test-server.html).
> They don't seem to make a difference (but we've only tried a couple of
> different combinations).
>
> If anyone can provide some guidance about how to further examine/resolve the
> problem, we'd greatly appreciate it.

Hi Garry:

Wow, that sounds nasty. I should note that for a production system we
would normally recommend running PostgreSQL 8.2 (it's fairly
straightforward to configure and compile from source), but let's not
jump there yet. I haven't seen this kind of issue before, but I also
haven't been running a production system like yours. That said, here
are a few stabs in the dark:

Can you monitor the processes to see whether anything accumulates
strangely over time (like dozens and dozens of postgres backend
processes)? It would also help to monitor overall system load to see
whether anything else unusual is going on.
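
One low-tech way to watch for that kind of accumulation is to sample
`ps` output periodically (from cron, say) and tally processes by
command name. What follows is a hypothetical helper, not part of
Evergreen or OpenSRF; it assumes Python 3.7+ and the output format of
`ps -eo comm=` (one command name per line, no header). Depending on how
PostgreSQL was started, its backends may show up as `postgres` or
`postmaster`, so the sketch reports both.

```python
import subprocess
import time
from collections import Counter

# Hypothetical list of command names to report; adjust for your system.
WATCH = ("postgres", "postmaster")


def count_processes(ps_output: str) -> Counter:
    """Tally command names from `ps -eo comm=` output (one name per line)."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


def sample_processes() -> Counter:
    """Run ps once on the local machine and return the per-command tally."""
    # -e selects every process; "comm=" prints only the command name, no header.
    result = subprocess.run(["ps", "-eo", "comm="],
                            capture_output=True, text=True, check=True)
    return count_processes(result.stdout)


if __name__ == "__main__":
    counts = sample_processes()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    # One line per watched name, so repeated runs can be appended to a file
    # and compared over the hours before a crash.
    for name in WATCH:
        print(f"{stamp} {name} {counts.get(name, 0)}")
```

Appending each run's output to a file gives a simple time series; a
steady climb in backend count between restarts would be worth
reporting back to the list.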

Do you have logging turned on at the PostgreSQL level? The PostgreSQL
logs might have some interesting information to poke through.
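
Since the CStoreEditor errors reportedly appear hours before a crash,
it may also help to see how their frequency changes over time and line
that up against whatever the PostgreSQL log shows for the same hours.
Here is a minimal, hypothetical Python sketch (not an Evergreen tool)
that counts matching osrfsys.log lines per hour. It assumes each log
entry sits on a single line and starts with a `[YYYY-MM-DD HH:MM:SS]`
timestamp, as in the excerpt quoted above; lines from other logs would
need a different `TIMESTAMP` pattern.

```python
import re
import sys
from collections import Counter

# osrfsys.log lines start with "[YYYY-MM-DD HH:MM:SS]"; group 1 keeps the
# date and hour so matching lines can be bucketed per hour.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\]")


def errors_per_hour(lines, needle):
    """Count lines containing `needle`, grouped by the hour of their timestamp.

    Lines without a recognizable leading timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if needle in line:
            match = TIMESTAMP.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage (path is a placeholder): python errors_per_hour.py /path/to/osrfsys.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        tally = errors_per_hour(log, "error starting database transaction")
    for hour, count in sorted(tally.items()):
        print(f"{hour}:00  {count}")
```

Searching for a fixed substring rather than a full regular expression
keeps the script tolerant of the varying process IDs and request
identifiers embedded in each entry.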

It might also be worth running a recent version of settings-tester.pl
as a sanity check on the versions of all of the prerequisite Perl
modules.

-- 
Dan Scott
Laurentian University

