[OPEN-ILS-DEV] Crashing problem with 1.2.3.1

Garry Dunn garry at trellisconsulting.ca
Sun Nov 23 23:28:39 EST 2008


Dan Scott wrote:
> 2008/11/21 Garry Dunn <garry at trellisconsulting.ca>:
>> To all,
>>
>> We (the Innisfil Public Library) are in need of some help.  We've been
>> running Evergreen (for real!) for about 40 days and continue to have
>> unpredictable crashes (on average once or twice a day).  It's done it from
>> day one (and did it once on the test system before going live), but it's
>> difficult to reproduce.  I run a third test system (much smaller in terms of
>> memory/CPU) but I can't make it crash.  I only have 1 or 2 staff clients
>> connected at a time so it's much lower volume of activity.  All three
>> systems (1 live and 2 test) are loaded with the same data.  Here are some
>> details:
>>
>> 1) to fix the problem, we simply issue the stop_all command, then run the 3
>> start commands (start_router/start_perl/start_c) and everything is fine for
>> a while.  We never have to touch PostgreSQL or Apache.
>> 2) the system is running on 1 server
>> (memcached/ejabberd/apache/opensrf/postgres/...).  Debian Etch OS with
>> Evergreen 1.2.3.1.  PostgreSQL 8.1.  4 GB of RAM on a new Dell PowerEdge server
>> (lots of hard drive space on RAID).  The server is not running anything
>> else--just Evergreen.
>> 3) It tends to happen when staff is dealing with a patron who has a lot of
>> holds/books out/fines, although that's not a guarantee.  Once the system is
>> restarted, staff can go back to the problem patron and do what they'd like
>> and it will be fine.
>>
>> We've got lots of logs captured from when it happens and I can provide
>> snippets of those if interested.  The only thing I see in the logs is a
>> message similar to this, approaching each failure:
>>
>> In osrfsys.log:
>>
>> [2008-11-12 12:15:21] open-ils.circ [ERR
>> :17229:CStoreEditor.pm:86:12265090521716244] editor[1|11434]
>> error starting database transaction
>> [2008-11-12 12:15:21] open-ils.circ [ERR
>> :17229:CStoreEditor.pm:269:12265090521716244] CStoreEditor lost it's
>> connection!!
>>
>> The logs show quite a few error messages like this leading up to the
>> complete crash (sometimes hours before the actual crash).
>>
>> We've been thinking it's a performance issue so we've played a bit with
>> Dan's tweaks for PostgreSQL (found here:
>> http://www.coffeecode.net/archives/156-Tuning-PostgreSQL-for-Evergreen-on-a-test-server.html).
>>  It doesn't seem to make a difference (but we've only tried a couple of
>> different combinations).
>>
>> If anyone can provide some guidance about how to further examine/resolve the
>> problem, we'd greatly appreciate it.
> 
> Hi Garry:
> 
> Wow, that sounds nasty. I should note that for a production system we
> would normally recommend running PostgreSQL 8.2 (it's pretty
> straightforward to configure and compile it from source), but let's
> not jump yet. I haven't seen this kind of an issue before, but I also
> haven't been running a system in production like you. That said, here
> are a few stabs in the dark:
> 
> Can you monitor the processes to see if there is any strange
> accumulation over time (like dozens and dozens of postgres backend
> processes)? And/or monitor system load in general to see if anything
> strange is going on.
> 
> Do you have logging turned on at the PostgreSQL level? The PostgreSQL
> logs might have some interesting information to poke through.
> 
> It also might be worthwhile running a recent version of
> settings-tester.pl to give us a sanity check on the versions of all of
> the prerequisite Perl modules.
> 

We've put some tracking in place to record what's happening on the server
around the time of the crashes.  We'll report back in a few days (after
a few more crashes).

Running settings-tester.pl comes back with:
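
For anyone wanting to do similar tracking, here is a minimal sketch (not
necessarily what is running on our server) of a snapshot that could be run
from cron every minute or so.  It counts processes whose command line starts
with "postgres:", which is how PostgreSQL backends and helper processes label
themselves in ps output, and records the 1-minute load average.  The function
names and the output format are made up for illustration.

```python
import os
import subprocess
import time


def count_processes(ps_output, prefix="postgres:"):
    """Count lines of `ps -eo args=` output whose command line starts with prefix.

    Note: this also counts PostgreSQL helper processes (writer, stats
    collector), not only client backends.
    """
    return sum(1 for line in ps_output.splitlines() if line.strip().startswith(prefix))


def snapshot():
    """Return one log line: timestamp, 1-minute load average, postgres process count."""
    ps_output = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    ).stdout
    load1, _, _ = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} load1={load1:.2f} postgres_procs={count_processes(ps_output)}"


if __name__ == "__main__":
    print(snapshot())
```

Appending the output to a file gives a timeline that can be lined up against
the timestamps in osrfsys.log after the next crash.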

LWP::UserAgent version 2.033
XML::LibXML version 1.66
XML::LibXML::XPathContext version 1.66
XML::LibXSLT version 1.59
Net::Server::PreFork version 0.90
Cache::Memcached version 1.15
Class::DBI version 0.96
Class::DBI::AbstractSearch version 0.07
Template version 2.14
DBD::Pg version 1.49
Net::Z3950::ZOOM version 1.24
MARC::Record version 2.0.0
MARC::Charset version 1.0
MARC::File::XML version 0.88
Text::Aspell version 0.04
CGI version 3.15
DateTime::TimeZone version 0.42
DateTime version 0.35
DateTime::Format::ISO8601 version 0.06
DateTime::Format::Mail version 0.2901
Unix::Syslog version 0.100
GD::Graph3d version 0.63
JavaScript::SpiderMonkey version 0.19
Log::Log4perl version 1.07
Email::Send version 2.181
Text::CSV_XS version 0.23
Spreadsheet::WriteExcel::Big version 2.01
Tie::IxHash version 1.21

Checking Jabber connection
* Jabber successfully connected
Checking database connections
* /opensrf/default/reporter/setup :: Successfully connected to database dbi:Pg:dbname=evergreen;host=evergreendb;port=5432
   * Database has the expected server encoding UTF8.
* /opensrf/default/apps/open-ils.storage/app_settings/databases :: Successfully connected to database dbi:Pg:dbname=evergreen;host=evergreendb;port=5432
   * Database has the expected server encoding UTF8.
* /opensrf/default/apps/open-ils.cstore/app_settings :: Successfully connected to database dbi:Pg:dbname=evergreen;host=evergreendb;port=5432
   * Database has the expected server encoding UTF8.
* /opensrf/default/apps/open-ils.reporter-store/app_settings :: Successfully connected to database dbi:Pg:dbname=evergreen;host=evergreendb;port=5432
   * Database has the expected server encoding UTF8.
Checking database drivers to ensure <driver> matches <language>
* OK: Pg language is undefined for reporter base configuration
* OK: Pg language is perl in /opensrf/default/apps/open-ils.storage/language
* OK: pgsql language is C in /opensrf/default/apps/open-ils.cstore/language
* OK: pgsql language is C in /opensrf/default/apps/open-ils.reporter-store/language
Checking libdbi and libdbi-drivers
/usr/local/lib/dbd/libdbdpgsql.so
  was not linked against libdbi - you probably need to compile libdbi-drivers from source with the --enable-libdbi configure switch.
Checking hostname
  * OK: found hostname 'tsuga.innisfillibrary.ca' in <hosts> section of opensrf.xml

Towards the end there appears to be a problem:

/usr/local/lib/dbd/libdbdpgsql.so
  was not linked against libdbi - you probably need to compile libdbi-drivers from source with the --enable-libdbi configure switch.

Could that be causing problems?  If so, how do we rectify it?
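
One way to look at the warning independently of settings-tester.pl is to
inspect the driver's shared-library dependencies with ldd.  The following is
a minimal sketch, assuming ldd is available and that a dependency line naming
libdbi.so means the driver is linked against libdbi.  Exactly which check
settings-tester.pl performs is not confirmed here, so treat this as an
approximation; the function name is made up for illustration.

```python
import re
import subprocess

# Path reported by settings-tester.pl in the output above.
DRIVER = "/usr/local/lib/dbd/libdbdpgsql.so"


def links_against_libdbi(ldd_output):
    """Return True if any dependency line in `ldd` output names libdbi.so."""
    return any(re.search(r"\blibdbi\.so", line) for line in ldd_output.splitlines())


if __name__ == "__main__":
    result = subprocess.run(["ldd", DRIVER], capture_output=True, text=True)
    if result.returncode != 0:
        print(f"ldd failed: {result.stderr.strip()}")
    elif links_against_libdbi(result.stdout):
        print("driver is linked against libdbi")
    else:
        print("driver is NOT linked against libdbi")
```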

-- 

Garry Dunn, P.Eng
Trellis Consulting
www.trellisconsulting.ca
705-835-5608 (Office)
905-302-7273 (Cell)

