[OPEN-ILS-DEV] Postgres speed issues

Dan Wells dbw2 at calvin.edu
Wed Jul 2 08:59:14 EDT 2008


Hello all,

Thanks to Reuben and Todd for the suggestions.  I am somewhat embarrassed to say that a thorough vacuum/analyze turned out to be all that was needed.  I had tried vacuuming the tables I felt were relevant to searching, but I certainly had not done them all, and the analyze step may well have been the most important part anyway.  Either way, I vote that a final vacuum/analyze step be added to the import wiki page, as it has proven absolutely necessary for a load of this size.
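For anyone adding that step to the wiki page, the in-database form of it (note that VACUUM FULL locks tables while it runs, so save it for after the load finishes) would be:

VACUUM FULL ANALYZE;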

The full vacuum did reveal another oddity, so here is some hopefully interesting and useful information.  The table that needed the most cleaning out was reporter.materialized_simple_record.  This happened because there is a trigger on metabib.full_rec which syncs this reporter table on every update of full_rec.  That keeps the data coherent under normal circumstances, but it is a lot of extra (and, I think, unnecessary) work when doing a basic bulk load.  The end result, in my case, was a table with 100,000 good rows (one for each record) and 5,825,699 dead rows (one for every single field in my dataset, counting the 100,000 back in).  This also chewed up around 2GB of disk space (which mattered to me, but maybe not to many).
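As an aside, for anyone wanting to measure that kind of bloat before vacuuming, newer PostgreSQL releases (8.3 and later; these columns are not in 8.2) report dead-row counts in the statistics views, e.g.:

SELECT schemaname, relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;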

Here is my solution to this issue:

Before running pg_loader-output.sql, run the following SQL statement:

ALTER TABLE metabib.full_rec DISABLE TRIGGER zzz_update_materialized_simple_record_tgr;

After running pg_loader-output.sql, run the following SQL statements:

INSERT INTO reporter.materialized_simple_record SELECT DISTINCT ON (id) * FROM reporter.old_super_simple_record WHERE id >= 1;
ALTER TABLE metabib.full_rec ENABLE TRIGGER zzz_update_materialized_simple_record_tgr;


Doing this not only saved me from needing to reclaim the 2GB of space, but it also significantly reduced the time needed to run pg_loader-output.sql.  I know there are currently other, much slower steps in the load process, but every bit of time savings helps.  These steps assume you are loading into a completely empty database, but I think they would also work with live data, provided you emptied reporter.materialized_simple_record before filling it up again.
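For that live-data case, the emptying step could be as simple as running the following before the INSERT above (a sketch; note that TRUNCATE takes an exclusive lock on the table):

TRUNCATE reporter.materialized_simple_record;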

What do others think?

Thanks,
DW


>>> "Reuben Pasquini" <rdp0004 at auburn.edu> 6/30/2008 11:28 AM >>>
Try vacuuming the database.
     $ vacuumdb -U postgres --all --full --analyze

http://www.postgresql.org/docs/8.2/static/maintenance.html 

You may also want to increase the default number
of shared-memory buffers postgres can use to
stash data in memory.
    http://www.postgresql.org/docs/8.2/static/runtime-config.html 
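For example, in postgresql.conf (illustrative values for a machine
with 1GB of RAM; 8.2 and later accept memory-unit suffixes):

    shared_buffers = 128MB
    effective_cache_size = 512MB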

Good luck!
Reuben



>>> "Dan Wells" <dbw2 at calvin.edu> 6/30/2008 9:40 AM >>>
Hello all,

I have been spending the last few weeks trying out different methods
for loading a batch of around 100,000 records.  I have had varying
degrees of success with getting the load to "work," including copy level
information, but the last few loads seem to have somehow developed a
search speed problem.  I say "developed" because I am pretty sure that
it wasn't horribly slow the first few times I did the load, but has
gotten progressively worse to the point that any keyword search which
returns several thousand results no longer completes in the OPAC, and
takes from 45 seconds to several minutes to process in postgres.  I am
not certain that it has gotten worse, but I am sure that it is currently
quite bad.

In between each load I have been running build-db.sh, and even dropped
and recreated my entire evergreen database before a recent load.  Is
there something else I need to do in order to get an entirely clean
slate?  This project has been my first experience with postgres, so I
may be missing something rather obvious.

My server specs are somewhat modest but by no means slow; I am
currently running a RedHat ES 5 virtual machine on a 2.4GHz Opteron
with 1GB of dedicated RAM.

Any suggestions?  Naturally one might suspect an index problem.  Is
there any way to verify that all relevant indexes are working, and/or to
simply rebuild them all?
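The brute-force option would presumably be a full reindex of the
database (REINDEX takes exclusive locks, so it would need downtime):

REINDEX DATABASE evergreen;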

Thanks,
DW




