[OPEN-ILS-DEV] Simple record materialized view - possible
improvements
Brandon W. Uhlman
brandon.uhlman at bclibrary.ca
Thu Aug 7 19:09:05 EDT 2008
Hi,
I'm going to try to talk my way through the current implementation of
materialized views in Evergreen, specifically the one for simple
record extracts that hangs off of metabib.full_rec, and then posit an
alternative. In my head, it should result in no change in general
performance, and some improvement in performance in some cases (the
marc2bre/direct_ingest/pg_loader dance).
Currently on an insert or update to metabib.full_rec,
zzz_update_materialized_simple_record_tgr is called, which
materializes the changes from the full record to the simple record
extracts. This makes importing a very large set or sets of bib data
incredibly slow.
The solution as discussed briefly on this list yesterday, and in more
detail last month
(http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html), is to use disable_materialized_simple_record_trigger() and enable_materialized_simple_record_trigger(),
which:
- remove the trigger
- truncate the materialized view table (that is, empty it), refresh
the data, and replace the trigger
respectively.
For continually growing datasets, this means, however, that the size
of the table to materialize upon reactivating the view is not
proportional to the size of the data set just added, but to the entire
dataset, including that which was previously materialized. For very
large datasets, the (computational) cost of rebuilding this data could
be significant; in addition, the functionality provided by this data
would not be available while you truncate the table (*).
My proposal is to make the materialization process a little lazier. Or
to quote Montgomery Scott, chief engineer of the Enterprise, to lock
the rematerialization subroutine directly into the transporter's
pattern buffer.
First, let's add a staleness bit/boolean to metabib.full_rec. The
staleness bit gets set to true if the row is updated or inserted when
changes to the simple record view are not being materialized -- that
is, because disable_materialized_simple_record_trigger() has been
called. On reenabling the trigger, instead of truncating the entire
table and rebuilding from scratch, only rebuild those entries which
have the staleness bit set.
If we're feeling particularly irreverent, we could call the column in
the database that says whether something is waiting to be
rematerialized 'in_pattern_buffer'.
Ok, so I just re-enabled the trigger on a dataset of about 920,000
records, and it only took 15 minutes. I imagined it would take longer.
Nonetheless, I am philosophically opposed to doing work I don't think
I have to, so I'm putting this idea out to gather moss.
Thoughts?
~B
(*) I guess it would if you wrapped the whole thing in one big
transaction, but the whole reason I'm thinking of this is that I
disabled the trigger outside a transaction, because I had multiple
imports to do. I'm not quite sure how transactions work in PostgreSQL.
Do I cost myself any performance by having a single "gigantically
enormous" transaction, instead of a bunch of "just pretty big" ones?
======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
605 Robson Street, 5th Floor
Vancouver, BC V6B 5J3
Phone: (604) 660-2972
E-mail: brandon.uhlman at gov.bc.ca
brandon.uhlman at bclibrary.ca
More information about the Open-ils-dev
mailing list