[OPEN-ILS-DEV] Simple record materialized view - possible improvements

Thu Aug 7 19:09:05 EDT 2008

Hi,

I'm going to try to talk my way through the current implementation of  
materialized views in Evergreen, specifically the one for simple  
record extracts that hangs off of metabib.full_rec, and then posit an  
alternative. In my head, it should result in no change in general  
performance, and some improvement in performance in some cases (the  
marc2bre/direct_ingest/pg_loader dance).

Currently on an insert or update to metabib.full_rec,  
zzz_update_materialized_simple_record_tgr is called, which  
materializes the changes from the full record to the simple record  
extracts. This makes importing a very large set or sets of bib data  
incredibly slow.

The solution as discussed briefly on this list yesterday, and in more  
detail last month  
(http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html), is to use disable_materialized_simple_record_trigger() and enable_materialized_simple_record_trigger(),  
which:

- remove the trigger
- truncate the materialized view table (that is, empty it), refresh  
the data, and replace the trigger

respectively.

For continually growing datasets, this means, however, that the size  
of the table to materialize upon reactivating the view is not  
proportional to the size of the data set just added, but to the entire  
dataset, including that which was previously materialized. For very  
large datasets, the (computational) cost of rebuilding this data could  
be significant; in addition, the functionality provided by this data  
would not be available while you truncate the table (*).

My proposal is to make the materialization process a little lazier. Or  
to quote Montgomery Scott, chief engineer of the Enterprise, to lock  
the rematerialization subroutine directly into the transporter's  
pattern buffer.

First, let's add a staleness bit/boolean to metabib.full_rec. The  
staleness bit gets set to true if the row is updated or inserted when  
changes to the simple record view are not being materialized -- that  
is, because disable_materialized_simple_record_trigger() has been  
called. On reenabling the trigger, instead of truncating the entire  
table and rebuilding from scratch, only rebuild those entries which  
have the staleness bit set.

If we're feeling particularly irreverent, we could call the column in  
the database that says whether something is waiting to be  
rematerialized 'in_pattern_buffer'.

Ok, so I just re-enabled the trigger on a dataset of about 920,000  
records, and it only took 15 minutes. I imagined it would take longer.  
Nonetheless, I am philosophically opposed to doing work I don't think  
I have to, so I'm putting this idea out to gather moss.

Thoughts?

~B

(*) I guess it would if you wrapped the whole thing in one big  
transaction, but the whole reason I'm thinking of this is that I  
disabled the trigger outside a transaction, because I had multiple  
imports to do. I'm not quite sure how transactions work in PostgreSQL.  
Do I cost myself any performance by having a single "gigantically  
enormous" transaction, instead of a bunch of "just pretty big" ones?

======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
605 Robson Street, 5th Floor
Vancouver, BC  V6B 5J3

Phone: (604) 660-2972
E-mail: brandon.uhlman at gov.bc.ca
         brandon.uhlman at bclibrary.ca