[OPEN-ILS-DEV] Optimization for triggers?

Mike Rylander mrylander at gmail.com
Thu Oct 10 17:19:22 EDT 2013


IMO, the 150ms first-call savings is probably not worth it.  Ingesting
a bib record is expected to take anywhere from 0.1 to 1.0 seconds, and
is (in the grand scheme of queries, especially by count) relatively
infrequent.


However, you noted the problem trigger as

    bbb_simple_rec_trigger AFTER INSERT OR DELETE OR UPDATE
        ON biblio.record_entry
        FOR EACH ROW EXECUTE PROCEDURE reporter.simple_rec_trigger()

which is implemented by (after a round of indirection) this function:

CREATE OR REPLACE FUNCTION reporter.simple_rec_update (r_id BIGINT, deleted BOOL)
RETURNS BOOL AS $$
BEGIN

    DELETE FROM reporter.materialized_simple_record WHERE id = r_id;

    IF NOT deleted THEN
        INSERT INTO reporter.materialized_simple_record
        SELECT DISTINCT ON (id) *
          FROM reporter.old_super_simple_record
         WHERE id = r_id;
    END IF;

    RETURN TRUE;

END;
$$ LANGUAGE PLPGSQL;

This maintains a materialized view based on the
reporter.old_super_simple_record view, which is ... fairly gnarly. I
would be inclined to believe some blame belongs to that view for
intermittent speed regressions, depending on the record in question.
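
One way to check would be to time the view directly for a record that
was behind one of the slow UPDATEs. A sketch (the id below is just a
placeholder -- substitute one from the slow-query log):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT DISTINCT ON (id) *
      FROM reporter.old_super_simple_record
     WHERE id = 12345;

If that plan is slow for exactly the records whose UPDATEs were slow,
the view (rather than the trigger machinery) is the likely culprit.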

Or, perhaps it was a cut-and-paste mistake and the plperl trigger is
really the primary source of the slowdown.  If so, please ignore me.
:)

In either case, thanks again for the input.  We appreciate all the help!

--miker


On Thu, Oct 10, 2013 at 4:56 PM, hubert depesz lubaczewski
<depesz at depesz.com> wrote:
> Hi,
> it's me again.
>
> In our logs, the #4 most time-consuming query, with 32120 occurrences logged, was:
> UPDATE
>     biblio.record_entry
> SET
>     active = '',
>     create_date = '',
>     creator = 0,
>     deleted = '',
>     edit_date = '',
>     editor = 0,
>     fingerprint = '',
>     last_xact_id = '',
>     marc = '',
>     quality = 0,
>     source = 0,
>     tcn_source = '',
>     tcn_value = '',
>     OWNER = NULL,
>     share_depth = NULL
> WHERE
>     id = 0;
>
> Of course instead of '' and 0 - there were some other values.
>
> I checked, and this update is fast. Usually.
>
> Sometimes, though, it gets much slower. I tracked the problem down to
> a slow trigger:
> bbb_simple_rec_trigger AFTER INSERT OR DELETE OR UPDATE ON biblio.record_entry FOR EACH ROW EXECUTE PROCEDURE reporter.simple_rec_trigger()
>
> This trigger is implemented using this function:
>
> #v+
> CREATE OR REPLACE FUNCTION evergreen.maintain_901()
> RETURNS trigger
> LANGUAGE plperlu AS $function$
> use strict;
> use MARC::Record;
> use MARC::File::XML (BinaryEncoding => 'UTF-8');
> use MARC::Charset;
> use Encode;
> use Unicode::Normalize;
>
> MARC::Charset->assume_unicode(1);
> ...
> $function$;
> #v-
>
> It's a plperlu function that does some Unicode handling. What's
> interesting are the "use ..." lines in it.
>
> Let me show you very simplistic example:
>
> $ \sf public.depesz_test
> CREATE OR REPLACE FUNCTION public.depesz_test()
>  RETURNS void
>  LANGUAGE plperlu
> AS $function$
> use strict;
> use MARC::Record;
> use MARC::File::XML (BinaryEncoding => 'UTF-8');
> use MARC::Charset;
> use Encode;
> use Unicode::Normalize;
> MARC::Charset->assume_unicode(1);
> elog(WARNING, 'OK');
> $function$
>
> $ select public.depesz_test();
> WARNING:  OK
> CONTEXT:  PL/Perl function "depesz_test"
>  depesz_test
> -------------
>  [null]
> (1 row)
>
> Time: 306.269 ms
>
> $ select public.depesz_test();
> WARNING:  OK
> CONTEXT:  PL/Perl function "depesz_test"
>  depesz_test
> -------------
>  [null]
> (1 row)
>
> Time: 141.922 ms
>
> It is fully repeatable: the first run of the function takes > 300ms; the 2nd (and all
> subsequent) runs take ~140ms (which is basically my round-trip time to the server).
>
> The problem is that on the first call in a given Pg backend, the
> function has to be compiled, and (what's especially expensive) all the
> "use" lines have to be executed: locating the module files, loading
> them, and running their initialization code.
>
> This is generally not a big problem, but when the machine is loaded
> and there are many connections, you're doing the same work over and
> over again.
>
> Solutions. There are (at least) three:
> 0. Ignore it, on the assumption that it doesn't matter all that much.
> 1. Add pgBouncer (a connection pooler). Even in its least aggressive
>    mode (session pooling) it will decrease backend turnover, making
>    the initialization much less frequent.
> 2. Change postgresql.conf and add:
> shared_preload_libraries = 'plperl'
> and
> plperl.on_init = '...';
> (in both cases it should be plperl, not plperlu!).
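>
> For option 1, a minimal pgBouncer setup could look like this - the
> names, paths and ports below are made up, adjust to the actual
> installation:
>
> #v+
> ; pgbouncer.ini
> [databases]
> evergreen = host=127.0.0.1 port=5432 dbname=evergreen
>
> [pgbouncer]
> listen_addr = 127.0.0.1
> listen_port = 6432
> auth_type = md5
> auth_file = /etc/pgbouncer/userlist.txt
> pool_mode = session
> default_pool_size = 20
> #v-
>
> With session pooling the application just connects to port 6432
> instead of 5432; backends get reused across sessions, so the plperl
> compilation cost is paid much less often.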
>
> As for the value of on_init - I checked the system for all pl/perl*
> triggers, and it looks like the full list of used modules is:
>
> use Encode;
> use JSON::XS;
> use MARC::Charset;
> use MARC::Field;
> use MARC::File::XML (BinaryEncoding => 'UTF-8');
> use MARC::Record;
> use strict;
> use Unicode::Normalize;
>
> So that's what I would suggest putting there.
> Doing it this way (shared_preload_libraries + plperl.on_init) causes
> Pg to load plperl at server start and immediately (before any
> connections are handled) run the on_init code - which should contain
> all the module-loading code, and possibly some common initializations
> (like "MARC::Charset->assume_unicode(1);").
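>
> To make it concrete, the relevant postgresql.conf lines would look
> something like this (plperl.on_init must be a single line in the
> actual file; it is wrapped here only for mail):
>
> #v+
> shared_preload_libraries = 'plperl'
> plperl.on_init = 'use strict; use Encode; use JSON::XS; use MARC::Charset; use MARC::Field; use MARC::File::XML (BinaryEncoding => "UTF-8"); use MARC::Record; use Unicode::Normalize; MARC::Charset->assume_unicode(1);'
> #v-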
>
> There are two drawbacks to this idea, though:
> a. changing these settings requires a Pg restart (short, but still).
> b. if you ever want to upgrade any of the modules listed in on_init,
> you'll have to restart Pg after the new versions are installed, since
> it would still have the old versions loaded in memory.
>
> There is also an additional benefit: the code is loaded just once and
> then shared with all backends via copy-on-write semantics, so the
> system uses a bit less RAM.
>
> As for whether it's worth it: in my report, which scanned 10 days of
> logs, the total duration of these queries was 4.5 hours. Not *that*
> much, but perhaps it's worth it anyway? This would also benefit any
> other functions that use these (and other) modules.
>
> I talked with the client, and while the optimization boils down to a
> postgresql.conf change - which, in theory, differs in every
> installation - they suggested that I send this information to the
> list for peer review, so perhaps it will help others (or I'll get
> bashed for not understanding something).
>
> Hope you'll find some use for this.
>
> Best regards,
>
> depesz
>



-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

