[OPEN-ILS-DEV] Optimization for triggers?

hubert depesz lubaczewski depesz at depesz.com
Thu Oct 10 16:56:58 EDT 2013


Hi,
it's me again.

In our logs, the #4 most time-consuming query, with 32120 occurrences logged, was:
UPDATE
    biblio.record_entry
SET
    active = '',
    create_date = '',
    creator = 0,
    deleted = '',
    edit_date = '',
    editor = 0,
    fingerprint = '',
    last_xact_id = '',
    marc = '',
    quality = 0,
    source = 0,
    tcn_source = '',
    tcn_value = '',
    OWNER = NULL,
    share_depth = NULL
WHERE
    id = 0;

Of course, instead of '' and 0 there were real values.

I checked, and this update is fast. Usually.

Sometimes, though, it gets much slower. I tracked the problem down to a
slow trigger:
bbb_simple_rec_trigger AFTER INSERT OR DELETE OR UPDATE ON biblio.record_entry FOR EACH ROW EXECUTE PROCEDURE reporter.simple_rec_trigger()
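
Side note: per-trigger timings can be seen by running EXPLAIN (ANALYZE)
on the statement itself, inside a transaction that you then roll back so
nothing actually changes (the id and SET list below are just
placeholders):

#v+
BEGIN;
EXPLAIN (ANALYZE)
    UPDATE biblio.record_entry
       SET edit_date = now()
     WHERE id = 1;
-- the plan output ends with per-trigger lines, e.g.:
--   Trigger bbb_simple_rec_trigger: time=... calls=1
ROLLBACK;
#v-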

The trigger above is implemented using this function:

#v+
CREATE OR REPLACE FUNCTION evergreen.maintain_901()
RETURNS trigger
LANGUAGE plperlu AS $function$
use strict;
use MARC::Record;
use MARC::File::XML (BinaryEncoding => 'UTF-8');
use MARC::Charset;
use Encode;
use Unicode::Normalize;

MARC::Charset->assume_unicode(1);
...
$function$;
#v-

It's a plperlu function that does some Unicode-related work. What's
interesting are the "use ..." lines in it.

Let me show you a very simple example:

$ \sf public.depesz_test 
CREATE OR REPLACE FUNCTION public.depesz_test()
 RETURNS void
 LANGUAGE plperlu
AS $function$
use strict;
use MARC::Record;
use MARC::File::XML (BinaryEncoding => 'UTF-8');
use MARC::Charset;
use Encode;
use Unicode::Normalize;
MARC::Charset->assume_unicode(1);
elog(WARNING, 'OK');
$function$

$ select public.depesz_test();
WARNING:  OK
CONTEXT:  PL/Perl function "depesz_test"
 depesz_test 
-------------
 [null]
(1 row)

Time: 306.269 ms

$ select public.depesz_test();
WARNING:  OK
CONTEXT:  PL/Perl function "depesz_test"
 depesz_test 
-------------
 [null]
(1 row)

Time: 141.922 ms

It is fully repeatable: the first run of the function takes > 300 ms, the
2nd (and all subsequent runs) take ~ 140 ms (which is basically my
network round trip to the server).

The problem is that on the first call in a given Pg backend, the function
has to be compiled, and (what's especially expensive) all the "use" lines
have to be executed: locating the module files on disk, loading them, and
running their initialization code.
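
You can see the cost of the module loading alone, outside of Pg, with a
one-liner like this (a rough sketch; the module list is the one from the
function above):

#v+
$ time perl -MMARC::Record -MMARC::File::XML -MMARC::Charset \
            -MEncode -MUnicode::Normalize -e 1
#v-

The time it reports should roughly match the first-call overhead seen
above.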

This is generally not a big problem, but when the machine is loaded and
there are many short-lived connections, you're doing the same work over
and over again.

Solutions. There are (at least) three:
0. Ignore it, on the assumption that it doesn't matter all that much.
1. Add pgBouncer (a connection pooler). Even in its least aggressive mode
   (session pooling) it will decrease the turnover of backends, making
   the initialization much less frequent. A minimal config sketch follows
   this list.
2. Change postgresql.conf and add:
       shared_preload_libraries = 'plperl'
   and
       plperl.on_init = '...'
   (in both cases it must be plperl, and not plperlu!).
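
For option 1, something along these lines would do (a minimal sketch; the
database name, ports and paths are placeholders - adapt them to your
installation):

#v+
; pgbouncer.ini
[databases]
evergreen = host=127.0.0.1 port=5432 dbname=evergreen

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session pooling: a server connection is kept for the whole client
; session, so backends get reused instead of being recreated
pool_mode = session
max_client_conn = 200
default_pool_size = 20
#v-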

As for the value of on_init: I checked the system for all triggers
written in pl/perl*, and it looks like the full list of used modules is:

use Encode;
use JSON::XS;
use MARC::Charset;
use MARC::Field;
use MARC::File::XML (BinaryEncoding => 'UTF-8');
use MARC::Record;
use strict;
use Unicode::Normalize;

So that's what I would suggest putting there.
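
Concretely, it could look like this (a sketch only; note that inside a
postgresql.conf string the single quotes around UTF-8 have to be
doubled):

#v+
# postgresql.conf
shared_preload_libraries = 'plperl'
plperl.on_init = 'use strict; use Encode; use JSON::XS; use MARC::Charset; use MARC::Field; use MARC::File::XML (BinaryEncoding => ''UTF-8''); use MARC::Record; use Unicode::Normalize; MARC::Charset->assume_unicode(1);'
#v-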
Doing it this way (shared_preload_libraries + plperl.on_init) causes Pg
to load plperl at server start and immediately (before any connections
are handled) run the on_init code - which should contain all the
module-loading code, and possibly some common initialization
(like "MARC::Charset->assume_unicode(1);").

There are two drawbacks to this idea, though:
a. changing these settings requires a Pg restart (short, but still).
b. if you ever want to upgrade any of the modules listed in on_init, you
   will have to restart Pg after the new versions are installed, as it
   would otherwise keep the previous, old versions loaded in memory.

There is also an additional benefit: the module code is loaded just once,
in the postmaster, and then shared with all backends via "copy on write"
semantics, making the system use a bit less RAM.

As for whether it's worth it: in my report, which scanned 10 days of
logs, the total duration of these queries was 4.5 hours. Not *that* much,
but perhaps it's worth it anyway? It would also benefit any other
functions that use these (and other) modules.

I talked with the client, and while the optimization boils down to
changing postgresql.conf - which, in theory, is different in every
installation - they suggested that I send this information to the list
for peer review; perhaps it will help others (or I'll get bashed for not
understanding something).

Hope you'll find some use for this.

Best regards,

depesz


