[OPEN-ILS-DEV] Another possible optimization for database access/query

Mike Rylander mrylander at gmail.com
Mon Oct 7 13:03:37 EDT 2013


depesz, that is awesome.  As soon as I'm in a position to (probably
tomorrow) I'll look at integrating this, as well as the change from
your previous email, for broader testing.

For those following along but unfamiliar with the context of the query
that depesz optimized here, this is the "new item feed", where we get
a list of bibs that have newly attached items.

Thanks again, depesz.

--miker


On Mon, Oct 7, 2013 at 11:52 AM, hubert depesz lubaczewski
<depesz at depesz.com> wrote:
> Hi,
> Couple of days ago, I started to work on #2 most time consuming query
> (from pg logs of our client):
>
> SELECT "acn".record AS "record", max ("acp".create_date) AS "create_date"
> FROM asset.call_number AS "acn"
> INNER JOIN asset.COPY AS "acp" ON ("acp".call_number = "acn".id)
> WHERE ("acn".record > 0)
>   AND ("acp".deleted = 'f'
>        AND "acp".circ_lib IN ('279',
>                               '280',
>                               '280',
>                               '281',
>                               '281',
>                               '282',
>                               '282',
>                               '283',
>                               '283',
>                               '284',
>                               '284',
>                               '285',
>                               '285',
>                               '286',
>                               '286',
>                               '287',
>                               '287',
>                               '288',
>                               '288',
>                               '289',
>                               '289'))
> GROUP BY 1
> ORDER BY max ("acp".create_date) DESC LIMIT 20
> ;
>
> This query has explain analyze http://explain.depesz.com/s/RAMc
> - runtime of 13.14 seconds.
>
>
> First level of optimization was adding 2 indexes:
>
> create index copy_circ_lib on asset.copy (circ_lib);
> create index unit_circ_lib on serial.unit (circ_lib);
>
> This made the query faster - 8.719s -- http://explain.depesz.com/s/UOy
>
> Afterwards, I added another two indexes, in hopes that it will get used:
>
> create index copy_circ_lib_create_date on asset.copy (circ_lib, create_date);
> create index unit_circ_lib_create_date on serial.unit (circ_lib, create_date);
>
> but it wasn't used.
>
> Since I didn't see a way to optimize the query further by changing query
> or adding indexes, decided to write a function instead.
>
> First iteration of the function:
>
> CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC p_circ_libs INT4[] )
>     RETURNS TABLE ( record INT8, create_date timestamptz ) as $$
> DECLARE
>     v_results public.hstore := '';
>     v_record  int8;
>     v_temprec record;
>     v_found   INT4 := 0;
> BEGIN
>     FOR v_temprec IN
>         SELECT
>             c.call_number,
>             max(c.create_date) as create_date
>         FROM
>             asset.copy c
>         WHERE c.circ_lib = any( p_circ_libs )
>             AND NOT c.deleted
>         GROUP BY c.call_number
>         ORDER BY max(c.create_date) DESC
>     LOOP
>         SELECT cn.record INTO v_record FROM asset.call_number cn WHERE cn.id = v_temprec.call_number;
>         CONTINUE WHEN NOT FOUND;
>         CONTINUE WHEN v_record <= 0;
>         CONTINUE WHEN v_results ? v_record::TEXT;
>         v_found := v_found + 1;
>         v_results  := v_results || hstore( v_record::TEXT, v_temprec.create_date::TEXT );
>         EXIT WHEN v_found = p_limit;
>     END LOOP;
>     RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) ORDER BY value::timestamptz DESC;
> END;
> $$ language plpgsql STRICT;
>
> It can be called like this:
> SELECT * FROM asset.some_clever_name( 20, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289 );
>
> (i'm not good with naming stuff)
> result - 1.389s -- http://explain.depesz.com/s/IIu
> much better, but it's using the less optimal indexes (first set of two
> indexes I created).
>
> So I wrote this - second iteration of functional approach:
>
> CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC p_circ_libs INT4[] )
>     RETURNS TABLE ( record INT8, create_date timestamptz ) as $$
> DECLARE
>     v_results  public.hstore := '';
>     v_seen     public.hstore := '';
>     v_records  public.hstore := '';
>     v_oldest   timestamptz   := NULL;
>     v_c_oldest timestamptz   := NULL;
>     v_found    INT4 := 0;
>     v_circ_lib INT4;
>     v_record   int8;
>     v_temprec  record;
>     v_iter     INT4;
>     v_cursor   REFCURSOR;
> BEGIN
>     FOREACH v_circ_lib IN ARRAY p_circ_libs LOOP
>         v_iter := 0;
>         v_seen := '';
>         v_c_oldest := NULL;
>         open v_cursor NO SCROLL FOR
>             SELECT c.call_number, c.create_date
>             FROM asset.copy c
>             WHERE c.circ_lib = v_circ_lib AND NOT c.deleted
>             ORDER BY c.create_date DESC;
>
>         LOOP
>             FETCH v_cursor INTO v_temprec;
>             EXIT WHEN NOT FOUND;
>
>             v_iter := v_iter + 1;
>
>             -- If we already have better data than current row (newer records in
>             EXIT WHEN v_oldest IS NOT NULL AND v_oldest >= v_temprec.create_date;
>
>             -- Ignore if we've seen given call number in current query (for current circ_lib)
>             CONTINUE WHEN v_seen ? v_temprec.call_number::TEXT;
>             v_seen := v_seen || public.hstore( v_temprec.call_number::TEXT, '1' );
>
>             -- If we don't have yet record for given call_number, we need to get it
>             IF v_records ? v_temprec.call_number::TEXT THEN
>                 v_record := v_records -> v_temprec.call_number::TEXT;
>             ELSE
>                 SELECT cn.record INTO v_record FROM asset.call_number cn WHERE cn.id = v_temprec.call_number;
>                 CONTINUE WHEN NOT FOUND;
>                 CONTINUE WHEN v_record <= 0;
>                 v_records := v_records || hstore( v_temprec.call_number::TEXT, v_record::TEXT );
>             END IF;
>
>             -- If results already contain "better" date for given record, next row
>             IF v_results ? v_record::TEXT THEN
>                 CONTINUE WHEN ( v_results -> v_record::TEXT )::timestamptz > v_temprec.create_date;
>             END IF;
>
>             v_found := v_found + 1;
>             v_results  := v_results || hstore( v_record::TEXT, v_temprec.create_date::TEXT );
>
>             IF v_c_oldest IS NULL OR v_c_oldest > v_temprec.create_date THEN
>                 v_c_oldest := v_temprec.create_date;
>             END IF;
>
>             EXIT WHEN v_found = p_limit;
>         END LOOP;
>
>         CLOSE v_cursor;
>
>         -- Update oldest information based on oldest row added in current loop
>         IF v_oldest IS NULL OR v_oldest < v_c_oldest THEN
>             v_oldest := v_c_oldest;
>         END IF;
>
>     END LOOP;
>     RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) ORDER BY value::timestamptz DESC LIMIT p_limit;
> RETURN;
> END;
> $$ language plpgsql STRICT;
>
> It can be used in exactly the same way. And it returns, of course, the
> same results.
>
> But the time - well - it's 2.687 *ms* http://explain.depesz.com/s/NBwP
>
> This means that the 2nd function, for these database, with these
> parameters is almost 5000 times faster.
>
> The drawback is that it's not really nice function, as it uses (for
> performance) certain side-effects of plpgsql constructs (cursors).
>
> Contacted the client, and, as previously, it was authorized to get
> released to you guys, in hopes that you can make a decision whether the
> optimization is worth being included in core.
>
> Best regards,
>
> depesz
>
> --
> The best thing about modern society is how easy it is to avoid contact with it.
>                                                              http://depesz.com/



-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com


More information about the Open-ils-dev mailing list