[OPEN-ILS-DEV] ***SPAM*** Introduction and possible optimization for database function

hubert depesz lubaczewski depesz at depesz.com
Wed Oct 2 14:58:14 EDT 2013


Hi,
my name is Hubert Lubaczewski (a.k.a. depesz). I'm PostgreSQL DBA from
Poland, that was hired to do review for Evergreen installations, and
suggest changes in database "things" (queries, structure).

For the client I analyzed logs from production Pg instance for ~ 10
days.

During this time, the single most time consuming query (summarized time
for all instances of the query, with different parameters) was:

SELECT * FROM unapi.bre ( ... ) AS "unapi.bre";

I profiled this function, and found that in my test case most of the
time (2.04s out of 2.06s, so ~ 99%) was spent in call to
unapi.holdings_xml() function.

When I profiled this function, I found that most of the time (sorry,
don't have the number now with me) was spent in call to
evergreen.ranked_volumes() function.

At this moment in my research something changed on the server I was
testing on, and all subsequent times were ~ 4-5 times lower, but the
ratios were more or less the same.

Anyway - call to evergreen.ranked_volumes() showed repeatable time (with
full caches/buffers) of ~ 380ms.

I modified the function by:
1. inlining actor.org_unit_descendants(?, ?)
2. inlining evergreen.rank_ou(?, ?, ?)
3. extracting depth calculation to separate call
4. switched to plpgsql (which gives me ability to use variables)
5. removed evergreen.rank_ou() and evergreen.rank_cp_status() from
   select clause - these are still in WINDOW definition, but they
   weren't used in the SELECT, so it's better to remove from there.
6. in passing renamed arguments to avoid name clash (argument depth vs.
   field depth)
7. in passing changed usage of $* to access parameters to using named
   parameters, for readability.

New function did the same work in ~ 18ms.

Now - after I finished my tests, I was unfortunately not able to repeat
slow performance of unapi.bre(), but I think that the optimization of
evergreen.ranked_volumes() is good on its own.

If you'll agree, I would appreciate modifying the function in project.
If I should do/provide something else, please let me know.

Best regards,

depesz

New function:

CREATE FUNCTION evergreen.ranked_volumes(
    p_bibid bigint,
    p_ouid integer,
    p_depth integer DEFAULT NULL::integer,
    p_slimit public.hstore DEFAULT NULL::public.hstore,
    p_soffset public.hstore DEFAULT NULL::public.hstore,
    p_pref_lib integer DEFAULT NULL::integer,
    p_includes text[] DEFAULT NULL::text[]
) RETURNS TABLE(
    id bigint,
    name text,
    label_sortkey text,
    rank bigint
)
LANGUAGE plpgsql
STABLE
AS $_$
DECLARE
    v_depth int4;
BEGIN
    v_depth := coalesce(
        p_depth,
        (
            SELECT depth
            FROM actor.org_unit_type aout
                INNER JOIN actor.org_unit ou ON ou_type = aout.id
            WHERE ou.id = p_ouid
        ),
        p_pref_lib
    );

    RETURN QUERY
    WITH RECURSIVE descendant_depth AS (
        SELECT  ou.id,
                ou.parent_ou,
                out.depth
        FROM  actor.org_unit ou
                JOIN actor.org_unit_type out ON (out.id = ou.ou_type)
                JOIN anscestor_depth ad ON (ad.id = ou.id)
        WHERE ad.depth = v_depth
            UNION ALL
        SELECT  ou.id,
                ou.parent_ou,
                out.depth
        FROM  actor.org_unit ou
                JOIN actor.org_unit_type out ON (out.id = ou.ou_type)
                JOIN descendant_depth ot ON (ot.id = ou.parent_ou)
    ), anscestor_depth AS (
        SELECT  ou.id,
                ou.parent_ou,
                out.depth
        FROM  actor.org_unit ou
                JOIN actor.org_unit_type out ON (out.id = ou.ou_type)
        WHERE ou.id = p_ouid
            UNION ALL
        SELECT  ou.id,
                ou.parent_ou,
                out.depth
        FROM  actor.org_unit ou
                JOIN actor.org_unit_type out ON (out.id = ou.ou_type)
                JOIN anscestor_depth ot ON (ot.parent_ou = ou.id)
    ), descendants as (
        SELECT ou.* FROM actor.org_unit ou JOIN descendant_depth USING (id)
    )

    SELECT ua.id, ua.name, ua.label_sortkey, MIN(ua.rank) AS rank FROM (
        SELECT acn.id, aou.name, acn.label_sortkey,
            RANK() OVER w
        FROM asset.call_number acn
            JOIN asset.copy acp ON (acn.id = acp.call_number)
            JOIN descendants AS aou ON (acp.circ_lib = aou.id)
        WHERE acn.record = p_bibid
            AND acn.deleted IS FALSE
            AND acp.deleted IS FALSE
            AND CASE WHEN ('exclude_invisible_acn' = ANY(p_includes)) THEN 
                EXISTS (
                    SELECT 1 
                    FROM asset.opac_visible_copies 
                    WHERE copy_id = acp.id AND record = acn.record
                ) ELSE TRUE END
        GROUP BY acn.id, acp.status, aou.name, acn.label_sortkey, aou.id
        WINDOW w AS (
            ORDER BY 
                COALESCE(
                    CASE WHEN aou.id = p_ouid THEN -20000 END,
                    CASE WHEN aou.id = p_pref_lib THEN -10000 END,
                    (SELECT distance - 5000
                        FROM actor.org_unit_descendants_distance(p_pref_lib) as x
                        WHERE x.id = aou.id AND p_pref_lib IN (
                            SELECT q.id FROM actor.org_unit_descendants(p_ouid) as q)),
                    (SELECT e.distance FROM actor.org_unit_descendants_distance(p_ouid) as e WHERE e.id = aou.id),
                    1000
                ),
                evergreen.rank_cp_status(acp.status)
        )
    ) AS ua
    GROUP BY ua.id, ua.name, ua.label_sortkey
    ORDER BY rank, ua.name, ua.label_sortkey
    LIMIT (p_slimit -> 'acn')::INT
    OFFSET (p_soffset -> 'acn')::INT;
END;
$_$;


-- 
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/


More information about the Open-ils-dev mailing list