[OPEN-ILS-DEV] Another possible optimization for database access/query
hubert depesz lubaczewski
depesz at depesz.com
Mon Oct 7 11:52:30 EDT 2013
Hi,
Couple of days ago, I started to work on #2 most time consuming query
(from pg logs of our client):
SELECT "acn".record AS "record", max ("acp".create_date) AS "create_date"
FROM asset.call_number AS "acn"
INNER JOIN asset.COPY AS "acp" ON ("acp".call_number = "acn".id)
WHERE ("acn".record > 0)
AND ("acp".deleted = 'f'
AND "acp".circ_lib IN ('279',
'280',
'280',
'281',
'281',
'282',
'282',
'283',
'283',
'284',
'284',
'285',
'285',
'286',
'286',
'287',
'287',
'288',
'288',
'289',
'289'))
GROUP BY 1
ORDER BY max ("acp".create_date) DESC LIMIT 20
;
This query has explain analyze http://explain.depesz.com/s/RAMc
- runtime of 13.14 seconds.
First level of optimization was adding 2 indexes:
create index copy_circ_lib on asset.copy (circ_lib);
create index unit_circ_lib on serial.unit (circ_lib);
This made the query faster - 8.719s -- http://explain.depesz.com/s/UOy
Afterwards, I added another two indexes, in hopes that it will get used:
create index copy_circ_lib_create_date on asset.copy (circ_lib, create_date);
create index unit_circ_lib_create_date on serial.unit (circ_lib, create_date);
but it wasn't used.
Since I didn't see a way to optimize the query further by changing query
or adding indexes, decided to write a function instead.
First iteration of the function:
CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC p_circ_libs INT4[] )
RETURNS TABLE ( record INT8, create_date timestamptz ) as $$
DECLARE
v_results public.hstore := '';
v_record int8;
v_temprec record;
v_found INT4 := 0;
BEGIN
FOR v_temprec IN
SELECT
c.call_number,
max(c.create_date) as create_date
FROM
asset.copy c
WHERE c.circ_lib = any( p_circ_libs )
AND NOT c.deleted
GROUP BY c.call_number
ORDER BY max(c.create_date) DESC
LOOP
SELECT cn.record INTO v_record FROM asset.call_number cn WHERE cn.id = v_temprec.call_number;
CONTINUE WHEN NOT FOUND;
CONTINUE WHEN v_record <= 0;
CONTINUE WHEN v_results ? v_record::TEXT;
v_found := v_found + 1;
v_results := v_results || hstore( v_record::TEXT, v_temprec.create_date::TEXT );
EXIT WHEN v_found = p_limit;
END LOOP;
RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) ORDER BY value::timestamptz DESC;
END;
$$ language plpgsql STRICT;
It can be called like this:
SELECT * FROM asset.some_clever_name( 20, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289 );
(i'm not good with naming stuff)
result - 1.389s -- http://explain.depesz.com/s/IIu
much better, but it's using the less optimal indexes (first set of two
indexes I created).
So I wrote this - second iteration of functional approach:
CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC p_circ_libs INT4[] )
RETURNS TABLE ( record INT8, create_date timestamptz ) as $$
DECLARE
v_results public.hstore := '';
v_seen public.hstore := '';
v_records public.hstore := '';
v_oldest timestamptz := NULL;
v_c_oldest timestamptz := NULL;
v_found INT4 := 0;
v_circ_lib INT4;
v_record int8;
v_temprec record;
v_iter INT4;
v_cursor REFCURSOR;
BEGIN
FOREACH v_circ_lib IN ARRAY p_circ_libs LOOP
v_iter := 0;
v_seen := '';
v_c_oldest := NULL;
open v_cursor NO SCROLL FOR
SELECT c.call_number, c.create_date
FROM asset.copy c
WHERE c.circ_lib = v_circ_lib AND NOT c.deleted
ORDER BY c.create_date DESC;
LOOP
FETCH v_cursor INTO v_temprec;
EXIT WHEN NOT FOUND;
v_iter := v_iter + 1;
-- If we already have better data than current row (newer records in
EXIT WHEN v_oldest IS NOT NULL AND v_oldest >= v_temprec.create_date;
-- Ignore if we've seen given call number in current query (for current circ_lib)
CONTINUE WHEN v_seen ? v_temprec.call_number::TEXT;
v_seen := v_seen || public.hstore( v_temprec.call_number::TEXT, '1' );
-- If we don't have yet record for given call_number, we need to get it
IF v_records ? v_temprec.call_number::TEXT THEN
v_record := v_records -> v_temprec.call_number::TEXT;
ELSE
SELECT cn.record INTO v_record FROM asset.call_number cn WHERE cn.id = v_temprec.call_number;
CONTINUE WHEN NOT FOUND;
CONTINUE WHEN v_record <= 0;
v_records := v_records || hstore( v_temprec.call_number::TEXT, v_record::TEXT );
END IF;
-- If results already contain "better" date for given record, next row
IF v_results ? v_record::TEXT THEN
CONTINUE WHEN ( v_results -> v_record::TEXT )::timestamptz > v_temprec.create_date;
END IF;
v_found := v_found + 1;
v_results := v_results || hstore( v_record::TEXT, v_temprec.create_date::TEXT );
IF v_c_oldest IS NULL OR v_c_oldest > v_temprec.create_date THEN
v_c_oldest := v_temprec.create_date;
END IF;
EXIT WHEN v_found = p_limit;
END LOOP;
CLOSE v_cursor;
-- Update oldest information based on oldest row added in current loop
IF v_oldest IS NULL OR v_oldest < v_c_oldest THEN
v_oldest := v_c_oldest;
END IF;
END LOOP;
RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) ORDER BY value::timestamptz DESC LIMIT p_limit;
RETURN;
END;
$$ language plpgsql STRICT;
It can be used in exactly the same way. And it returns, of course, the
same results.
But the time - well - it's 2.687 *ms* http://explain.depesz.com/s/NBwP
This means that the 2nd function, for these database, with these
parameters is almost 5000 times faster.
The drawback is that it's not really nice function, as it uses (for
performance) certain side-effects of plpgsql constructs (cursors).
Contacted the client, and, as previously, it was authorized to get
released to you guys, in hopes that you can make a decision whether the
optimization is worth being included in core.
Best regards,
depesz
--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
More information about the Open-ils-dev
mailing list