[OPEN-ILS-DEV] Deindexing deleted bibs; superpage settings
Brandon W. Uhlman
brandon at branflakes.net
Tue Mar 16 14:32:00 EDT 2010
Hi, all.
The Evergreen system I work on is seeing some odd search behaviour
which we've determined is the result of recent work merging our
bibliographic records -- full sets imported from each of our member
libraries to date -- first against a set of 'authoritative' records,
and then against each other.
The result is that there are a large number of deleted bib records.
This means a majority of our indexed terms are attached to deleted
records. For our subject headings, for example:
evergreen=# SELECT count(msfe.id), bre.deleted
            FROM metabib.subject_field_entry msfe
            JOIN biblio.record_entry bre ON (msfe.source = bre.id)
            GROUP BY deleted;
  count  | deleted
---------+---------
 2576458 | t
 1517063 | f
(2 rows)
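With the counts above, roughly 63% of subject field entries (2576458 of 4093521) point at deleted records. A variant of the same query that reports the share directly might look like the sketch below. It assumes a PostgreSQL version with window functions (8.4 or later); the column aliases are only illustrative.

```sql
-- Sketch: share of subject index entries attached to deleted vs. live bibs.
-- Assumes PostgreSQL 8.4+ (window function over an aggregate).
SELECT bre.deleted,
       count(*) AS entries,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_entries
  FROM metabib.subject_field_entry msfe
  JOIN biblio.record_entry bre ON (msfe.source = bre.id)
 GROUP BY bre.deleted;
```

Swapping in another metabib.*_field_entry table for metabib.subject_field_entry would give the same breakdown for that index class.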
This results in superpages that are very sparsely populated with
active nodes, especially for popular search terms that are limited by
org unit, shelving location, et cetera. From the end user's
perspective, this sometimes shows up as inaccurate result counts, or
counts that vary wildly depending on the sort order, and sometimes as
zero results when results do in fact exist, because the matching
records would have been on a later superpage than the search limits
allowed.
Increasing the superpage size from the default size of 10 pages of
1000 records each to 50 pages of 1000 records each helped somewhat,
but we still see odd results sometimes.
So I have several questions that come from this experience:
- is there a best practice for determining how big your superpages
should be, and how many you should have?
- is cleaning up indexing for these deleted bib records as easy as
'DELETE FROM metabib.*_field_entry WHERE source IN (SELECT id FROM
biblio.record_entry WHERE deleted = true)'?
- what do folks think of deleting indexing information for bibs at the
time of 'deletion', and making re-ingest part of the undeletion
process? Deleted records by design can't be found by indexed-field
searches anyway, IIRC. If there's consensus that this is a good idea,
I can submit a patch to that effect.
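
Since 'metabib.*_field_entry' is shorthand and not valid SQL, the cleanup in the second question would have to be spelled out one table at a time. A minimal sketch follows. It assumes the wildcard covers the title, author, subject, keyword and series field entry tables (check your own schema), and it is wrapped in a transaction that rolls back so the reported row counts can be inspected first. It does not settle whether other derived tables also need attention, which is part of the question.

```sql
-- Sketch only: per-table expansion of 'DELETE FROM metabib.*_field_entry ...'.
-- Assumption: these five tables are the ones the wildcard is meant to match.
BEGIN;

DELETE FROM metabib.title_field_entry
 WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);

DELETE FROM metabib.author_field_entry
 WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);

DELETE FROM metabib.subject_field_entry
 WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);

DELETE FROM metabib.keyword_field_entry
 WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);

DELETE FROM metabib.series_field_entry
 WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);

-- Each DELETE reports how many rows it removed. Replace ROLLBACK with
-- COMMIT only once those counts match what you expect.
ROLLBACK;
```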
I realize this is a bit of an edge case, but I'm hoping folks still
have some comments.
Thanks,
~B
=============================================
Brandon W. Uhlman
President, Lillooet-Camelsfoot TV and Radio Association
Lillooet, BC V0K 1V0
brandon at branflakes.net