[OPEN-ILS-DEV] Deindexing deleted bibs; superpage settings

Brandon W. Uhlman brandon at branflakes.net
Tue Mar 16 14:32:00 EDT 2010


Hi, all.

The Evergreen system I work on is seeing some odd search behaviour  
which we've determined is the result of recent work merging our  
bibliographic records -- full sets imported from each of our member  
libraries to date -- first against a set of 'authoritative' records,  
and then against each other.

The result is that there are a large number of deleted bib records.  
This means a majority of our indexed terms are attached to deleted  
records. For our subject headings, for example:

evergreen=# SELECT count(msfe.id), bre.deleted
              FROM metabib.subject_field_entry msfe
              JOIN biblio.record_entry bre ON (msfe.source = bre.id)
             GROUP BY bre.deleted;
  count  | deleted
---------+---------
 2576458 | t
 1517063 | f
(2 rows)

This results in superpages that are very sparsely populated with
active nodes, especially for popular search terms that are limited by
org unit, shelving location, et cetera. From the end user's
perspective, this sometimes manifests as inaccurate result counts, or
counts which vary wildly depending on the sort order, and sometimes as
zero results when in fact results exist, because the matching records
would have been on a later superpage than the search limits allowed.

Increasing the superpage size from the default size of 10 pages of  
1000 records each to 50 pages of 1000 records each helped somewhat,  
but we still see odd results sometimes.
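
To put rough numbers on the sparseness, here is a back-of-the-envelope
sketch in Python. It uses the subject entry counts from the query
output above as a stand-in for the share of live candidates, and it
assumes deleted records and limit matches are spread evenly through the
ranked candidates, which real searches will not do exactly. The 2%
limit share is a made-up figure for a small branch, not something we
measured.

```python
# Counts taken from the subject_field_entry query output.
DELETED_ENTRIES = 2576458
LIVE_ENTRIES = 1517063


def live_fraction(live: int, deleted: int) -> float:
    """Share of index entries that point at non-deleted bibs."""
    return live / (live + deleted)


def expected_visible(superpages: int, superpage_size: int,
                     live_share: float, limit_share: float = 1.0) -> float:
    """Expected candidates that survive the deleted check and a further
    limit (org unit, shelving location), assuming both are spread evenly
    through the ranked candidates."""
    return superpages * superpage_size * live_share * limit_share


share = live_fraction(LIVE_ENTRIES, DELETED_ENTRIES)
print(f"live share of subject entries: {share:.1%}")

for pages in (10, 50):
    # limit_share=0.02 is a hypothetical value for a small branch.
    hits = expected_visible(pages, 1000, share, limit_share=0.02)
    print(f"{pages} superpages of 1000: about {hits:.0f} visible candidates")
```

With most candidates thrown away before any limit is applied, a
restrictive limit leaves very few visible hits within the superpages
that are examined, which would fit what we are seeing.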

So I have several questions that come from this experience:
- is there a best practice for determining how big your superpages  
should be, and how many you should have?
- is cleaning up indexing for these deleted bib records as easy as
'DELETE FROM metabib.*_field_entry WHERE source IN (SELECT id FROM
biblio.record_entry WHERE deleted = true)'?
- what do folks think of deleting indexing information for bibs at the
time of 'deletion', and making re-ingest part of the undeletion
process? Deleted records by design can't be found by indexed-field
searches anyway, IIRC. If there's consensus that this is a good idea,
I can submit a patch to that effect.
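
On the second question: the '*' in the table name is shorthand, since
SQL has no wildcard for table names, so the cleanup would need one
statement per index table. A minimal Python sketch that writes those
statements out follows. Only subject_field_entry is confirmed by the
query earlier in this message; the other class names are assumptions
and should be checked against the actual schema before running
anything. Whether the deletes are safe with respect to triggers or
other dependent data is still the open question.

```python
# Search classes assumed to have a metabib.<class>_field_entry table.
# Only "subject" is confirmed in this message; verify the rest against
# the schema (for example with \dt metabib.* in psql).
FIELD_ENTRY_CLASSES = ["title", "author", "subject", "keyword", "series"]

DELETE_TEMPLATE = (
    "DELETE FROM metabib.{cls}_field_entry\n"
    " WHERE source IN (SELECT id FROM biblio.record_entry WHERE deleted);"
)


def build_cleanup_script(classes):
    """Return one transaction that removes index rows for deleted bibs."""
    statements = []
    for cls in classes:
        # Guard against anything that is not a plain identifier.
        if not cls.isidentifier():
            raise ValueError(f"unexpected class name: {cls!r}")
        statements.append(DELETE_TEMPLATE.format(cls=cls))
    return "\n".join(["BEGIN;"] + statements + ["COMMIT;"])


script = build_cleanup_script(FIELD_ENTRY_CLASSES)
print(script)
```

The output is meant to be reviewed and then fed to psql by hand, ideally
on a test copy of the database first.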

I realize this is a bit of an edge case, but I'm hoping folks still  
have some comments.

Thanks,

~B

=============================================
Brandon W. Uhlman
President, Lillooet-Camelsfoot TV and Radio Association
Lillooet, BC   V0K 1V0

brandon at branflakes.net

