[OPEN-ILS-GENERAL] Bib records not indexed - Evergreen 1.6.0.3

Dan Scott dan at coffeecode.net
Mon Sep 13 10:38:29 EDT 2010


Hi Elaine:

On Mon, 2010-09-13 at 08:18 -0400, Hardy, Elaine wrote:
> We have reported a related problem -- where names such as Peter Høeg
> cannot be retrieved using Hoeg. 

Hmm, where did you report this problem? I can't find anything on
http://bugs.launchpad.net/evergreen - which is the community bug tracker
for the Evergreen project. Maybe we discussed it on the mailing list in
ages past...

> My understanding was that this was not an indexing problem, however.
> It had to do with recognition of transliterations of some diacritics
> and other non-English letters. Is this fixed in 2.0 also?

Hmm. That sounds like a distinction without much of a difference to me.

If you want to make 'Høeg' retrievable by searches for both 'Høeg' and
'Hoeg', then in 1.6 you need to touch two places:

  * OpenILS/Application/Storage/Driver/Pg/fts.pm
  * OpenILS/Application/Ingest.pm

See http://svn.open-ils.org/trac/ILS-Contrib/changeset/987 for an
example of an indexing normalization that I just added to Conifer for
the Polish l (ł).

Oh, and then you'll have to reingest all of the records that contain the
character(s) you've added to the indexing normalization. A SELECT
statement that retrieves the IDs of the affected bib records (WHERE marc
LIKE '%ł%' OR marc LIKE '%Ł%') and then feeds those IDs to the
open-ils.ingest.full.biblio.record method would do the trick.
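
As a rough sketch of the SELECT half of that step, something like the
following could build the query for any set of newly normalized
characters. The helper name affected_records_query is made up for
illustration, and the table name biblio.record_entry is an assumption
about the schema; only the marc column and the LIKE conditions come from
the description above. Passing the resulting IDs to
open-ils.ingest.full.biblio.record is not shown.

```python
def affected_records_query(chars, table="biblio.record_entry"):
    """Build a parameterized SELECT for bib records whose MARC contains any of chars.

    `table` is an assumed default; adjust it to match your schema.
    Returns (sql, params) with %s placeholders, the style used by
    common PostgreSQL drivers for Python.
    """
    # Collect the lower- and upper-case form of each character, once each.
    variants = []
    for ch in chars:
        for variant in (ch.lower(), ch.upper()):
            if variant not in variants:
                variants.append(variant)

    # One "marc LIKE %s" condition per variant, joined with OR.
    conditions = " OR ".join("marc LIKE %s" for _ in variants)
    params = ["%" + variant + "%" for variant in variants]
    return f"SELECT id FROM {table} WHERE {conditions}", params


if __name__ == "__main__":
    sql, params = affected_records_query(["ł"])
    print(sql)
    print(params)
```

Keeping the characters as query parameters rather than pasting them into
the SQL string avoids quoting and encoding surprises.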

In 2.0, similar changes are necessary to fts.pm, but the ingest process
is all in-database, so instead of editing Ingest.pm you have to either
adjust a normalization stored procedure (public.naco_normalize) or add a
new normalization routine containing your desired character mappings.
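
To illustrate why explicit character mappings are needed at all, here is
a minimal Python sketch. It is not the code in public.naco_normalize or
fts.pm, and the names EXTRA_MAP and normalize are invented for this
example. Letters such as ø and ł have no Unicode decomposition, so
stripping combining marks alone leaves them untouched; they have to be
mapped by hand.

```python
import unicodedata

# Hypothetical mapping for letters that Unicode does not decompose into
# a base letter plus a combining mark.
EXTRA_MAP = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})


def normalize(text):
    """Fold a string to a lowercase, diacritic-free form for matching."""
    # Apply the explicit mappings first.
    text = text.translate(EXTRA_MAP)
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


if __name__ == "__main__":
    for name in ("Høeg", "Hoeg", "Łódź"):
        print(name, "->", normalize(name))
```

With the same function applied both at index time and at search time,
'Høeg' and 'Hoeg' reduce to the same string and so match each other.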

Now, the question is whether every site wants the same normalizations by
default.


