[OPEN-ILS-GENERAL] Searching puzzle

Dan Scott dan at coffeecode.net
Sun Oct 18 13:49:38 EDT 2009


2009/10/16  <scrlheadlib at mts.net>:
> We are using version 1.4.0.4.  If I do a key word search for Ted Arnold I do
> not pull up any records.  The correct spelling is Tedd Arnold which pulls up
> 15 records.  My understanding is that Evergreen uses “stemming” so doesn’t
> that mean it shouldn’t matter if I use Ted or Tedd??  Thanks in advance.
>
> Mary Toma

Hi Mary:

Evergreen's full-text search uses the Porter stemming algorithm,
defined at http://snowball.tartarus.org/algorithms/porter/stemmer.html

Long story short: a word that ends in a double consonant is not
automatically stemmed to a single consonant. Only when the double
consonant is followed by a suffix that the stemmer recognizes as
causing doubling of a final consonant (for example, '-ed' or '-ing')
does the word get stemmed down to a single consonant at the end. So
'tedd' on its own is indexed as 'tedd', which a search for 'ted' does
not match.
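
The rule described above can be sketched in a few lines of Python.
This is only a simplified slice of step 1b of the Porter algorithm
(the '-ed' / '-ing' removal and the undoubling of a final double
consonant), not the stemmer that the database actually runs; the
function names are made up for illustration, and the other rules of
the step (such as '-eed' or restoring a trailing 'e') are left out.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: 'y' is a vowel only when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    return (
        len(stem) >= 2
        and stem[-1] == stem[-2]
        and is_consonant(stem, len(stem) - 1)
    )


def strip_ed_ing(word):
    """Simplified sketch of the '-ed' / '-ing' part of Porter step 1b."""
    if word.endswith("eed"):
        # '-eed' is covered by a separate rule that this sketch omits.
        return word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not contains_vowel(stem):
                # The suffix is only stripped when the stem has a vowel.
                return word
            # After stripping, a final double consonant (except l, s, z)
            # is reduced to a single letter.
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                stem = stem[:-1]
            return stem
    return word


for term in ("ted", "tedd", "tedding"):
    print(term, "->", strip_ed_ing(term))
```

In this sketch 'ted' and 'tedd' come out unchanged and therefore stay
distinct, while 'tedding' is reduced to 'ted'.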

You can see what's happening under the covers by connecting to your
Evergreen database and directly entering some terms into the index
tables:

evergreen=# insert INTO metabib.keyword_field_entry (source, value,
field) values (1, 'tedd', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedd';
 id | source | field | value | index_vector
----+--------+-------+-------+--------------
  4 |      1 |    15 | tedd  | 'tedd':1
(1 row)

Here, the 'index_vector' field shows you what full-text search will
match against ('tedd'). Compare that to the contrived example of
inserting 'tedding':

evergreen=# insert INTO metabib.keyword_field_entry (source, value,
field) values (1, 'tedding', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedding';
 id | source | field |  value  | index_vector
----+--------+-------+---------+--------------
  5 |      1 |    15 | tedding | 'ted':1
(1 row)

With 'tedding', the stemmer strips the recognized suffix '-ing' and
then reduces the remaining double consonant 'dd' to a single 'd', so
the stem 'ted' is what gets indexed.

Hopefully this helps...

