[OPEN-ILS-GENERAL] Searching puzzle
Dan Scott
dan at coffeecode.net
Sun Oct 18 13:49:38 EDT 2009
2009/10/16 <scrlheadlib at mts.net>:
> We are using version 1.4.0.4. If I do a key word search for Ted Arnold I do
> not pull up any records. The correct spelling is Tedd Arnold which pulls up
> 15 records. My understanding is that Evergreen uses “stemming” so doesn’t
> that mean it shouldn’t matter if I use Ted or Tedd?? Thanks in advance.
>
> Mary Toma
Hi Mary:
Evergreen's full-text search uses the Porter stemming algorithm,
defined at http://snowball.tartarus.org/algorithms/porter/stemmer.html
The short version: words that end in a double consonant do not
automatically get stemmed to a single consonant. Only when the double
consonant is followed by a suffix that the algorithm recognizes as
causing doubling of a final consonant (for example, '-ed' or '-ing')
does the stemmer strip the suffix and reduce the double consonant to a
single one.
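To trace that rule by hand, here is a minimal Python sketch of step 1b of the original Porter algorithm, following the description on the snowball page linked above. It is an illustration only, not Evergreen's or PostgreSQL's actual code. The function names are hypothetical, and Porter's other steps (1a, 1c, 2-5) are omitted, so it does not produce full stems for arbitrary words.

```python
# Hypothetical sketch of step 1b of the original Porter stemmer only;
# steps 1a, 1c and 2-5 are deliberately omitted.

VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter: a consonant is a letter other than a, e, i, o, u,
    and other than 'y' when 'y' follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m in [C](VC)^m[V]: count vowel-to-consonant transitions."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(word: str) -> bool:
    return (len(word) >= 2 and word[-1] == word[-2]
            and is_consonant(word, len(word) - 1))


def ends_cvc(word: str) -> bool:
    """Porter's *o: consonant-vowel-consonant, last letter not w, x or y."""
    if len(word) < 3:
        return False
    n = len(word)
    return (is_consonant(word, n - 3) and not is_consonant(word, n - 2)
            and is_consonant(word, n - 1) and word[-1] not in "wxy")


def tidy(stem: str) -> str:
    """Clean-up applied only after '-ed' or '-ing' has been removed."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    # The rule relevant to 'tedding': undo the doubled final consonant.
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem


def step1b(word: str) -> str:
    if word.endswith("eed"):
        # eed -> ee only if the preceding stem has m > 0.
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # The suffix is stripped only if the remaining stem has a vowel.
            return tidy(stem) if contains_vowel(stem) else word
    return word


for w in ("tedding", "tedd", "ted", "hopping", "falling", "filing"):
    print(w, "->", step1b(w))
# tedding -> ted
# tedd -> tedd
# ted -> ted
# hopping -> hop
# falling -> fall
# filing -> file
```

The 'eed' check comes first because Porter applies the longest matching suffix, so a word ending in 'eed' never falls through to the plain 'ed' rule.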
You can see what's happening under the covers by connecting to your
Evergreen database and directly entering some terms into the index
tables:
evergreen=# insert INTO metabib.keyword_field_entry (source, value, field) values (1, 'tedd', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedd';
 id | source | field | value | index_vector
----+--------+-------+-------+--------------
  4 |      1 |    15 | tedd  | 'tedd':1
(1 row)
Here, the 'index_vector' field shows you what full-text search will
match against ('tedd'). Compare that to the contrived example of
inserting 'tedding':
evergreen=# insert INTO metabib.keyword_field_entry (source, value, field) values (1, 'tedding', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedding';
 id | source | field |  value  | index_vector
----+--------+-------+---------+--------------
  5 |      1 |    15 | tedding | 'ted':1
(1 row)
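With 'tedding', the double consonant followed by a recognized
consonant-doubling suffix ('-ing') results in the stem 'ted' being
indexed. A keyword search for 'Ted', on the other hand, stays 'ted'
after stemming, so it cannot match the 'tedd' indexed for Tedd Arnold.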
Hopefully this helps...