[OPEN-ILS-DEV] Automatic stemming in Evergreen
Kathy Lussier
klussier at masslnc.org
Tue Aug 14 06:22:15 EDT 2012
Hi all,
We've had difficulty finding records in our catalog due to the automatic
stemming that occurs when records are indexed in Evergreen. As an
example, a title on one of our summer reading lists was "The Assist" by
Neil Swidey. However, when users ran a title search for "the assist"
with the phrase enclosed in quotation marks, they still had to click
through several pages of results before finding the title they needed.
Many of the records that ranked higher contained words like
"assistance", "assistive", and "assisted", because those words were
automatically stemmed at indexing, and the stemmed form (assist) was
what was stored in the index vector column. We've had many other
examples where this stemming has made searching difficult.
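To make the collision concrete, here is a toy sketch. It assumes a hypothetical
suffix-stripping stemmer (`toy_stem`, with a made-up suffix list) that is much
cruder than Snowball and is not Evergreen's actual indexing code. It only shows
why a stemmed index makes "assistance", "assistive", and "assisted" all match a
search for "assist", while an unstemmed index keeps them distinct. It ignores
phrase matching and ranking entirely.

```python
# Toy illustration only: NOT Evergreen's indexer and NOT the Snowball stemmer.
# The suffix list is a hypothetical simplification chosen for this example.
SUFFIXES = ("ance", "ive", "ed", "ing", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches(query: str, record_words: list[str], stem: bool) -> bool:
    """True if every query word appears among the record's index terms."""
    normalize = toy_stem if stem else str.lower
    terms = {normalize(w) for w in record_words}
    return all(normalize(q) in terms for q in query.split())


words = ["assist", "assistance", "assistive", "assisted"]
print({w: toy_stem(w) for w in words})
# With stemming, the query "assist" matches all four words.
print([w for w in words if matches("assist", [w], stem=True)])
# Without stemming, only the exact word matches.
print([w for w in words if matches("assist", [w], stem=False)])
```

The trade-off is the one described above: stemming lets "assist" find
"assists", but it also lets it find "assistance", and the index can no longer
tell those apart.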
In digging through IRC logs and other list messages regarding stemming,
I've seen people mention that stemming can be turned off so that full
words are indexed rather than their stemmed versions. Can anybody tell
me how this is done? I understand that the records would need to be
reingested, but is there a flag that needs to be disabled to turn off
stemming, or does it require something else? Also, is there a way to
use another dictionary for the stemmer so that stemming is somewhat
less aggressive than with the Snowball stemmer? Overall, we like the
concept of stemming, particularly when it retrieves results for both
singular and plural forms of a word, but we've had many examples where
stemming seems to be throwing users off course.
Has anybody else had similar issues?
Thanks!
Kathy
--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
klussier at masslnc.org
Twitter: http://www.twitter.com/kmlussier