[OPEN-ILS-DEV] Automatic stemming in Evergreen

Kathy Lussier klussier at masslnc.org
Tue Aug 14 06:22:15 EDT 2012


Hi all,

We've had difficulty finding records in our catalog because of the
automatic stemming that occurs when records are indexed in Evergreen. As
an example, a title on one of our summer reading lists was "The Assist"
by Neil Swidey. However, when users ran a title search for "the assist"
with the phrase enclosed in quotation marks, they still had to page
through several pages of results before finding the title they needed.
Many of the higher-ranked records contained words like "assistance",
"assistive", and "assisted". These words were automatically stemmed at
indexing, and the stemmed version of the word (assist) was what was
stored in the index vector column, so they matched the search as well.
We've had many other examples where this stemming has made it difficult
to conduct searches.
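
To make the effect concrete, here is a minimal Python sketch of how
stemming at index time collapses different words into one index term.
It is only an illustration: toy_stem and its suffix list are
hypothetical and much cruder than the Snowball stemmer, and every title
other than "The Assist" is made up.

```python
# Toy suffix stripper for illustration only; this is NOT the Snowball algorithm.
SUFFIXES = ("ance", "ive", "ing", "ed", "s")  # hypothetical rule list


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(title):
    """Return the set of stemmed terms that would be stored for a title."""
    return {toy_stem(w) for w in title.split()}


# Only the first title is real; the rest are invented for the example.
titles = [
    "The Assist",
    "Assistive Technology",
    "Assisted Living",
    "Financial Assistance",
]

# The query word is stemmed the same way as the indexed words.
query_term = toy_stem("assist")
matches = [t for t in titles if query_term in index_terms(t)]
print(matches)
```

Because every variant is stored as "assist", all four titles match the
query, and nothing in the index itself distinguishes the exact title
from the others.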

In digging through IRC logs and other list messages regarding stemming, 
people have mentioned that this stemming can be turned off so that the 
full words are indexed rather than the stemmed versions of a word. Can 
anybody tell me how this is done? I understand that the records would 
need to be reingested, but is there a flag that needs to be disabled to 
turn off the stemming, or does it require something else? Also, is there 
a way to use another dictionary for the stemmer so that the stemming is 
somewhat less aggressive than that of the Snowball stemmer? Overall, 
we like the concept of stemming, particularly when it retrieves results 
for both singular and plural versions of a word, but we've had many 
examples where stemming seems to be throwing users off course.
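
As a rough sketch of what "less aggressive" could mean, the Python
function below only folds plurals into singulars and leaves every other
word form alone. The name plural_only_stem and its rules are
assumptions made up for this example; this is not an existing
Evergreen or PostgreSQL dictionary, and the rules are deliberately
crude.

```python
def plural_only_stem(word):
    """Fold simple English plurals to the singular; leave other forms alone.

    Hypothetical rules for illustration, not a real stemmer dictionary.
    """
    word = word.lower()
    # "libraries" -> "library"
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # "assists" -> "assist", but keep words like "glass" or "corpus"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]
    return word


for w in ["assist", "assists", "assisted", "assistance", "libraries"]:
    print(w, "->", plural_only_stem(w))
```

With rules like these, "assists" would still match "assist", while
"assisted" and "assistance" would be indexed as separate terms.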

Has anybody else had similar issues?

Thanks!
Kathy

-- 
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
klussier at masslnc.org
Twitter: http://www.twitter.com/kmlussier


