[OPEN-ILS-DEV] Relevance (again)
Mike Rylander
mrylander at gmail.com
Wed Feb 13 19:00:21 EST 2008
On Feb 13, 2008 4:30 PM, Patrick Durusau <patrick at durusau.net> wrote:
> Hello,
>
> When I first joined this list I had a question about the search
> algorithm that was never quite answered. A problem with it has come up
> again.
>
> I searched for "Apple fruit" and got 18 "hits."
>
> To the immediate left I have a listing of relevant subjects, the first
> one of which is "Apples." Followed by "Fruit trees", "Fruit", then
> "Frontier and pioneer life" and then "Overland journeys to the Pacific".
>
> Oh, but it gets better.
>
> Guess what is returned if you select "Apples?" Well partner, it isn't
> Dewey 583.73 Apples.
The subject sidebar entry links create new searches, so you're in
effect broadening your search from "subject:apple fruit" to
"subject:apples".
>
> No, it helpfully returns 568 "hits" which starts off with Apple
> Computers, includes Appling County census results and the tenth item is
> an apple cookbook.
Among other things, the search infrastructure will stem any unadorned
terms that you enter, which turns "apples" into "apple". "Appling"
becomes
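To illustrate how suffix stripping can conflate unrelated words, here is a deliberately crude, hypothetical suffix-stripper in Python. It is not the Snowball algorithm and not Evergreen code, and its output strings are not meant to match a real stemmer's. The function name `toy_stem` and its rules are assumptions made up for this sketch.

```python
# A deliberately crude suffix-stripper, for illustration only.
# This is NOT the Snowball algorithm and NOT Evergreen code; it only
# shows how stripping word endings can make different words share a stem.

def toy_stem(word: str) -> str:
    """Lowercase, strip one suffix ("ing" or "s"), then strip a trailing "e"."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
    elif w.endswith("s") and len(w) > 3:
        w = w[:-1]
    if w.endswith("e") and len(w) > 3:
        w = w[:-1]
    return w


for term in ["Apples", "apple", "Appling", "fruit"]:
    print(term, "->", toy_stem(term))
```

In this toy, "Apples", "apple" and "Appling" all reduce to the same string, so a search on any one of them would match records containing the others.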
>
> Does that strike anyone besides myself as rather odd behavior for a
> search engine? Or perhaps I should say, a library search engine?
>
> Well, but opinions are going to vary on that score, aren't they?
>
> My real question is: Where is the relevance behavior for Evergreen set
> such that I can alter it?
>
That depends on the version. You were testing on the production PINES
servers (it seems, as I replicated your searches and result counts
there), which is currently on 1.2.1.2 (soon to be 1.2.1.3). There are
weighting values that you can apply in 1.2 that control how much a
particular searched field is worth. So, for instance, topical
subjects could be weighted higher than corporate name subjects, which
would make the Granny Smiths float to the top, above the ][e
handbooks.
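A minimal sketch of per-field weighting, assuming a score that simply sums (field weight) x (number of query terms found in that field). The field names, weight values, record titles and the `score` function are all hypothetical and are not Evergreen's actual configuration or ranking code.

```python
from dataclasses import dataclass

# Hypothetical weights and field names, invented for illustration;
# they are not Evergreen's actual configuration.
FIELD_WEIGHTS = {
    "topical_subject": 5.0,
    "corporate_subject": 1.0,
}


@dataclass
class Record:
    title: str
    fields: dict[str, str]  # field name -> field text


def score(record: Record, terms: list[str]) -> float:
    """Sum, over fields, (field weight) x (number of query terms found in that field)."""
    total = 0.0
    for field, text in record.fields.items():
        words = text.lower().split()
        hits = sum(1 for t in terms if t.lower() in words)
        total += FIELD_WEIGHTS.get(field, 1.0) * hits
    return total


granny = Record("Growing Granny Smiths", {"topical_subject": "apple fruit"})
manual = Record("Apple IIe owner's handbook", {"corporate_subject": "apple computer inc"})

ranked = sorted([manual, granny], key=lambda r: score(r, ["apple"]), reverse=True)
print([r.title for r in ranked])
```

With topical subjects weighted five times higher, the orchard book (score 5.0) sorts above the computer handbook (score 1.0) even though both match "apple" once.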
> That gets us past all the normative questions and to one that is purely
> technical. I want to *alter* the relevance behavior of Evergreen
> searches. Where is that done?
>
There are many different things that can be done to change the way
Evergreen performs searches. One could replace, or augment, the
Snowball stemmer that is used by default with a dictionary stemmer (or
a non-stemming dictionary). One could turn off stemming altogether,
and require exact word matches.
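As a rough illustration of the dictionary alternative, the sketch below maps only word forms that appear in a lookup table and passes everything else through unchanged, with a switch to disable normalization entirely (exact word matching). The table contents and the `normalize` function are hypothetical; a real dictionary would be far larger and would be configured in the search backend rather than in application code.

```python
# Hypothetical dictionary-based normalizer (illustration only, not Evergreen code).
# Only word forms listed in LEMMAS are mapped to a base form; anything
# unknown passes through unchanged, so no suffix is ever guessed at.
LEMMAS = {
    "apples": "apple",
    "fruits": "fruit",
    "trees": "tree",
}


def normalize(word: str, stemming: bool = True) -> str:
    """Map a word to its index form; with stemming=False, require exact words."""
    w = word.lower()
    if not stemming:
        return w  # exact-word matching: no conflation at all
    return LEMMAS.get(w, w)


print(normalize("Apples"))                  # apple
print(normalize("Appling"))                 # appling
print(normalize("Apples", stemming=False))  # apples
```

The trade-off is coverage: a dictionary never over-conflates, but any inflected form missing from it will not match its base form.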
In future versions (as the plan stands, 1.4 to some degree and 2.0 to a
larger degree) one will be able to adjust the relevancy bonuses given
under certain circumstances. For instance, title searches give a
higher rank when the searched words are in the same order in both the
field and the query. Author searches give a large bonus when the
first word of both the field and the query match exactly. Bonuses are
given all around when phrases match. And, obviously, a normalized
full-query-and-field match gets a very large bonus.
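The bonuses described above can be sketched as a single scoring function. This is a hypothetical Python illustration, not Evergreen's ranking code: the multiplier values, the choice to multiply bonuses rather than add them, and the helper names (`tokens`, `contains_phrase`, `rank`) are all assumptions.

```python
import re

# Hypothetical bonus multipliers, invented for illustration only.
ORDER_BONUS = 2.0        # title: query words appear in the same order
FIRST_WORD_BONUS = 3.0   # author: first words of query and field match
PHRASE_BONUS = 2.0       # query occurs as a contiguous phrase
FULL_MATCH_BONUS = 10.0  # normalized query equals normalized field


def tokens(text: str) -> list[str]:
    """Case-fold and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(field: list[str], query: list[str]) -> bool:
    """True if the query tokens occur contiguously, in order, in the field."""
    n = len(query)
    return any(field[i:i + n] == query for i in range(len(field) - n + 1))


def rank(query: str, field_text: str, field_type: str) -> float:
    q, f = tokens(query), tokens(field_text)
    base = sum(1 for w in q if w in f)  # how many query words occur at all
    if base == 0:
        return 0.0
    score = float(base)
    if field_type == "title":
        pos = [f.index(w) for w in q if w in f]
        if len(pos) > 1 and pos == sorted(pos):
            score *= ORDER_BONUS
    if field_type == "author" and q[0] == f[0]:
        score *= FIRST_WORD_BONUS
    if len(q) > 1 and contains_phrase(f, q):
        score *= PHRASE_BONUS
    if q == f:
        score *= FULL_MATCH_BONUS
    return score


print(rank("apple pie", "Apple pie cookbook", "title"))  # 8.0
print(rank("pie apple", "Apple pie cookbook", "title"))  # 2.0
print(rank("apple pie", "Apple pie", "title"))           # 80.0
```

Here "apple pie" against the title "Apple pie cookbook" earns both the word-order and phrase bonuses, while the reversed query earns neither, and an exact full-field match dominates everything else.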
One way to effectively turn off stemming today is to quote words and
phrases, which forces a space- and case-normalized direct match for the
quoted sections of text in the query.
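A minimal sketch of that quoting behavior, assuming a plain double-quote syntax: quoted spans are whitespace- and case-normalized and must then appear as whole words in the field, while bare words would go on to the stemmer (not shown). The function names are hypothetical, and the sketch ignores punctuation, so the phrase "apple fruit" would not match a field reading "Apple fruit, a guide".

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace to a single space and case-fold."""
    return " ".join(text.split()).lower()


def split_query(query: str) -> tuple[list[str], list[str]]:
    """Return (quoted_phrases, bare_words); bare words would still be stemmed."""
    phrases = [normalize(p) for p in re.findall(r'"([^"]*)"', query) if p.strip()]
    bare = re.sub(r'"[^"]*"', " ", query).split()
    return phrases, bare


def phrases_match(field_text: str, phrases: list[str]) -> bool:
    """True if every quoted phrase occurs, whole-word, in the normalized field."""
    padded = f" {normalize(field_text)} "
    return all(f" {p} " in padded for p in phrases)


phrases, bare = split_query('"Apple   Fruit" trees')
print(phrases, bare)                                  # ['apple fruit'] ['trees']
print(phrases_match("Apple fruit growers", phrases))  # True
print(phrases_match("Apples and fruit", phrases))     # False
```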
Does that answer some of your questions?
--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker at esilibrary.com
| web: http://www.esilibrary.com