[OPEN-ILS-DEV] Relevance (again)

Mike Rylander mrylander at gmail.com
Wed Feb 13 19:00:21 EST 2008


On Feb 13, 2008 4:30 PM, Patrick Durusau <patrick at durusau.net> wrote:
> Hello,
>
> When I first joined this list I had a question about the search
> algorithm that was never quite answered. A problem with it has come up
> again.
>
> I search for "Apple fruit" and got 18 "hits."
>
> To the immediate left I have a listing of relevant subjects, the first
> one of which is "Apples." Followed by "Fruit trees", "Fruit", then
> "Frontier and pioneer life" and then "Overland journeys to the Pacific".
>
> Oh, but it gets better.
>
> Guess what is returned if you select "Apples?" Well partner, it isn't
> Dewey 583.73 Apples.

The subject sidebar entry links create new searches, so you're in
effect broadening your search from "subject:apple fruit" to
"subject:apples".

>
> No, it helpfully returns 568 "hits" which starts off with Apple
> Computers, includes Appling Country census results and the tenth item is
> an apple cookbook.

Among other things, the search infrastructure will stem any unadorned
terms that you enter, which turns "apples" into "apple".  "Appling"
becomes

>
> Does that strike anyone besides myself as rather odd behavior for a
> search engine? Or perhaps I should say, a library search engine?
>
> Well, but opinions are going to vary on that score aren't they?
>
> My real question is: Where is the relevance behavior for Evergreen set
> such that I can alter it?
>

That depends on the version.  You were testing on the production PINES
servers (it seems, as I replicated your searches and result counts
there), which is currently on 1.2.1.2 (soon to be 1.2.1.3).  There are
weighting values that you can apply in 1.2 that control how much a
particular searched field is worth.  So, for instance, topical
subjects could be weighted higher than corporate name subjects, which
would make the Granny Smiths float to the top, above the ][e
handbooks.
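
As a rough illustration of the idea (this is a sketch, not Evergreen's
actual code or configuration; the field names and weight values below
are made up), per-field weighting amounts to something like:

```python
# Hypothetical field names and weights, for illustration only.
FIELD_WEIGHTS = {
    "subject_topical": 2.0,
    "subject_corporate_name": 1.0,
}

def weighted_score(hits_by_field, weights=FIELD_WEIGHTS):
    """Sum per-field match counts, each scaled by that field's weight."""
    return sum(weights.get(field, 1.0) * count
               for field, count in hits_by_field.items())

# Each record maps a searched field to the number of query-term matches in it.
records = {
    "Apple IIe handbook": {"subject_corporate_name": 1},
    "Apple cookbook": {"subject_topical": 1},
}

# Highest weighted score first.
ranked = sorted(records, key=lambda title: weighted_score(records[title]),
                reverse=True)
```

With equal match counts, the record that matched in the more heavily
weighted field sorts first.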

> That gets us past all the normative questions and to one that is purely
> technical. I want to *alter* the relevance behavior of Evergreen
> searches. Where is that done?
>

There are many different things that can be done to change the way
Evergreen performs searches.  One could replace, or augment, the
Snowball stemmer that is used by default with a dictionary stemmer (or
a non-stemming dictionary).  One could turn off stemming altogether,
and require exact word matches.
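
To make the difference concrete, here is a toy sketch of matching with
and without stemming.  The toy_stem function is a deliberately crude
stand-in that I am assuming for illustration (it only strips a trailing
"s"); it is not Snowball and does not reproduce Snowball's output.

```python
def toy_stem(word):
    """Crude stand-in stemmer: strip one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def matches(query, field, stem=None):
    """True if every query word appears in the field, after optional stemming."""
    def prep(text):
        words = text.lower().split()
        return {stem(w) if stem else w for w in words}
    return prep(query) <= prep(field)

with_stemming = matches("apples", "Apple cookbook", stem=toy_stem)
without_stemming = matches("apples", "Apple cookbook")
```

With the stand-in stemmer, "apples" and "Apple" reduce to the same
token and the record matches; with stemming off, only an exact word
match counts.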

In future versions (as the plan stands, 1.4 to some degree and 2.0 to a
larger degree) one will be able to adjust the relevancy bonuses given
under certain circumstances.  For instance, title searches give a
higher rank when the searched words are in the same order in both the
field and the query.  Author searches give a large bonus when the
first word of both the field and the query match exactly.  Bonuses are
given all around when phrases match.  And, obviously, a normalized
full-query-and-field match gets a very large bonus.
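
A minimal sketch of how bonuses like these could combine, purely for
illustration: the multiplier values, the function names, and the choice
to multiply the bonuses together are all my assumptions here, not what
Evergreen does internally.

```python
def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def contains_phrase(f_words, q_words):
    """True if q_words occurs as a contiguous run inside f_words."""
    n = len(q_words)
    return any(f_words[i:i + n] == q_words
               for i in range(len(f_words) - n + 1))

def in_same_order(q_words, f_words):
    """True if the query words appear in the field in the same order."""
    remaining = iter(f_words)
    return all(word in remaining for word in q_words)

def relevance_bonus(query, field, search_class):
    """Multiply together the (hypothetical) bonuses that apply."""
    q_words = normalize(query).split()
    f_words = normalize(field).split()
    bonus = 1.0
    if not q_words or not f_words:
        return bonus
    if q_words == f_words:                      # normalized full match
        bonus *= 10.0
    if contains_phrase(f_words, q_words):       # phrase match
        bonus *= 2.0
    if search_class == "author" and q_words[0] == f_words[0]:
        bonus *= 4.0                            # first word matches exactly
    if search_class == "title" and in_same_order(q_words, f_words):
        bonus *= 1.5                            # same word order
    return bonus
```

For example, relevance_bonus("old man sea", "The Old Man and the Sea",
"title") picks up only the word-order bonus, while the same words in a
different order pick up none.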

One way to effectively turn off stemming today is to quote words and
phrases, which forces a space- and case-normalized direct match for the
quoted sections of text in the query.
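
Roughly, and only as a sketch under my own assumptions (the helper
names and the whole-word matching rule below are illustrative, not the
real query parser), quoted sections behave like this:

```python
import re

def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def split_query(query):
    """Return (quoted phrases, unquoted terms), all normalized."""
    phrases = [normalize(p) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]*"', " ", query)
    return phrases, rest.lower().split()

def phrases_match(query, field):
    """True if every quoted phrase occurs in the field as whole words."""
    phrases, _ = split_query(query)
    haystack = f" {normalize(field)} "
    return all(f" {p} " in haystack for p in phrases)
```

Under this sketch a quoted "apples" matches a field containing the
word "Apples" but not one containing only "Apple", since no stemming
is applied to the quoted text.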

Does that answer some of your questions?

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

