[OPEN-ILS-GENERAL] question about relevance - mult eds of same book not in most recent order

Mike Rylander mrylander at gmail.com
Fri Sep 4 15:55:30 EDT 2009


On Fri, Sep 4, 2009 at 3:18 PM, Joe Atzberger <atz at esilibrary.com> wrote:
> Karen Schneider wrote:
>>
>> Indexed-field weighting, which controls relevance ranking in Evergreen, is
>> configured in the database (no UI available yet) on the table called
>> config.metabib_field, using the ‘weight’ column.
>>
>> (The other four columns are field_class, name, xpath, and format. The
>> table is too wide to display in this email, but here is one row, split
>> across lines:
>>   author | conference |
>>   //mods32:mods/mods32:name[@type='conference']/mods32:namePart[../mods32:role/mods32:roleTerm[text()='creator']]
>>   | mods32 | 1 )
>>
>> The default value for index-field weights is “1.” Adjust the weighting of
>> indexed fields to give those fields a boost in searching. The larger the
>> value for ‘weight’, the higher the relevance score for matches on that
>> indexed field.
>>
>> For example, by increasing the weight of the title-proper field, a search
>> for *jaguar* would give higher relevance to the book titled /Aimee and
>> Jaguar/ than to a record with the term *jaguar* in another indexed field.
>>
>> You can also add generic matchpoint bonuses for the following types:
>>
>> *first_word*: boosts relevance if the query is one term long and matches
>> the first term in the indexed field (search for *twain*, get a bonus for
>> *twain, mark* but not *mark twain*)
>>
>> *word_order*: increases relevance for words matching the order of search
>> terms, so that the search *legend suicide* would rank the book *Legend
>> of a Suicide* higher than the book *Suicide Legend*
>>
>> *full_match*: boosts relevance when the full query exactly matches the
>> entire indexed field (after space, case and diacritic normalization on
>> both). So a title search for *The Future of Ice* would give the title
>> *The Future of Ice* a relevance boost above *Ice Ages of the Future*.
>>
>> The matchpoint bonuses are configured on a table called
>> search.relevance_adjustment, using the ‘multiplier’ column.  That is a
>> floating-point multiplier: the relevance score is multiplied by it at
>> the end.  So, if the first_word bonus is 1.2, the relevance score gets
>> a 20% bonus (x * 1.2).
>>
>> The search.relevance_adjustment weighting can be adjusted for each field.
>>
>> The search.relevance_adjustment table has three other columns:
>> field_class, name, and bump_type. Here are several lines from the
>> search.relevance_adjustment table:
>>
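>> field_class | name        | bump_type  | multiplier
>> ------------+-------------+------------+-----------
>> title       | translated  | word_order |         10
>> title       | uniform     | first_word |        1.5
>> title       | uniform     | full_match |         20
>> title       | uniform     | word_order |         10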
>>
>> Does that help? If so, I'll put this on the DocBook docket.
>>
>> Big ol' thanks to Mike Rylander for helping me with this answer!
>>
>> --
>> | Karen G. Schneider
>> | Community Librarian
>> | Equinox Software Inc. "The Evergreen Experts"
>> | Toll-free: 1.877.Open.ILS (1.877.673.6457) x712
>> | kgs at esilibrary.com
>> | Web: http://www.esilibrary.com
>>
> So this doesn't directly address the core complaint, which is that the
> different editions of the *same* title do not show up in a predictable
> order.  Is the suggestion that there may be a weighting that would impose a
> sensible order (i.e., newest first)?

There is no default tie-breaker today, and I don't think Karen was
suggesting that one would necessarily use the weighting and bonus
system to resolve what the original poster was asking about.  My
reading was that she was taking the opportunity (and the opening made
by Jason) to shed some light on what /does/ happen in the code right
now.
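
As a rough illustration of the multiplier arithmetic in the quoted
message, here is a minimal Python sketch. It is not the Evergreen code:
the function names, the tokenizing rules, the requirement that
word_order needs at least two query terms, and the idea that several
bonuses simply multiply together are all assumptions made for the
example. Only the three bump types and the multiplier values (taken
from the title|uniform rows quoted above) come from the thread.

```python
import re
import unicodedata

# Multipliers copied from the title|uniform rows quoted above.
MULTIPLIERS = {"first_word": 1.5, "word_order": 10.0, "full_match": 20.0}


def tokens(text):
    """Assumed normalization: strip diacritics, lowercase, split into words."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", plain.lower())


def first_word(query, field):
    """One-term query that equals the first term of the indexed field."""
    q, f = tokens(query), tokens(field)
    return len(q) == 1 and f[:1] == q


def word_order(query, field):
    """Query terms occur in the field in the same order (gaps allowed)."""
    q, remaining = tokens(query), iter(tokens(field))
    # 'term in remaining' consumes the iterator, so order is enforced.
    return len(q) > 1 and all(term in remaining for term in q)


def full_match(query, field):
    """Whole query equals the whole field after normalization."""
    q = tokens(query)
    return bool(q) and q == tokens(field)


# Hypothetical mapping from bump_type to its matching rule.
BUMPS = {"first_word": first_word, "word_order": word_order, "full_match": full_match}


def adjusted_score(base, query, field, multipliers=MULTIPLIERS):
    """Multiply the base relevance by every bonus that applies (assumed to stack)."""
    score = base
    for bump_type, matches in BUMPS.items():
        if matches(query, field):
            score *= multipliers.get(bump_type, 1.0)
    return score
```

With these assumptions, adjusted_score(1.0, "twain", "Twain, Mark")
picks up only the first_word multiplier, while the same query against
"Mark Twain" is left unchanged.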

We have heard, quietly, suggestions for many different tie-breaker
values (count of items, count of available items, LIFO, FIFO, pub
date, distance, etc.), but all suggestions so far have been personal
preference (described as such by the suggesters, AFAIR), and not
requests, per se, in any case.  I'd be personally happy to hear a
strong case made -- a discussion about the pros and cons of different
tie-breakers in a relevance-ordered list.

FWIW, pub date (or, less usefully, bib-added date) is by far the
simplest to implement and would not take a very strong case (IMO). At
the other end of the spectrum, it would take a very strong case for
any item-level data (counts, types, locations) to be considered as a
tie-breaker -- this is both more difficult to define (think:
consortium) and more expensive at run-time.
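
To make the pub-date option concrete, here is a small Python sketch of
a relevance-ordered list with publication year as the tie-breaker
(newest first). The Hit record, its field names, and the sample values
are hypothetical and exist only for illustration; nothing here reflects
how the Evergreen search code is structured.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search hit; field names are assumptions for this sketch."""
    title: str
    relevance: float
    pub_year: int


def rank(hits):
    """Sort by relevance (highest first), breaking ties by newest pub_year."""
    return sorted(hits, key=lambda h: (-h.relevance, -h.pub_year))


if __name__ == "__main__":
    # Three editions of the same title tie on relevance; made-up values.
    editions = [
        Hit("Example Title", 4.2, 1998),
        Hit("Example Title", 4.2, 2009),
        Hit("Example Title", 4.2, 2003),
    ]
    for hit in rank(editions):
        print(hit.pub_year, hit.title)
```

The tie-breaker only matters when relevance scores are equal, so it
never overrides the weighting and bonus configuration.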

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

