[OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen

Mike Rylander mrylander at gmail.com
Thu Mar 22 17:39:53 EDT 2012


On Thu, Mar 22, 2012 at 4:56 PM, Brian Greene <BGreene at cgcc.cc.or.us> wrote:
> Does relevancy ranking currently take publication date into account? I think
> this could be especially helpful with topical searches when, all other
> things being equal, I'd probably consider the newer item to be more
> relevant.

It does. The Date1 fixed field is used as the first tie-breaker after the
primary (user-chosen) sort axis.
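
As a rough illustration of that tie-breaking idea (not Evergreen's actual
code), here is a minimal Python sketch. The Hit type, the relevance score,
and the "newer first" direction are assumptions made for the example; only
the use of Date1 as a tie-breaker comes from the description above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    title: str
    relevance: float  # hypothetical score on the primary sort axis
    date1: str        # Date1 from the MARC 008 fixed field, e.g. "2011"


def date1_key(date1: str) -> int:
    """Treat blank or partial dates (e.g. "19uu") as oldest."""
    return int(date1) if date1.isascii() and date1.isdigit() else 0


def rank(hits: list[Hit]) -> list[Hit]:
    # Primary axis first (higher relevance wins); among equal scores,
    # the assumed rule is that the newer Date1 sorts first.
    return sorted(hits, key=lambda h: (-h.relevance, -date1_key(h.date1)))
```

Because Date1 only enters the sort key after the primary axis, it never
acts as a limiter: an older record with a better score still sorts first.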

--miker

> Similarly, I could see home library (in cases where that can
> be determined) being considered and used when there are two otherwise
> equally relevant items. Note that in both cases I don't want them to become
> de facto limiters, but rather act more like tie-breakers after the other
> factors have been weighed.
>
> I also support taking into account some sort of popularity measure.
>
> Thanks,
> Brian
>
>
> Brian Greene, Library Director
> Columbia Gorge Community College
> The Dalles, Oregon 97058
> (541) 506-6080 | www.cgcc.cc.or.us
>>>> Mike Rylander <mrylander at gmail.com> 3/8/2012 10:55 AM >>>
> On Thu, Mar 8, 2012 at 12:10 PM, Elizabeth Longwell <blongwel at eou.edu>
> wrote:
>> Hi,
>>
>> Is it necessary to re-index after changing weights for relevancy?
>
> Not at all. The only gotcha is that cached searches won't show the
> changed weighting (of course).  So, say you searched for "rowling"
> (sans quotes) and wanted to test an author-weighting change made after
> the search (but before the cache expired), search again for "rowling
> -asdlfkaf" (again, sans quotes).  That negated random string at the
> end kills the cache without materially changing the query.
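
The cache-busting trick described above could be scripted for repeated
testing. This is a minimal Python sketch; the function name is hypothetical
and nothing here calls an Evergreen API. It only builds the query string
you would then type or submit.

```python
import secrets


def cache_busting_query(query: str) -> str:
    """Append a negated random token so the search misses the cache."""
    # 12 random hex characters: very unlikely to occur in any record, so
    # negating the term should not change which records match.
    token = secrets.token_hex(6)
    return f"{query} -{token}"
```

Each call yields a different string, so every test search is treated as a
new query while matching the same records as the plain one.
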
>
> --
> Mike Rylander
> | Director of Research and Development
> | Equinox Software, Inc. / Your Library's Guide to Open Source
> | phone:  1-877-OPEN-ILS (673-6457)
> | email:  miker at esilibrary.com
> | web:  http://www.esilibrary.com
>
>
>>
>> Beth Longwell
>> Sage Library System
>>
>> On Wed, Mar 7, 2012 at 5:29 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>> On Wed, Mar 7, 2012 at 2:57 PM, Kathy Lussier <klussier at masslnc.org>
>>> wrote:
>>>> Hi Mike,
>>>>
>>>>>>To be clear, weighting hits that come from different index definitions
>>>>>>has always been possible.  2.2 will have a staff client interface to
>>>>>>make it easier, but the capability has been there all along.
>>>>
>>>> Is this staff client interface already available in master? If so, can
>>>> you give me a little more information on how this is done?
>>>
>>> It is.  Go to  Admin -> Server Administration -> MARC Search/Facet
>>> Fields and see the Weight field.  The higher the number, the more
>>> "important" the field.
>>>
>>>
>>>
>>>>
>>>> Thanks!
>>>> Kathy
>>>>
>>>>
>>>>
>>>>>>-----Original Message-----
>>>>>>From: open-ils-general-bounces at list.georgialibraries.org
>>>>>>[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf
>>>>>>Of Mike Rylander
>>>>>>Sent: Wednesday, March 07, 2012 10:11 AM
>>>>>>To: Evergreen Discussion Group
>>>>>>Subject: Re: [OPEN-ILS-GENERAL] Improving relevance ranking in
>>>>>>Evergreen
>>>>>>
>>>>>>On Wed, Mar 7, 2012 at 8:35 AM, Hardy, Elaine
>>>>>><ehardy at georgialibraries.org> wrote:
>>>>>>> Kathy,
>>>>>>>
>>>>>>> While the relevance display is much improved in 2.x, it would be
>>>>>>> good to have greater relevance given, in a keyword search, to title
>>>>>>> (specifically the 245) and then subject fields. I also see where
>>>>>>> having a popularity ranking might be beneficial.
>>>>>>>
>>>>>>> I just had to explain to a board member of one of our libraries why
>>>>>>> his search for John Sandford turned up children's titles first. So
>>>>>>> having MARC field 100s ranked higher than 700s in author searches
>>>>>>> would be beneficial as well.
>>>>>>>
>>>>>>
>>>>>>To be clear, weighting hits that come from different index definitions
>>>>>>has always been possible.  2.2 will have a staff client interface to
>>>>>>make it easier, but the capability has been there all along.
>>>>>>
>>>>>>Weighting different parts of one indexed term -- say, weighting the
>>>>>>title embedded in the keyword blob higher than the subjects embedded
>>>>>>in the same blob -- would require the above-mentioned "make use of
>>>>>>tsearch class weighting".  But one can approximate that today by
>>>>>>duplicating the index definitions from, say, title, author and subject
>>>>>>classes within the keyword class.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I can't comment on any of the coding possibilities other than to say
>>>>>>> whichever way doesn't negatively impact search return time is
>>>>>>> preferable.
>>>>>>>
>>>>>>> Elaine
>>>>>>>
>>>>>>>
>>>>>>> J. Elaine Hardy
>>>>>>> PINES Bibliographic Projects and Metadata Manager
>>>>>>> Georgia Public Library Service,
>>>>>>> A Unit of the University System of Georgia
>>>>>>> 1800 Century Place, Suite 150
>>>>>>> Atlanta, Ga. 30345-4304
>>>>>>> 404.235-7128
>>>>>>> 404.235-7201, fax
>>>>>>>
>>>>>>> ehardy at georgialibraries.org
>>>>>>> www.georgialibraries.org
>>>>>>> http://www.georgialibraries.org/pines/
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: open-ils-general-bounces at list.georgialibraries.org
>>>>>>> [mailto:open-ils-general-bounces at list.georgialibraries.org] On
>>>>>>> Behalf Of Kathy Lussier
>>>>>>> Sent: Tuesday, March 06, 2012 4:43 PM
>>>>>>> To: 'Evergreen Discussion Group'
>>>>>>> Subject: [OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I mentioned this during an e-mail discussion on the list last month,
>>>>>>> but I just wanted to hear from others in the Evergreen community
>>>>>>> about whether there is a desire to improve the relevance ranking for
>>>>>>> search results in Evergreen. Currently, we can tweak relevancy in the
>>>>>>> opensrf.xml, and it can look at things like the document length, word
>>>>>>> proximity, and unique word count. We've found that we had to remove
>>>>>>> the modifiers for document length and unique word count to prevent a
>>>>>>> problem where brief bib records were ranked way too high in our
>>>>>>> search results.
>>>>>>>
>>>>>>> In our local discussions, we've thought the following enhancements
>>>>>>> could improve the ranking of search results:
>>>>>>>
>>>>>>> * Giving greater weight to a record if the search terms appear in the
>>>>>>> title or subject (ideally, we would like these fields to be
>>>>>>> configurable). This is something that is tweakable in
>>>>>>> search.relevance_ranking, but my understanding is that the use of
>>>>>>> these tweaks results in a major reduction in search performance.
>>>>>>>
>>>>>>> * Using some type of popularity metric to boost relevancy for popular
>>>>>>> titles. I'm not sure what this metric should be (number of copies
>>>>>>> attached to record? Total circs in last x months? Total current
>>>>>>> circs?), but we believe some type of popularity measure would be
>>>>>>> particularly helpful in a public library where searches will often be
>>>>>>> for titles that are popular. For example, a search for "twilight"
>>>>>>> will most likely be for the Stephenie Meyer novel and not this
>>>>>>> http://books.google.com/books/about/Twilight.html?id=zEhkpXCyGzIC.
>>>>>>> Mike Rylander had indicated in a previous e-mail
>>>>>>> (http://markmail.org/message/h6u5r3sy4nr36wsl) that we might be able
>>>>>>> to handle this through an overnight cron job without a negative
>>>>>>> impact on search speeds.
>>>>>>>
>>>>>>> Do others think these two enhancements would improve the search
>>>>>>> results in Evergreen? Do you think there are other things we could do
>>>>>>> to improve relevancy? My main concern would be that any changes might
>>>>>>> slow down search speeds, and I would want to make sure that we could
>>>>>>> do something to retrieve better search results without a slowdown.
>>>>>>>
>>>>>>> Also, I was wondering if this type of project might be a good
>>>>>>> candidate for a Google Summer of Code project.
>>>>>>>
>>>>>>> I look forward to hearing your feedback!
>>>>>>>
>>>>>>> Kathy
>>>>>>>
>>>>>>> -------------------------------------------------------------
>>>>>>> Kathy Lussier
>>>>>>> Project Coordinator
>>>>>>> Massachusetts Library Network Cooperative
>>>>>>> (508) 756-0172
>>>>>>> (508) 755-3721 (fax)
>>>>>>> klussier at masslnc.org
>>>>>>> IM: kmlussier (AOL & Yahoo)
>>>>>>> Twitter: http://www.twitter.com/kmlussier
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>



-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

