[OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen

Brian Greene BGreene at cgcc.cc.or.us
Thu Mar 22 16:56:51 EDT 2012


Does relevancy ranking currently take publication date into account? I think this could be especially helpful for topical searches where, all other things being equal, I'd probably consider the newer item more relevant. Similarly, I could see home library (in cases where it can be determined) being used to order two otherwise equally relevant items. Note that in both cases I don't want these factors to become de facto limiters; they should act as tie-breakers after the other factors have been weighed.
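
To make the tie-breaker idea concrete, here is a minimal sketch. It is not Evergreen code: the `Hit` record, its field names, and the `rank` function are all hypothetical, and it assumes each hit already carries a relevance score from the existing ranking factors.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # All fields are hypothetical names chosen for this illustration.
    title: str
    relevance: float       # score from the existing ranking factors
    pub_year: int          # publication year
    at_home_library: bool  # True if the searcher's home library owns a copy


def rank(hits):
    # Relevance decides the order first. Publication date and home-library
    # ownership only break ties between hits with equal relevance, so they
    # never filter anything out of the result set.
    return sorted(
        hits,
        key=lambda h: (-h.relevance, -h.pub_year, not h.at_home_library),
    )


hits = [
    Hit("Older survey", 0.80, 1995, True),
    Hit("Newer survey", 0.80, 2011, False),
    Hit("Best match", 0.95, 1980, False),
]
ranked_titles = [h.title for h in rank(hits)]
```

Because the extra factors sit after relevance in the sort key, the 1980 record still comes first when it is the best match. In practice, exactly equal scores are rare, so a real implementation would probably need to bucket or round relevance before the tie-breakers could have any effect.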
 
I also support taking into account some sort of popularity measure.
 
Thanks,
Brian 

 
Brian Greene, Library Director
Columbia Gorge Community College 
The Dalles, Oregon 97058
(541) 506-6080 | www.cgcc.cc.or.us 
>>> Mike Rylander <mrylander at gmail.com> 3/8/2012 10:55 AM >>>
On Thu, Mar 8, 2012 at 12:10 PM, Elizabeth Longwell <blongwel at eou.edu> wrote:
> Hi,
>
> Is it necessary to re-index after changing weights for relevancy?

Not at all. The only gotcha is that cached searches won't show the
changed weighting (of course).  So, say you searched for "rowling"
(sans quotes) and wanted to test an author-weighting change made after
the search (but before the cache expired), search again for "rowling
-asdlfkaf" (again, sans quotes).  That negated random string at the
end kills the cache without materially changing the query.
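
A minimal sketch of that cache-busting trick, for anyone scripting test searches. The helper name `bust_cache` and its `token_length` parameter are made up for this example; the only thing taken from the description above is the idea of appending a negated random string to the query.

```python
import secrets
import string


def bust_cache(query, token_length=8):
    # Build a random lowercase token and append it as a negated term.
    # The search then asks for the original terms "but not <token>", which
    # changes the query text (so the cached result is not reused) without
    # materially changing which records match.
    token = "".join(
        secrets.choice(string.ascii_lowercase) for _ in range(token_length)
    )
    return f"{query} -{token}"


fresh_query = bust_cache("rowling")
```

Each call produces a different query string, such as "rowling -qzkxwvbn", so repeated tests after a weight change should not land on a stale cached result. The token is random, so there is a very small chance it matches a real word in some record.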

-- 
Mike Rylander
| Director of Research and Development
| Equinox Software, Inc. / Your Library's Guide to Open Source
| phone:  1-877-OPEN-ILS (673-6457)
| email:  miker at esilibrary.com
| web:  http://www.esilibrary.com


>
> Beth Longwell
> Sage Library System
>
> On Wed, Mar 7, 2012 at 5:29 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 2:57 PM, Kathy Lussier <klussier at masslnc.org> wrote:
>>> Hi Mike,
>>>
>>>>>To be clear, weighting hits that come from different index definitions
>>>>>has always been possible.  2.2 will have a staff client interface to
>>>>>make it easier, but the capability has been there all along.
>>>
>>> Is this staff client interface already available in master? If so, can you
>>> give me a little more information on how this is done?
>>
>> It is.  Go to  Admin -> Server Administration -> MARC Search/Facet
>> Fields and see the Weight field.  The higher the number, the more
>> "important" the field.
>>
>>
>>
>>>
>>> Thanks!
>>> Kathy
>>>
>>>
>>>
>>>>>-----Original Message-----
>>>>>From: open-ils-general-bounces at list.georgialibraries.org
>>>>>[mailto:open-ils-general-bounces at list.georgialibraries.org]
>>>>>On Behalf Of Mike Rylander
>>>>>Sent: Wednesday, March 07, 2012 10:11 AM
>>>>>To: Evergreen Discussion Group
>>>>>Subject: Re: [OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen
>>>>>
>>>>>On Wed, Mar 7, 2012 at 8:35 AM, Hardy, Elaine
>>>>><ehardy at georgialibraries.org> wrote:
>>>>>> Kathy,
>>>>>>
>>>>>> While the relevance display is much improved in 2.x, it would be
>>>>>> good to have greater relevance given, in a keyword search, to title
>>>>>> (specifically the 245) and then subject fields. I can also see where
>>>>>> a popularity ranking might be beneficial.
>>>>>>
>>>>>> I just had to explain to a board member of one of our libraries why
>>>>>> his search for John Sandford turned up children's titles first. So
>>>>>> having MARC field 100s ranked higher than 700s in author searches
>>>>>> would be beneficial as well.
>>>>>>
>>>>>
>>>>>To be clear, weighting hits that come from different index definitions
>>>>>has always been possible.  2.2 will have a staff client interface to
>>>>>make it easier, but the capability has been there all along.
>>>>>
>>>>>Weighting different parts of one indexed term -- say, weighting the
>>>>>title embedded in the keyword blob higher than the subjects embedded
>>>>>in the same blob -- would require the above-mentioned "make use of
>>>>>tsearch class weighting".  But one can approximate that today by
>>>>>duplicating the index definitions from, say, title, author and subject
>>>>>classes within the keyword class.
>>>>>
>>>>>
>>>>>
>>>>>> I can't comment on any of the coding possibilities other than to say
>>>>>> that whichever way doesn't negatively impact search return time is
>>>>>> preferable.
>>>>>>
>>>>>> Elaine
>>>>>>
>>>>>>
>>>>>> J. Elaine Hardy
>>>>>> PINES Bibliographic Projects and Metadata Manager
>>>>>> Georgia Public Library Service,
>>>>>> A Unit of the University System of Georgia
>>>>>> 1800 Century Place, Suite 150
>>>>>> Atlanta, Ga. 30345-4304
>>>>>> 404.235-7128
>>>>>> 404.235-7201, fax
>>>>>>
>>>>>> ehardy at georgialibraries.org
>>>>>> www.georgialibraries.org
>>>>>> http://www.georgialibraries.org/pines/
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: open-ils-general-bounces at list.georgialibraries.org
>>>>>> [mailto:open-ils-general-bounces at list.georgialibraries.org]
>>>>>> On Behalf Of Kathy Lussier
>>>>>> Sent: Tuesday, March 06, 2012 4:43 PM
>>>>>> To: 'Evergreen Discussion Group'
>>>>>> Subject: [OPEN-ILS-GENERAL] Improving relevance ranking in Evergreen
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I mentioned this during an e-mail discussion on the list last month,
>>>>>> but I just wanted to hear from others in the Evergreen community
>>>>>> about whether there is a desire to improve the relevance ranking for
>>>>>> search results in Evergreen. Currently, we can tweak relevancy in the
>>>>>> opensrf.xml, and it can look at things like the document length, word
>>>>>> proximity, and unique word count. We've found that we had to remove
>>>>>> the modifiers for document length and unique word count to prevent a
>>>>>> problem where brief bib records were ranked way too high in our
>>>>>> search results.
>>>>>>
>>>>>> In our local discussions, we've thought the following enhancements
>>>>>> could improve the ranking of search results:
>>>>>>
>>>>>> * Giving greater weight to a record if the search terms appear in the
>>>>>> title or subject (ideally, we would like these fields to be
>>>>>> configurable). This is something that is tweakable in
>>>>>> search.relevance_ranking, but my understanding is that the use of
>>>>>> these tweaks results in a major reduction in search performance.
>>>>>>
>>>>>> * Using some type of popularity metric to boost relevancy for popular
>>>>>> titles. I'm not sure what this metric should be (number of copies
>>>>>> attached to record? Total circs in last x months? Total current
>>>>>> circs?), but we believe some type of popularity measure would be
>>>>>> particularly helpful in a public library where searches will often be
>>>>>> for titles that are popular. For example, a search for "twilight"
>>>>>> will most likely be for the Stephenie Meyer novel and not this
>>>>>> http://books.google.com/books/about/Twilight.html?id=zEhkpXCyGzIC.
>>>>>> Mike Rylander had indicated in a previous e-mail
>>>>>> (http://markmail.org/message/h6u5r3sy4nr36wsl) that we might be able
>>>>>> to handle this through an overnight cron job without a negative
>>>>>> impact on search speeds.
>>>>>>
>>>>>> Do others think these two enhancements would improve the search
>>>>>> results in Evergreen? Do you think there are other things we could do
>>>>>> to improve relevancy? My main concern would be that any changes might
>>>>>> slow down search speeds, and I would want to make sure that we could
>>>>>> do something to retrieve better search results without a slowdown.
>>>>>>
>>>>>> Also, I was wondering if this type of project might be a good
>>>>>> candidate for a Google Summer of Code project.
>>>>>>
>>>>>> I look forward to hearing your feedback!
>>>>>>
>>>>>> Kathy
>>>>>>
>>>>>> -------------------------------------------------------------
>>>>>> Kathy Lussier
>>>>>> Project Coordinator
>>>>>> Massachusetts Library Network Cooperative
>>>>>> (508) 756-0172
>>>>>> (508) 755-3721 (fax)
>>>>>> klussier at masslnc.org
>>>>>> IM: kmlussier (AOL & Yahoo)
>>>>>> Twitter: http://www.twitter.com/kmlussier
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
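
The two ideas running through the thread, per-field weights ("the higher the number, the more important the field") and a popularity measure computed outside the search path, can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Evergreen computes rank: the weight values, the log-damped boost formula, and every name here (`FIELD_WEIGHTS`, `popularity_boost`, `score`) are hypothetical.

```python
import math

# Hypothetical per-field weights; a higher number makes a match in that
# field count for more.
FIELD_WEIGHTS = {"title": 4.0, "subject": 2.0, "author": 2.0, "keyword": 1.0}


def popularity_boost(circ_count):
    # Assumed formula: log damping, so a record with 500 circs outranks one
    # with 5 without completely swamping the field weights.
    return 1.0 + math.log1p(circ_count) / 10.0


def score(matched_fields, circ_count):
    # Sum the weights of the fields where the search terms matched, then
    # scale the total by the popularity multiplier.
    base = sum(FIELD_WEIGHTS.get(field, 0.0) for field in matched_fields)
    return base * popularity_boost(circ_count)


# The multiplier depends only on circulation data, so it could be
# precomputed per record (for example by a nightly job) and looked up at
# search time instead of being calculated per query.
nightly_boosts = {record_id: popularity_boost(circs)
                  for record_id, circs in {101: 0, 102: 37, 103: 512}.items()}
```

The popularity part never touches the query text, which is why it could be computed overnight without adding work to each search.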
