[OPEN-ILS-GENERAL] Relevance ranking
Mike Rylander
mrylander at gmail.com
Thu Feb 9 11:27:33 EST 2012
On Feb 9, 2012 10:57 AM, "Kathy Lussier" <klussier at masslnc.org> wrote:
>
> Thanks Mike!
>
> >>What does removing #CD_uniqueWords do for you?
>
> Thank you! This tweak seemed to make a big difference. My search for "dogs"
> no longer retrieves a bunch of records for puppets in my first page of
> results.
>
Great!
> For the logs, in case anyone else experiences this problem, I adjusted CD
> modifiers in the opensrf.xml file so that it now just says:
>
> <default_CD_modifiers>#CD_meanHarmonic</default_CD_modifiers>
>
>
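For reference, here is a simplified Python model of why the unique-words modifier favors brief records. It assumes the #CD_* modifiers correspond to PostgreSQL's ts_rank_cd normalization bits (2 = divide by document length, 4 = divide by mean harmonic distance between extents, 8 = divide by number of unique words). That mapping and the Record fields are illustrative assumptions, not Evergreen's code. Given the same raw cover-density score, dividing by unique words gives a 10-word brief record a much higher score than a 120-word full record. Dropping that bit removes the gap.

```python
from dataclasses import dataclass

# PostgreSQL ts_rank_cd normalization bits. Treating Evergreen's #CD_*
# modifiers as these bits is an assumption made for this illustration.
CD_DOCUMENT_LENGTH = 2  # divide rank by document length
CD_MEAN_HARMONIC = 4    # divide rank by mean harmonic distance between extents
CD_UNIQUE_WORDS = 8     # divide rank by number of unique words


@dataclass
class Record:
    raw_rank: float       # cover-density score before normalization (hypothetical input)
    length: int           # number of words in the indexed text
    unique_words: int     # number of distinct words
    mean_harmonic: float  # mean harmonic distance between matching extents


def normalized_rank(rec: Record, flags: int) -> float:
    """Apply a simplified version of the selected normalization bits."""
    rank = rec.raw_rank
    if flags & CD_DOCUMENT_LENGTH and rec.length > 0:
        rank /= rec.length
    if flags & CD_MEAN_HARMONIC and rec.mean_harmonic > 0:
        rank /= rec.mean_harmonic
    if flags & CD_UNIQUE_WORDS and rec.unique_words > 0:
        rank /= rec.unique_words
    return rank


if __name__ == "__main__":
    brief = Record(raw_rank=0.5, length=12, unique_words=10, mean_harmonic=1.0)
    full = Record(raw_rank=0.5, length=200, unique_words=120, mean_harmonic=1.0)
    for label, flags in [("meanHarmonic + uniqueWords", CD_MEAN_HARMONIC | CD_UNIQUE_WORDS),
                         ("meanHarmonic only", CD_MEAN_HARMONIC)]:
        print(label, normalized_rank(brief, flags), normalized_rank(full, flags))
```

With both bits set, the brief record wins outright. With only the mean-harmonic bit, the two records tie, so other evidence has to decide between them.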
> >>So, the problem (which may be obvious) is that shorter records (as
> >>brief records would be) provide a better match in terms of cover
> >>density, all else being equal. There's nothing in there today to mark
> >>a record as brief per se (that is, "fast add" or the like) but that is
> >>one use for the bib source field. There are probably many solutions to
> >>this problem, but I think an effective, though relatively large,
> >>hammer would be to sort any records that have a source above any
> >>records that don't. Then, adjust ranking based on the quality field
> >>from the source -- perhaps with an over-under threshold value that
> >>provides a stark dividing line between "good" and "bad" sources.
> >>
> >>This would all be development, of course, though not particularly
> >>intensive.
>
>
> It would be interesting to look at other approaches since I'm guessing the
> tweak I made to push brief records lower in the list might negatively impact
> relevancy among other records. However, this suggested change fixes just one
> specific issue, and I would love to consider changes that improve the
> overall relevance of search results. For example, if I remember correctly,
> we previously could tweak settings in the search.relevance_ranking so that
> words appearing in a title or subject could be given more weight than those
> appearing elsewhere in a record. Reintroducing that functionality (without
> negatively impacting search performance) would be a big step forward in
> returning more relevant results.
>
You do remember correctly, and that is all still there, just not in use by
default. I would personally love to work on reducing the cost of those
features.
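As a rough illustration of the title/subject weighting Kathy describes, here is a generic weighted sum of per-field ranks in Python. The field names and weights are hypothetical. This is not how Evergreen stores or applies its relevance settings. It only shows the intended effect: a weaker match in the title can outrank a stronger match that appears only elsewhere in the record.

```python
from __future__ import annotations

# Hypothetical weights: title and subject matches count more than
# matches elsewhere in the record. Not Evergreen's actual settings.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "keyword": 1.0}


def weighted_score(field_ranks: dict[str, float],
                   weights: dict[str, float] = FIELD_WEIGHTS) -> float:
    """Sum per-field ranks, scaling each by its field weight (default 1.0)."""
    return sum(weights.get(field, 1.0) * rank for field, rank in field_ranks.items())


if __name__ == "__main__":
    # "dogs" in the title vs. a stronger match only in the general keyword text.
    title_hit = weighted_score({"title": 0.4})
    keyword_hit = weighted_score({"keyword": 0.6})
    print(title_hit > keyword_hit)  # True
```

The cost concern Mike raises comes from computing a separate rank for each field on every search. The summing step itself is trivial.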
> Early in the MassLNC project, we had discussed the idea of incorporating
> popularity metrics into the relevance ranking (e.g. records get a higher
> boost if they have more copies, high circ counts, etc.). I was reminded of
> this idea recently when reading this blog post -
> http://www.evilreads.com/blog/why-barnes-and-noble-is-doomed-in-one-screenshot.html
> - about the shortcomings of Barnes and Noble's search algorithm. If
> Evergreen were able to boost a record in the search results based on
> popularity, it would not only improve our specific issue with the brief
> records, but would also help out the patron who is searching "Lincoln" to
> find that book she just heard about on TV.
>
> Obviously, this would take development, but is it realistically feasible to
> incorporate popularity metrics in the ranking without negatively impacting
> search speeds?
>
If it's treated as a nightly job that creates relative scoring adjustment
per record, say, then it is entirely doable.
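Here is a minimal sketch of that nightly-job idea in Python. It assumes per-record copy and circulation counts are available. Popularity is computed offline into a per-record multiplier, so a search only pays for a lookup. The Usage fields, the log scaling, and the 1.0 to 2.0 multiplier range are all hypothetical choices for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Usage:
    record_id: int
    copies: int  # number of copies attached to the record
    circs: int   # circulation count over some window (assumed)


def popularity_adjustments(usage: list[Usage]) -> dict[int, float]:
    """Nightly job: map each record to a multiplier in [1.0, 2.0].

    Counts are log-scaled so a few blockbusters don't swamp everything,
    then normalized against the most popular record in the set.
    """
    raw = {u.record_id: math.log1p(u.copies) + math.log1p(u.circs) for u in usage}
    top = max(raw.values(), default=0.0)
    if top == 0:
        return {rid: 1.0 for rid in raw}
    return {rid: 1.0 + score / top for rid, score in raw.items()}


def adjusted_rank(base_rank: float, record_id: int, adjustments: dict[int, float]) -> float:
    """Search time: a cheap dictionary lookup; unknown records are left unchanged."""
    return base_rank * adjustments.get(record_id, 1.0)


if __name__ == "__main__":
    adj = popularity_adjustments([Usage(1, 10, 500), Usage(2, 1, 0), Usage(3, 0, 0)])
    print(adjusted_rank(0.5, 1, adj), adjusted_rank(0.5, 3, adj))
```

Keeping the boost multiplicative and capped means relevance still dominates. Popularity only reorders records whose base ranks are already close.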
> Thanks for your thoughts on this!
>
Thanks for bringing it back up!
--mike (at conference, from my phone)
> Kathy
>
>
> >>-----Original Message-----
> >>From: open-ils-general-bounces at list.georgialibraries.org
> >>[mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of
> >>Mike Rylander
> >>Sent: Monday, January 30, 2012 3:20 PM
> >>To: Evergreen Discussion Group
> >>Subject: Re: [OPEN-ILS-GENERAL] Relevance ranking
> >>
> >>On Wed, Jan 25, 2012 at 11:47 AM, Kathy Lussier <klussier at masslnc.org>
> >>wrote:
> >>> Hi all,
> >>>
> >>> Can anyone provide some insight as to how we might be able to make some
> >>> tweaks to the relevance ranking using the new rank_cd() method? We have
> >>> noticed an issue in one of our systems with 2.2 alpha1 where brief bib
> >>> records tend to appear first in a list of search results. After reading
> >>> through the documentation in the opensrf.xml file (BTW - I love the level of
> >>> documentation that is provided here!), I removed #CD_documentLength from the
> >>> list of default CD modifiers so that it now reads:
> >>>
> >>> <default_CD_modifiers>#CD_meanHarmonic #CD_uniqueWords</default_CD_modifiers>
> >>>
> >>
> >>What does removing #CD_uniqueWords do for you?
> >>
> >>> However, those brief bib records still keep floating to the top of search
> >>> results. I understand why you might want to reduce the relevancy for longer
> >>> documents when you are doing full-text searches, but I'm not sure the same
> >>> reasoning should be applied when searching metadata in MARC records.
> >>>
> >>> Has anyone else encountered similar problems with brief bib records? Has
> >>> anyone made successful tweaks to the relevance ranking that they would be
> >>> willing to share with the list?
> >>
> >>So, the problem (which may be obvious) is that shorter records (as
> >>brief records would be) provide a better match in terms of cover
> >>density, all else being equal. There's nothing in there today to mark
> >>a record as brief per se (that is, "fast add" or the like) but that is
> >>one use for the bib source field. There are probably many solutions to
> >>this problem, but I think an effective, though relatively large,
> >>hammer would be to sort any records that have a source above any
> >>records that don't. Then, adjust ranking based on the quality field
> >>from the source -- perhaps with an over-under threshold value that
> >>provides a stark dividing line between "good" and "bad" sources.
> >>
> >>This would all be development, of course, though not particularly
> >>intensive.
> >>
> >>Aside: I just experimented with a subtle approach of adjusting the
> >>rank by the 1-(1/logN) of the quality of the bib source, if set ... it
> >>looks promising as a fine tuning tool, but not useful in this context.
> >>
> >>--
> >>Mike Rylander
> >> | Director of Research and Development
> >> | Equinox Software, Inc. / Your Library's Guide to Open Source
> >> | phone: 1-877-OPEN-ILS (673-6457)
> >> | email: miker at esilibrary.com
> >> | web: http://www.esilibrary.com
>
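One possible reading of the "relatively large hammer" from the quoted January 30 message can be sketched in Python. Records that carry a bib source sort ahead of those without one. Among sourced records, a quality threshold splits "good" from "bad" sources. The normal relevance rank then breaks ties within each tier. The Hit fields and the threshold value are hypothetical, and this is not Evergreen code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

QUALITY_THRESHOLD = 50  # hypothetical over-under line between "good" and "bad" sources


@dataclass
class Hit:
    record_id: int
    rank: float                    # normal relevance score
    source_quality: Optional[int]  # None when the record has no bib source


def sort_key(hit: Hit, threshold: int = QUALITY_THRESHOLD) -> tuple[int, int, float]:
    has_source = hit.source_quality is not None
    good_source = has_source and hit.source_quality >= threshold
    # Tuples compare element by element, so with reverse=True: sourced
    # records first, then good sources first, then higher rank first.
    return (int(has_source), int(good_source), hit.rank)


def order_hits(hits: list[Hit], threshold: int = QUALITY_THRESHOLD) -> list[Hit]:
    return sorted(hits, key=lambda h: sort_key(h, threshold), reverse=True)


if __name__ == "__main__":
    hits = [Hit(1, 0.9, None), Hit(2, 0.2, 80), Hit(3, 0.7, 10), Hit(4, 0.5, 90)]
    print([h.record_id for h in order_hits(hits)])  # [4, 2, 3, 1]
```

Because the tiers are compared before the rank, even a very strong relevance match cannot climb out of a lower tier. That is what makes this a blunt instrument rather than a fine-tuning tool.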