[OPEN-ILS-GENERAL] Relevance ranking
Mike Rylander
mrylander at gmail.com
Mon Jan 30 15:20:28 EST 2012
On Wed, Jan 25, 2012 at 11:47 AM, Kathy Lussier <klussier at masslnc.org> wrote:
> Hi all,
>
> Can anyone provide some insight as to how we might be able to make some
> tweaks to the relevance ranking using the new rank_cd() method? We have
> noticed an issue in one of our systems with 2.2 alpha1 where brief bib
> records tend to appear first in a list of search results. After reading
> through the documentation in the opensrf.xml file (BTW - I love the level of
> documentation that is provided here!), I removed #CD_documentLength from the
> list of default CD modifiers so that it now reads:
>
> <default_CD_modifiers>#CD_meanHarmonic
> #CD_uniqueWords</default_CD_modifiers>
>
What does removing #CD_uniqueWords do for you?
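A rough way to see why dropping only #CD_documentLength may not be enough: PostgreSQL's ts_rank_cd() takes a normalization bitmask in which, among other options, 2 divides the rank by the document length, 4 by the mean harmonic distance between extents, and 8 by the number of unique words. Assuming (not confirmed in this thread) that the #CD_* modifiers map onto those flags, the toy Python model below gives a brief and a full record the same raw cover-density score, using invented statistics, and shows that the unique-words divisor on its own still puts the brief record first.

```python
# Toy model of three PostgreSQL ts_rank_cd() normalization flags.
# The record statistics used below are invented for illustration only.

NORM_LENGTH = 2         # divide by document length
NORM_MEAN_HARMONIC = 4  # divide by mean harmonic distance between extents
NORM_UNIQUE_WORDS = 8   # divide by number of unique words


def normalize(raw_rank, doc_length, unique_words, mean_harmonic, flags):
    """Apply the selected normalizations to a raw cover-density score."""
    rank = raw_rank
    if flags & NORM_LENGTH and doc_length > 0:
        rank /= doc_length
    if flags & NORM_MEAN_HARMONIC and mean_harmonic > 0:
        rank /= mean_harmonic
    if flags & NORM_UNIQUE_WORDS and unique_words > 0:
        rank /= unique_words
    return rank


# Hypothetical records with identical raw scores and extent spacing.
brief = dict(raw_rank=0.5, doc_length=6, unique_words=5, mean_harmonic=1.0)
full = dict(raw_rank=0.5, doc_length=80, unique_words=60, mean_harmonic=1.0)

# documentLength removed, meanHarmonic and uniqueWords kept:
flags = NORM_MEAN_HARMONIC | NORM_UNIQUE_WORDS
print(normalize(**brief, flags=flags) > normalize(**full, flags=flags))
```

The same raw score is a simplification: in practice a short record also tends to have a denser cover, which pushes it up even before any normalization is applied.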
> However, those brief bib records still keep floating to the top of search
> results. I understand why you might want to reduce the relevancy for longer
> documents when you are doing full-text searches, but I'm not sure the same
> reasoning should be applied when searching metadata in MARC records.
>
> Has anyone else encountered similar problems with brief bib records? Has
> anyone made successful tweaks to the relevance ranking that they would be
> willing to share with the list?
So, the problem (which may be obvious) is that shorter records (as
brief records would be) provide a better match in terms of cover
density, all else being equal. There's nothing in there today to mark
a record as brief per se (that is, "fast add" or the like), but that is
one use for the bib source field. There are probably many solutions to
this problem, but I think an effective, though relatively large,
hammer would be to sort any records that have a source above any
records that don't. Then, adjust ranking based on the quality field
from the source -- perhaps with an over-under threshold value that
provides a stark dividing line between "good" and "bad" sources.
This would all be development, of course, though not particularly intensive.
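One literal reading of the "large hammer" above, sketched in Python: records with a bib source sort ahead of records without one, sources at or above a quality threshold sort ahead of those below it, and the rank_cd() score only orders records within each tier. The `Hit` type, its fields and the `QUALITY_THRESHOLD` value of 50 are hypothetical stand-ins, not Evergreen names, and finer weighting by quality inside a tier is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff between "good" and "bad" bib sources.
QUALITY_THRESHOLD = 50


@dataclass
class Hit:
    record_id: int
    rank: float                    # relevance score, e.g. from rank_cd()
    source_quality: Optional[int]  # None when the record has no bib source


def tier(hit: Hit) -> int:
    """0 = good source, 1 = source below threshold, 2 = no source."""
    if hit.source_quality is None:
        return 2
    return 0 if hit.source_quality >= QUALITY_THRESHOLD else 1


def order_hits(hits: List[Hit]) -> List[Hit]:
    # Lower tier first; within a tier, higher relevance rank first.
    return sorted(hits, key=lambda h: (tier(h), -h.rank))


hits = [Hit(1, 0.9, None), Hit(2, 0.2, 80), Hit(3, 0.5, 10), Hit(4, 0.7, 90)]
print([h.record_id for h in order_hits(hits)])
```

Because the tier dominates the sort key, even a very high rank_cd() score cannot lift a record across the dividing line, which is the "stark" behavior the threshold is meant to give.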
Aside: I just experimented with a subtle approach of adjusting the
rank by 1-(1/log N), where N is the quality of the bib source (if one is set) ... it
looks promising as a fine-tuning tool, but not useful in this context.
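Read literally, the aside multiplies the rank by 1 - 1/log N, where N is the source's quality value. The Python sketch below assumes a base-10 logarithm and, as its own assumption, leaves the rank unchanged when no quality is set or when the factor would not be positive (at N = 1 the formula divides by zero, and for 1 < N <= 10 the factor is zero or negative). The function name is hypothetical.

```python
import math
from typing import Optional


def quality_adjusted_rank(rank: float, quality: Optional[float]) -> float:
    """Scale rank by 1 - 1/log10(quality), under the assumptions noted."""
    # No source set, or a quality where log10 is zero, negative or
    # undefined: leave the rank alone.
    if quality is None or quality <= 1:
        return rank
    factor = 1.0 - 1.0 / math.log10(quality)
    # For 1 < quality <= 10 the factor is <= 0, which would zero or flip
    # the rank; this sketch simply ignores such values.
    if factor <= 0:
        return rank
    return rank * factor


for q in (None, 5, 100, 1000):
    print(q, quality_adjusted_rank(1.0, q))
```

The factor climbs toward 1 as the quality grows, so among records that have a source, a higher-quality source loses less of its rank; that gentle slope is consistent with treating it as a fine-tuning tool rather than a fix for brief records.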
--
Mike Rylander
| Director of Research and Development
| Equinox Software, Inc. / Your Library's Guide to Open Source
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker at esilibrary.com
| web: http://www.esilibrary.com