[OPEN-ILS-GENERAL] Problems with search operators on advanced search screen
Kathy Lussier
klussier at masslnc.org
Mon Jul 30 10:39:10 EDT 2012
Hi all,
Since going live with tpac in late May, two of the MassLNC consortia
have been discovering problems with some of the operators on the
advanced search page. In both cases, they are working differently than
they did in jspac, and we believe they lead to unexpected search
results. I wanted to share our experience with the rest of the community
to see if there is any interest in changing the way these operators work.
The first problem is with the "does not contain" option. If I do an
advanced search for contains "martin luther" and does not contain
"king," the search is conducted as (martin luther && keyword:-"king").
In jspac, a similar search would not have surrounded king in quotation
marks. For Evergreen systems using the default indexes, this change
doesn't seem to cause much of a problem. However, there are many systems
that have added indexes to the keyword search so that an index like
proper title can be weighted more heavily in the relevance ranking than
other indexes. In those systems, the query with the added quotation marks
does a terrible job of excluding the search term. As an example, see the
following "does not contain" search, where we try to exclude the term
"king":
http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=nocontains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1
The same search with the quotation marks removed shows a big improvement:
http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type=&query=%28martin+luther+%26%26+keyword%3A-king%29&qtype=keyword&locg=1&_adv=1&page=0&sort=
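To make the difference between the two query forms concrete, here is a minimal sketch in Python. It is for illustration only: the function build_query and its parameters are hypothetical names of my own, not Evergreen code, and it does nothing more than reproduce the two query strings compared above.

```python
# Illustrative only: build_query and its arguments are hypothetical names,
# not part of Evergreen. The function just assembles the query strings.

def build_query(contains, excluded, qtype="keyword", quote_excluded=True):
    """Combine a 'contains' term with a 'does not contain' term."""
    # With quote_excluded=True the excluded term is wrapped in quotation marks.
    term = f'"{excluded}"' if quote_excluded else excluded
    return f"({contains} && {qtype}:-{term})"

# Form with the excluded term in quotation marks:
quoted_query = build_query("martin luther", "king")

# Form with the quotation marks removed:
unquoted_query = build_query("martin luther", "king", quote_excluded=False)

print(quoted_query)
print(unquoted_query)
```

The only difference between the two strings is the pair of quotation marks around the excluded term, which is the change under discussion.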
I filed a bug on this issue several weeks ago at
https://bugs.launchpad.net/evergreen/+bug/1019360, and Mike Rylander
suggested that I send out a message to the general list to see if there
is any objection to removing the quotation marks from the query when the
"does not contain" option is used.
We have also encountered problems with the "Matches Exactly" option. In
jspac, this option surrounded the search terms in quotation marks,
essentially making it a phrase search. In tpac, there is now a "contains
phrase" search that does the same thing. The "matches exactly" option
now uses left- and right-anchored searching so that a "matches exactly"
search for "great expectations" will conduct the search as ^great
expectations$. In our testing, we have found that this search string
yields the same number of search results as a simple "contains" search.
The "Matches Exactly" search isn't really doing anything special for
this search.
After asking some questions in IRC, Dan Scott suggested that surrounding
the search query in quotation marks may be more successful - "^great
expectations$" does indeed lead to expected results. However, this
option for exact matches is very strict. To find the record for The
Assistant by Robert Walser -
http://bark.cwmars.org/eg/opac/record/2451793 - we needed to include the
forward slash in the title search, so that the final search statement
was "^the assistant /$" .
In this case, I was inclined to recommend that the quotation marks be
added to the search string, but there is another inherent problem with
the "Matches Exactly" search. For a system using the default indexes, I
don't see how a "Matches Exactly" search could ever successfully yield
results from a keyword search. If I'm understanding it correctly (and my
testing has verified this understanding), this search string must
exactly match the entire string in an index. In the default setup, the
keyword index includes every indexed term from the record, and a user
would never enter all of those search terms in the correct order.
Ironically, this search query does have some success in our own catalogs
for precisely the same reason that the "does not contain" search failed.
We have other indexes included as part of our keyword search, and there
is a possibility that one of those indexes will contain the exact terms
being searched.
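The behavior described above can be modeled with a small Python sketch. This is a toy model under my own assumptions, not how Evergreen actually evaluates a search: it treats each index entry as a single string and treats "Matches Exactly" as requiring the query to equal an entire entry. The index contents are invented placeholders.

```python
# Toy model only: Evergreen's search engine is not implemented this way.
# All index contents below are invented placeholders.

def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def matches_exactly(query, index_entries):
    """Model ^query$: the query must equal one entire index entry."""
    wanted = normalize(query)
    return any(normalize(entry) == wanted for entry in index_entries)

# Default setup (assumed): one keyword entry holding many terms from the record.
default_keyword = ["the assistant / robert walser"]

# Customized setup (assumed): an extra title-like index added to keyword.
customized_keyword = default_keyword + ["the assistant /"]

print(matches_exactly("the assistant /", default_keyword))     # False
print(matches_exactly("the assistant /", customized_keyword))  # True
print(matches_exactly("the assistant", customized_keyword))    # False
```

In this model the search only succeeds when some index happens to hold exactly the searched string, trailing slash included, which mirrors what we saw in our catalogs.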
Given this information, we are considering removing the "Matches
Exactly" search locally, since it isn't working in its current iteration
and would still produce unexpected behavior if it were changed to
include the quotation marks. However, I also wanted to send this
information along to the community since it will most likely lead to
unexpected results elsewhere. I'll also be filing a Launchpad bug with
this information shortly.
In our discussions, we were thinking a "Starts With" search that
left-anchors the search term (e.g. "^the assistant") might be more useful
than the "Matches Exactly" search since a user would not need to
remember subtitles or include forward slashes. In some initial testing,
it also seems to work better in a keyword search. I may be trying a
local implementation of this and will share my results if I have any luck.
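As a rough illustration of the "Starts With" idea, here is a toy Python model. It is a sketch under my own assumptions, not Evergreen code: each index entry is treated as a single string, the left anchor is modeled as a word-by-word prefix match, and the index contents are invented placeholders.

```python
# Toy model only: not Evergreen's implementation.
# Index contents are invented placeholders.

def tokens(text):
    """Split lowercased text into whitespace-separated words."""
    return text.lower().split()

def starts_with(query, index_entries):
    """Model ^query: the entry's leading words must equal the query's words."""
    wanted = tokens(query)
    return any(tokens(entry)[:len(wanted)] == wanted for entry in index_entries)

title_index = ["the assistant /"]
keyword_index = ["the assistant / robert walser"]

print(starts_with("the assistant", title_index))    # True
print(starts_with("the assistant", keyword_index))  # True
print(starts_with("assistant", keyword_index))      # False
```

Under these assumptions, the user does not need to type the trailing slash or anything that follows the title, but the search term still has to begin at the start of the entry.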
Kathy
--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
klussier at masslnc.org
Twitter: http://www.twitter.com/kmlussier