[OPEN-ILS-GENERAL] Problems with search operators on advanced search screen

Kathy Lussier klussier at masslnc.org
Mon Jul 30 10:39:10 EDT 2012


Hi all,

Since going live with tpac in late May, two of the MassLNC consortia 
have been discovering problems with some of the operators on the 
advanced search page. In both cases, they are working differently than 
they did in jspac, and we believe they lead to unexpected search 
results. I wanted to share our experience with the rest of the community 
to see if there is any interest in changing the way these operators work.

The first problem is with the "does not contain" option. If I do an 
advanced search for contains "martin luther" and does not contain 
"king," the search is conducted as (martin luther && keyword:-"king"). 
In jspac, a similar search would not have surrounded king in quotation 
marks. For Evergreen systems using the default indexes, this change 
doesn't seem to cause much of a problem. However, there are many systems 
that have added indexes to the keyword search so that an index like 
proper title can be weighted more heavily in the relevance ranking than 
other indexes. In those systems, adding the quotation marks makes the 
search much less effective at excluding the term from the results. As an 
example, see the following "does not contain" search, where we try to 
exclude the term "king":

http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=nocontains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1

The same search with the quotation marks removed shows a big improvement:

http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type=&query=%28martin+luther+%26%26+keyword%3A-king%29&qtype=keyword&locg=1&_adv=1&page=0&sort=
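
For anyone who wants to see what these two URLs actually send, here is a 
rough Python sketch (standard library only, not Evergreen code) that 
decodes them. The helper name form_rows is made up for illustration; the 
parameter names are simply the ones that appear in the URLs.

```python
from urllib.parse import urlsplit, parse_qs

# The advanced search form URL (quotation marks get added by tpac later).
FORM_URL = (
    "http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword"
    "&contains=contains&query=martin+luther&bool=and&qtype=keyword"
    "&contains=nocontains&query=king&bool=and&qtype=keyword"
    "&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1"
)
# The hand-edited URL with the quotation marks removed.
EDITED_URL = (
    "http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type="
    "&query=%28martin+luther+%26%26+keyword%3A-king%29"
    "&qtype=keyword&locg=1&_adv=1&page=0&sort="
)


def form_rows(url):
    """Hypothetical helper: pair each form row's operator with its search text."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    rows = zip(params["qtype"], params["contains"], params["query"])
    # Skip rows the user left empty.
    return [row for row in rows if row[2]]


rows = form_rows(FORM_URL)
edited_query = parse_qs(urlsplit(EDITED_URL).query)["query"][0]

print(rows)
print(edited_query)
```

The first print shows the two filled-in form rows (a "contains" row for 
martin luther and a "nocontains" row for king); the second shows the 
decoded query string (martin luther && keyword:-king).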

I filed a bug on this issue several weeks ago at 
https://bugs.launchpad.net/evergreen/+bug/1019360, and Mike Rylander 
suggested that I send out a message to the general list to see if there 
is any objection to removing the quotation marks from the query when the 
"does not contain" option is used.

We have also encountered problems with the "Matches Exactly" option. In 
jspac, this option surrounded the search terms in quotation marks, 
essentially making it a phrase search. In tpac, there is now a "contains 
phrase" search that does the same thing. The "matches exactly" option 
now uses left- and right-anchored searching so that a "matches exactly" 
search for "great expectations" will conduct the search as ^great 
expectations$. In our testing, we have found that this search string 
yields the same number of search results as a simple "contains" search, 
so the "Matches Exactly" option does not appear to narrow this search at 
all.

After asking some questions in IRC, Dan Scott suggested that surrounding 
the search query in quotation marks may be more successful - "^great 
expectations$" does indeed lead to expected results. However, this 
option for exact matches is very strict. To find the record for The 
Assistant by Robert Walser - 
http://bark.cwmars.org/eg/opac/record/2451793 - we needed to include the 
forward slash in the title search, so that the final search statement 
was "^the assistant /$" .
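
The query strings discussed above can be summarized in a small Python 
sketch. This is only an illustration of the strings described in this 
message, not the actual tpac code: the function name build_clause and the 
operator labels "phrase", "exact" and "exact_quoted" are assumptions made 
up for the example.

```python
def build_clause(operator, terms):
    """Hypothetical sketch: the query text each search option ends up sending."""
    if operator == "contains":
        # Plain terms, no special syntax.
        return terms
    if operator == "phrase":
        # "Contains phrase": surround the terms in quotation marks.
        return f'"{terms}"'
    if operator == "exact":
        # Current "Matches Exactly": left and right anchors, no quotation marks.
        return f"^{terms}$"
    if operator == "exact_quoted":
        # Dan Scott's suggestion: anchors plus surrounding quotation marks.
        return f'"^{terms}$"'
    raise ValueError(f"unknown operator: {operator}")


for op in ("contains", "phrase", "exact", "exact_quoted"):
    print(op, "->", build_clause(op, "great expectations"))
```

Written this way, the only difference between the current "Matches 
Exactly" string and the stricter variant is the pair of quotation marks 
around the anchored terms.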

In this case, I was inclined to recommend that the quotation marks be 
added to the search string, but there is another inherent problem with 
the "Matches Exactly" search. For a system using the default indexes, I 
don't see how a "Matches Exactly" search could ever successfully yield 
results from a keyword search. If I'm understanding it correctly (and my 
testing has verified this understanding), this search string must 
exactly match the entire string in an index. In the default setup, the 
keyword index includes every indexed term from the record, and a user 
would never enter all of those search terms in the correct order. 
Ironically, this search query does have some success in our own catalogs 
for precisely the same reason that the "does not contain" search failed. 
We have other indexes included as part of our keyword search, and there 
is a possibility that one of those indexes will contain the exact terms 
being searched.
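
A toy Python model may help show why this happens. It is only a sketch 
of my understanding, not how Evergreen actually stores or matches its 
indexes: the normalization is a simple stand-in, and the keyword value 
below is made up (only the title string comes from the example above).

```python
def normalize(text):
    """Lowercase and collapse whitespace; a stand-in for real index normalization."""
    return " ".join(text.lower().split())


def matches_exactly(search, index_value):
    """Toy model of a quoted, fully anchored search: the whole value must match."""
    return normalize(search) == normalize(index_value)


# Illustrative index values for one record. The keyword value is invented
# to represent "every indexed term from the record" in one long string.
record_indexes = {
    "title": "The assistant /",
    "keyword": "The assistant / Robert Walser (rest of the record's indexed text)",
}

results = {
    name: matches_exactly("the assistant /", value)
    for name, value in record_indexes.items()
}
print(results)

# Leaving off the forward slash fails even against the title value.
print(matches_exactly("the assistant", record_indexes["title"]))
```

In this model the search only succeeds against the short title value, 
and only when the forward slash is included; it can never match the long 
keyword value unless the user types all of it.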

Given this information, we are considering removing the "Matches 
Exactly" search locally, since it isn't working in its current iteration 
and would continue to produce unexpected behavior if it were changed to 
include the quotation marks. However, I also wanted to send this 
information along to the community since it will most likely lead to 
unexpected results elsewhere. I'll also be filing a Launchpad bug with 
this information shortly.

In our discussions, we were thinking that a "Starts With" search that 
left-anchors the search term (e.g. "^the assistant") might be more useful 
than the "Matches Exactly" search, since a user would not need to 
remember subtitles or include forward slashes. In some initial testing, 
it also seems to work better in a keyword search. I may try a local 
implementation of this and will share my results if I have any luck.
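
As a rough illustration of the idea (a Python sketch, not a proposed 
patch), a "Starts With" option could build a left-anchored string and 
would behave like a prefix match. Both function names are hypothetical, 
the quotation marks around the anchored terms are an assumption, and the 
matching function is a toy model rather than Evergreen's real matching.

```python
def starts_with_clause(terms):
    """Hypothetical: left-anchor the terms and quote them, with no right anchor."""
    return f'"^{terms}"'


def starts_with(search, index_value):
    """Toy model of a left-anchored match: the value only has to begin with the search."""
    def norm(text):
        # Lowercase and collapse whitespace; a stand-in for real normalization.
        return " ".join(text.lower().split())

    return norm(index_value).startswith(norm(search))


print(starts_with_clause("the assistant"))

# No forward slash or subtitle needed, unlike a fully anchored match.
print(starts_with("the assistant", "The assistant /"))

# A term from the middle of the value does not match.
print(starts_with("assistant", "The assistant /"))
```

Under these assumptions the user's "the assistant" matches a value that 
begins with those words, whatever punctuation or subtitle follows.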

Kathy

-- 
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
klussier at masslnc.org
Twitter: http://www.twitter.com/kmlussier


