[OPEN-ILS-DEV] Re: Napkin drawings for removing OpenILS::Application::Storage::FTS
Mike Rylander
mrylander at gmail.com
Thu Feb 11 14:32:51 EST 2010
What's the definition of insanity, again? ;)
Here's round 2 of a possible new query parser. In this version we
support configurable search classes, fields, modifiers and filters.
I'll try to construct a BNF for it, but by way of example the gist is:
Classed search:
title: harry potter
keyword: water
Specific field (index definition) search:
author|personal: rowling
subject|name: potter
Search modifiers:
#metabib
#available
#descending
Argumented filters:
statuses(0,7,12)
estimation_strategy(exclusion)
item_form(d)
during(2009,2010)
Phrases (non-stemming matches):
"multi-word phrase (exact) matches"
+one +word +exact +matches
Grouping and Boolean operators (chocolate and peanut butter):
(foo && bar) || baz
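To make the surface syntax above concrete, here is a minimal tokenizer sketch. It is written in Python purely for illustration and is not the attached Perl parser; the token names, the regexes, and the `tokenize` function are all assumptions inferred from the examples in this message.

```python
import re

# Token patterns, tried in order at each position. These are guesses based on
# the examples in this message, NOT the grammar used by fts-replacement.pl.
TOKEN_SPEC = [
    ("filter",   re.compile(r"([a-z_]+)\(([^()]*)\)")),      # statuses(0,7,12)
    ("class",    re.compile(r"([a-z_]+)(?:\|([a-z_]+))?:")),  # title:  author|personal:
    ("modifier", re.compile(r"#([a-z_]+)")),                  # #available
    ("phrase",   re.compile(r'"([^"]*)"')),                   # "exact phrase"
    ("exact",    re.compile(r"\+(\w+)")),                     # +word
    ("and",      re.compile(r"&&")),
    ("or",       re.compile(r"\|\|")),
    ("open",     re.compile(r"\(")),
    ("close",    re.compile(r"\)")),
    ("word",     re.compile(r'([^\s()|&"]+)')),               # anything else
]


def tokenize(query):
    """Split a query string into (kind, parts) tuples.

    parts holds the captured pieces of the token (empty for pure operators).
    """
    tokens = []
    pos = 0
    while pos < len(query):
        if query[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_SPEC:
            m = pattern.match(query, pos)
            if m:
                parts = tuple(g for g in m.groups() if g is not None)
                tokens.append((kind, parts))
                pos = m.end()
                break
        else:
            # e.g. a lone '&' or '|' that is not part of an operator
            raise ValueError(f"unexpected character {query[pos]!r} at {pos}")
    return tokens


if __name__ == "__main__":
    for tok in tokenize('title: harry potter #available item_form(d)'):
        print(tok)
```

Ordering matters in this sketch: the filter and class patterns are tried before the bare-word pattern, so `item_form(d)` and `title:` are not swallowed as plain words.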
Everything but the grouping, boolean and phrase operators is
configurable, so adding classes and fields, modifiers and filters is
up to (available to?) the developer. Some examples:
__PACKAGE__->add_search_class( 'keyword' );
__PACKAGE__->add_search_class( 'title' );
__PACKAGE__->add_search_class( 'author' );
__PACKAGE__->add_search_class( 'subject' );
__PACKAGE__->add_search_class( 'series' );
__PACKAGE__->add_search_field( author => 'corporate' );
__PACKAGE__->add_search_filter( 'audience' );
__PACKAGE__->add_search_filter( 'vr_format' );
__PACKAGE__->add_search_filter( 'format' );
__PACKAGE__->add_search_filter( 'item_type' );
__PACKAGE__->add_search_filter( 'item_form' );
__PACKAGE__->add_search_filter( 'lit_form' );
__PACKAGE__->add_search_filter( 'location' );
__PACKAGE__->add_search_modifier( 'available' );
__PACKAGE__->add_search_modifier( 'descending' );
__PACKAGE__->add_search_modifier( 'ascending' );
__PACKAGE__->add_search_modifier( 'metarecord' );
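For anyone who wants to play with the idea outside Perl, the registration calls above could be mirrored by a small registry like the following. This is a Python sketch under my own assumptions: the `ParserConfig` class, its attributes, and the rule that a field can only be added to an already-registered class are all hypothetical and are not taken from the attached parser.

```python
class ParserConfig:
    """Hypothetical registry mirroring the add_search_* calls above."""

    def __init__(self):
        self.classes = {}       # class name -> set of field names
        self.filters = set()
        self.modifiers = set()

    def add_search_class(self, name):
        # Registering the same class twice is harmless.
        self.classes.setdefault(name, set())

    def add_search_field(self, cls, field):
        # Assumption: fields may only be attached to a known class.
        if cls not in self.classes:
            raise ValueError(f"unknown search class: {cls}")
        self.classes[cls].add(field)

    def add_search_filter(self, name):
        self.filters.add(name)

    def add_search_modifier(self, name):
        self.modifiers.add(name)

    def knows_field(self, cls, field):
        return field in self.classes.get(cls, set())


if __name__ == "__main__":
    config = ParserConfig()
    for cls in ("keyword", "title", "author", "subject", "series"):
        config.add_search_class(cls)
    config.add_search_field("author", "corporate")
    config.add_search_filter("item_form")
    config.add_search_modifier("available")
    print(sorted(config.classes), config.knows_field("author", "corporate"))
```

A parser built on top of this would consult the registry when it meets `name:`, `name(...)` or `#name`, and could treat anything unregistered as ordinary search text.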
This also supports both class-wide and class+field aliasing via
regexp, for use in mapping CQL relations to Evergreen search classes
and fields:
__PACKAGE__->add_search_class_alias( author => 'name' );
__PACKAGE__->add_search_class_alias( author => 'dc.contributor' );
__PACKAGE__->add_search_class_alias( subject => 'bib.subject(?:Title|Place|Occupation)' );
__PACKAGE__->add_search_field_alias( subject => name => 'bib.subjectName' );
# keyword|standard_number isn't a stock Evergreen index def, jfyi
__PACKAGE__->add_search_field_alias( keyword => standard_number => 'dc.identifier' );
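As a rough illustration of how regexp aliases like these might be resolved, here is a Python sketch. The `AliasTable` class and its `resolve` method are hypothetical. It also assumes that aliases are tried in registration order, that the pattern must match the whole name, and that matching is case-sensitive; the attached parser may do any of these differently.

```python
import re


class AliasTable:
    """Hypothetical lookup from an external name (e.g. a CQL index) to a
    (search class, field) pair, using the alias regexps as registered."""

    def __init__(self):
        self._aliases = []  # list of (compiled regex, class, field or None)

    def add_search_class_alias(self, cls, pattern):
        self._aliases.append((re.compile(pattern), cls, None))

    def add_search_field_alias(self, cls, field, pattern):
        self._aliases.append((re.compile(pattern), cls, field))

    def resolve(self, name):
        # Assumption: first registered alias that matches the full name wins.
        for regex, cls, field in self._aliases:
            if regex.fullmatch(name):
                return (cls, field)
        return None


if __name__ == "__main__":
    aliases = AliasTable()
    aliases.add_search_class_alias("author", "name")
    aliases.add_search_class_alias("author", "dc.contributor")
    aliases.add_search_class_alias("subject", "bib.subject(?:Title|Place|Occupation)")
    aliases.add_search_field_alias("subject", "name", "bib.subjectName")
    aliases.add_search_field_alias("keyword", "standard_number", "dc.identifier")
    for name in ("dc.contributor", "bib.subjectPlace", "bib.subjectName", "bogus"):
        print(name, "->", aliases.resolve(name))
```

Note that the `.` in patterns such as `dc.contributor` is a regexp wildcard, so it matches any single character, not only a literal dot.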
Anyway, as before, feedback appreciated.
--miker
On Wed, Feb 3, 2010 at 4:55 PM, Mike Rylander <mrylander at gmail.com> wrote:
> On Thu, Jan 21, 2010 at 3:54 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> Search syntax thought experiment
>> --------------------------------------------------
>>
>> Multi-stage search-to-query compilation
>>
>> Given a default search class of "keyword" and a search string of:
>> ( foo "bar" -baz || gar || ti|proper:qux) && au:junk
>>
>>
>> Stage 1: Boolean decomposition
>>
>> { bool : 'and',
>> query : [
>> { author : 'junk' },
>> { bool : 'or',
>> query : [
>> { keyword : 'foo "bar" -baz || gar' },
>> { 'title|proper' : 'qux' }
>> ]
>> }
>> ]
>> }
>>
>> Lacking grouping parens, we bind pairs of || or && separated atoms
>> (assuming &&), working from left to right. If adjacent atoms belong
>> to the same class[|field] specifier, they are folded into a single
>> leaf. Atoms are whitespace separated components not spelled '(', ')',
>> '||' or '&&'.
>>
>>
>>
>> Stage 2: Search decomposition
>>
>> { bool : 'and',
>> query : [
>> { ftsquery : ['junk'],
>> phrases : [],
>> classname : 'author',
>> fields : [7, 8, 9, 10]
>> },
>> { bool : 'or',
>> query : [
>> { ftsquery : [[['foo', '&', 'bar'], '&!', 'baz'], '|', 'gar' ],
>> phrases : ['bar'],
>> classname : 'keyword',
>> fields : [15]
>> },
>> { ftsquery : ['qux'],
>> phrases : [],
>> classname : 'title',
>> fields : [6]
>> }
>> ]
>> }
>> ]
>> }
>>
>> Then, we walk that tree (probably a plperlu stored proc) looking for
>> leaf hashes (those that don't contain a key of "query") and build up
>> SELECT (ranking), FROM (sourcing) and WHERE (tsquery and phrase
>> matching) clauses as we see them. We apply the union of the index
>> normalizers defined for the referenced fields to the non-joiner atoms
>> in the "ftsquery" structure.
>>
>> Eh? I can see my way from here to there, but what am I totally overlooking?
>>
>
> Since I didn't hear any screaming, I took a little time today to put
> together a rough parser (attached). I have not thrown horrible,
> terrible, mean, angry data at it, but for good input and some bits of
> unexpected input (multiple boolean operators in a row, unbalanced
> parens) it works as I expect it to.
>
> The output format is not exactly the same as described above, but
> should be recognizable.
>
> One of the implications of moving to this is that the "compiled query"
> returned from the main search call will necessarily change. Also note
> that reconstructing the exact query that the user supplied will be
> very difficult, but we can cache that data easily enough for later
> use.
>
> The command line I've been testing with is:
>
> ./fts-replacement.pl 'title: (foo bar) || (-baz || (subject:"1900-1910
> junk" se:stuff)) && && && au:malarky || au:gonzo && +goo' 1
>
> First param is the query, second is a debug flag. || == OR, and && ==
> AND, obviously.
>
> Any feedback would be appreciated.
>
> --
> Mike Rylander
> | VP, Research and Design
> | Equinox Software, Inc. / The Evergreen Experts
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: miker at esilibrary.com
> | web: http://www.esilibrary.com
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fts-replacement.pl
Type: application/x-perl
Size: 14717 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20100211/c3b4347f/attachment-0001.bin