[OPEN-ILS-DEV] ***SPAM*** Re: ***SPAM*** Re: Napkin drawings for removing OpenILS::Application::Storage::FTS

Mike Rylander mrylander at gmail.com
Thu Feb 11 14:32:51 EST 2010


What's the definition of insanity, again?  ;)

Here's round 2 of a possible new query parser.  In this version we
support configurable search classes, fields, modifiers and filters.
I'll try to construct a BNF for it, but by way of example the gist is:

Classed search:
   title: harry potter
   keyword: water

Specific field (index definition) search:
   author|personal: rowling
   subject|name: potter

Search modifiers
   #metabib
   #available
   #descending

Argumented filters:
   statuses(0,7,12)
   estimation_strategy(exclusion)
   item_form(d)
   during(2009,2010)

Phrases (non-stemming matches):
   "multi-word phrase (exact) matches"
   +one +word +exact +matches

Grouping and Boolean operators (chocolate and peanut butter):
   (foo && bar) || baz

Everything but the grouping, boolean and phrase operators is
configurable, so adding classes and fields, modifiers and filters is
up to (available to?) the developer.  Some examples:

    __PACKAGE__->add_search_class( 'keyword' );
    __PACKAGE__->add_search_class( 'title' );
    __PACKAGE__->add_search_class( 'author' );
    __PACKAGE__->add_search_class( 'subject' );
    __PACKAGE__->add_search_class( 'series' );

    __PACKAGE__->add_search_field( author => 'corporate' );

    __PACKAGE__->add_search_filter( 'audience' );
    __PACKAGE__->add_search_filter( 'vr_format' );
    __PACKAGE__->add_search_filter( 'format' );
    __PACKAGE__->add_search_filter( 'item_type' );
    __PACKAGE__->add_search_filter( 'item_form' );
    __PACKAGE__->add_search_filter( 'lit_form' );
    __PACKAGE__->add_search_filter( 'location' );

    __PACKAGE__->add_search_modifier( 'available' );
    __PACKAGE__->add_search_modifier( 'descending' );
    __PACKAGE__->add_search_modifier( 'ascending' );
    __PACKAGE__->add_search_modifier( 'metarecord' );

This also supports both class-wide and class+field aliasing via
regexp, for use in mapping CQL relations to evergreen search classes
and fields:

    __PACKAGE__->add_search_class_alias( author => 'name' );
    __PACKAGE__->add_search_class_alias( author => 'dc.contributor' );

    __PACKAGE__->add_search_class_alias( subject =>
'bib.subject(?:Title|Place|Occupation)' );
    __PACKAGE__->add_search_field_alias( subject => name => 'bib.subjectName' );
    __PACKAGE__->add_search_field_alias( keyword => standard_number =>
'dc.identifier' );  # keyword|standard_number isn't a stock evergreen
index def, jfyi

Anyway, as before, feedback appreciated.

--miker

On Wed, Feb 3, 2010 at 4:55 PM, Mike Rylander <mrylander at gmail.com> wrote:
> On Thu, Jan 21, 2010 at 3:54 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> Search syntax thought experiment
>> --------------------------------------------------
>>
>> Multi-stage search-to-query compilation
>>
>> Given a default search class of "keyword" and a search string of:
>>  ( foo "bar" -baz || gar || ti|proper:qux) && au:junk
>>
>>
>> Stage 1: Boolean decomposition
>>
>> { bool : 'and',
>>  query : [
>>    { author : 'junk' },
>>    { bool : 'or',
>>      query : [
>>        { keyword : 'foo "bar" -baz || gar' },
>>        { 'title|proper' : 'qux' }
>>      ]
>>    }
>>  ]
>> }
>>
>> Lacking grouping parens, we bind pairs of || or && separated atoms
>> (assuming &&), working from left to right.  If adjacent atoms belong
>> to the same class[|field] specifier, they are folded into a single
>> leaf.  Atoms are whitespace separated components not spelled '(', ')',
>> '||' or '&&'.
>>
>>
>>
>> Stage 2: Search decomposition
>>
>> { bool : 'and',
>>  query : [
>>    { ftsquery : ['junk'],
>>      phrases : [],
>>      classname : 'author',
>>      fields : [7, 8, 9, 10]
>>    },
>>    { bool : 'or',
>>      query : [
>>        { ftsquery : [[['foo', '&', 'bar'], '&!', 'baz], '|', 'gar' ],
>>          phrases : ['bar'],
>>          classname : 'keyword',
>>          fields : [15]
>>        },
>>        { ftsquery : ['qux'],
>>          phrases : [],
>>          classname : 'title',
>>          fields : [6]
>>        }
>>      ]
>>    }
>>  ]
>> }
>>
>> Then, we walk that tree (probably a plperlu stored proc) looking for
>> leaf hashes (those that don't contain a key of "query") and build up
>> SELECT (ranking), FROM (sourcing) and WHERE (tsquery and phrase
>> matching) clauses as we see them.  We apply the union of the index
>> normalizers defined for the referenced fields to the non-joiner atoms
>> in the "ftsquery" structure.
>>
>> Eh?  I can see my way from here to there, but what I'm I totally overlooking?
>>
>
> Since I didn't hear any screaming, I took a little time today to put
> together a rough parser (attached).  I have not thrown horrible,
> terrible, mean, angry data at it, but for good input and some bits of
> unexpected input (multiple boolean operators in a row, unbalanced
> parens) it works as I expect it to.
>
> The output format is not exactly the same as described above, but
> should be recognizable.
>
> One of the implications of moving to this is that the "compiled query"
> returned from the main search call will necessarily change.  Also note
> that reconstructing the exact query that the user supplied will be
> very difficult, but we can cache that data easily enough for later
> use.
>
> The command line I've been testing with is:
>
> ./fts-replacement.pl 'title: (foo bar) || (-baz || (subject:"1900-1910
> junk" se:stuff)) && && && au:malarky || au:gonzo && +goo' 1
>
> First param is the query, second is a debug flag.  || == OR, and && ==
> AND, obviously.
>
> Any feedback would be appreciated.
>
> --
> Mike Rylander
>  | VP, Research and Design
>  | Equinox Software, Inc. / The Evergreen Experts
>  | phone:  1-877-OPEN-ILS (673-6457)
>  | email:  miker at esilibrary.com
>  | web:  http://www.esilibrary.com
>



-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fts-replacement.pl
Type: application/x-perl
Size: 14717 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20100211/c3b4347f/attachment-0001.bin 


More information about the Open-ils-dev mailing list