[OPEN-ILS-DEV] Re: Napkin drawings for removing OpenILS::Application::Storage::FTS

Mike Rylander mrylander at gmail.com
Wed Mar 31 15:07:10 EDT 2010


On Wed, Mar 31, 2010 at 11:48 AM, Mike Rylander <mrylander at gmail.com> wrote:
> OK ... so, here's an attempt at a grammar for the new
> QueryParser-based search stuff.  The actual tokens used for much of
> the syntax are configurable.  I'll try to point that out as best I can.
>

[snip]

> ------------------------------------------------------------------------------------------
>
> Coming soon -- lots of example queries and a list of configured values
> for the implementation class used in EG.  I'll do all that on the
> wiki, and add the grammar, too.

And ... http://open-ils.org/dokuwiki/doku.php?id=documentation:technical:search_grammar

--miker

>
> --miker
>
> On Tue, Mar 9, 2010 at 2:52 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> So, here's some more updatification.  No BNF yet, but everything
>> mentioned below still works as described.  Also, you can now add more
>> than one field to a specific index definition search, thusly:
>>
>> title|proper|uniform|translated: harry potter
>>
>> and the query built will search all of those index definitions.
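
The class|field|field: form above splits apart cleanly.  A minimal
Python sketch of that split follows.  It is not the QueryParser.pm
code, and split_specifier is a made-up name used only for
illustration:

```python
def split_specifier(term):
    """Split 'class|field1|field2: words' into (class, [fields], text).

    Returns (None, [], text) when the term carries no specifier.
    """
    if ":" not in term:
        return None, [], term.strip()
    spec, _, text = term.partition(":")
    # The first '|'-separated piece is the search class, the rest are fields.
    classname, *fields = spec.strip().split("|")
    return classname, fields, text.strip()
```

Calling split_specifier('title|proper|uniform|translated: harry potter')
returns ('title', ['proper', 'uniform', 'translated'], 'harry potter').
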
>>
>> You'll find attached 3 files:
>>
>>  * QueryParser.pm -- the core parser and query tree building code
>>  * EG_QueryParser.pm -- an Evergreen-specific subclass which adds
>> tsearch2 logic, SQL generation, and code which populates the parser in
>> the right way for Evergreen by filling in classes, fields, modifiers
>> and functions that we need
>>  * fts-replacement.pl -- a script to test the above modules
>>
>> My test command line is currently:
>>
>> $ time PERL5LIB=$PERL5LIB:/home/miker/svn/OpenSRF/src/perl/lib/ \
>>     ./fts-replacement.pl --query='ti:jk harry kw:rowling sort(title)#descending'
>>
>> and you can adjust the PERL5LIB and query to taste (or drop the
>> --query option to see the redonkulous default test query).  We just
>> need OpenSRF::Utils::JSON ... hence the PERL5LIB mangling.  If you
>> have OpenSRF installed, no need for that bit.
>>
>> Thoughts or comments welcome.
>>
>> --miker
>>
>> On Thu, Feb 11, 2010 at 2:32 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>> What's the definition of insanity, again?  ;)
>>>
>>> Here's round 2 of a possible new query parser.  In this version we
>>> support configurable search classes, fields, modifiers and filters.
>>> I'll try to construct a BNF for it, but by way of example the gist is:
>>>
>>> Classed search:
>>>   title: harry potter
>>>   keyword: water
>>>
>>> Specific field (index definition) search:
>>>   author|personal: rowling
>>>   subject|name: potter
>>>
>>> Search modifiers:
>>>   #metabib
>>>   #available
>>>   #descending
>>>
>>> Argumented filters:
>>>   statuses(0,7,12)
>>>   estimation_strategy(exclusion)
>>>   item_form(d)
>>>   during(2009,2010)
>>>
>>> Phrases (non-stemming matches):
>>>   "multi-word phrase (exact) matches"
>>>   +one +word +exact +matches
>>>
>>> Grouping and Boolean operators (chocolate and peanut butter):
>>>   (foo && bar) || baz
>>>
>>> Everything but the grouping, boolean and phrase operators is
>>> configurable, so adding classes and fields, modifiers and filters is
>>> up to (available to?) the developer.  Some examples:
>>>
>>>    __PACKAGE__->add_search_class( 'keyword' );
>>>    __PACKAGE__->add_search_class( 'title' );
>>>    __PACKAGE__->add_search_class( 'author' );
>>>    __PACKAGE__->add_search_class( 'subject' );
>>>    __PACKAGE__->add_search_class( 'series' );
>>>
>>>    __PACKAGE__->add_search_field( author => 'corporate' );
>>>
>>>    __PACKAGE__->add_search_filter( 'audience' );
>>>    __PACKAGE__->add_search_filter( 'vr_format' );
>>>    __PACKAGE__->add_search_filter( 'format' );
>>>    __PACKAGE__->add_search_filter( 'item_type' );
>>>    __PACKAGE__->add_search_filter( 'item_form' );
>>>    __PACKAGE__->add_search_filter( 'lit_form' );
>>>    __PACKAGE__->add_search_filter( 'location' );
>>>
>>>    __PACKAGE__->add_search_modifier( 'available' );
>>>    __PACKAGE__->add_search_modifier( 'descending' );
>>>    __PACKAGE__->add_search_modifier( 'ascending' );
>>>    __PACKAGE__->add_search_modifier( 'metarecord' );
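
One way to picture what the registrations above buy you: only
registered names are pulled out of the query string as filters or
modifiers, and everything else is left alone as search text.  A
rough Python sketch under that assumption follows.  FILTERS,
MODIFIERS and extract are illustrative names, and the real parser
may well tokenize differently:

```python
import re

# Names copied from the add_search_filter / add_search_modifier calls above.
FILTERS = {"audience", "vr_format", "format", "item_type",
           "item_form", "lit_form", "location"}
MODIFIERS = {"available", "descending", "ascending", "metarecord"}

def extract(query):
    """Return (remaining search text, {filter: [args]}, [modifiers])."""
    filters, modifiers = {}, []

    def take_filter(m):
        if m.group(1) not in FILTERS:
            return m.group(0)          # unregistered: leave the text untouched
        filters[m.group(1)] = [a.strip() for a in m.group(2).split(",")]
        return " "

    def take_modifier(m):
        if m.group(1) not in MODIFIERS:
            return m.group(0)
        modifiers.append(m.group(1))
        return " "

    rest = re.sub(r"(\w+)\(([^()]*)\)", take_filter, query)   # name(arg,arg)
    rest = re.sub(r"#(\w+)", take_modifier, rest)             # #name
    return " ".join(rest.split()), filters, modifiers
```

For 'title: harry potter item_form(d) #available' this returns
('title: harry potter', {'item_form': ['d']}, ['available']).
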
>>>
>>> This also supports both class-wide and class+field aliasing via
>>> regexp, for use in mapping CQL relations to Evergreen search classes
>>> and fields:
>>>
>>>    __PACKAGE__->add_search_class_alias( author => 'name' );
>>>    __PACKAGE__->add_search_class_alias( author => 'dc.contributor' );
>>>
>>>    __PACKAGE__->add_search_class_alias( subject => 'bib.subject(?:Title|Place|Occupation)' );
>>>    __PACKAGE__->add_search_field_alias( subject => name => 'bib.subjectName' );
>>>    # keyword|standard_number isn't a stock Evergreen index def, jfyi
>>>    __PACKAGE__->add_search_field_alias( keyword => standard_number => 'dc.identifier' );
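
To make the alias lookup concrete, here is a small Python sketch
that resolves a CQL-style relation name against the aliases quoted
above.  The table layout, the resolve name, and the choice to
require a full-string regexp match with field aliases checked first
are all assumptions made for illustration, not a description of
what QueryParser.pm does:

```python
import re

# Alias patterns copied from the add_search_*_alias calls above.
CLASS_ALIASES = {
    "author": ["name", "dc.contributor"],
    "subject": [r"bib.subject(?:Title|Place|Occupation)"],
}
FIELD_ALIASES = {
    ("subject", "name"): ["bib.subjectName"],
    ("keyword", "standard_number"): ["dc.identifier"],
}

def resolve(relation):
    """Map a relation to (class, field); field is None for class-wide aliases."""
    # Assumption: the more specific class+field aliases win over class-wide ones.
    for (classname, field), patterns in FIELD_ALIASES.items():
        if any(re.fullmatch(p, relation) for p in patterns):
            return classname, field
    for classname, patterns in CLASS_ALIASES.items():
        if any(re.fullmatch(p, relation) for p in patterns):
            return classname, None
    return None
```

With these tables, resolve('bib.subjectPlace') returns
('subject', None) and resolve('dc.identifier') returns
('keyword', 'standard_number').
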
>>>
>>> Anyway, as before, feedback appreciated.
>>>
>>> --miker
>>>
>>> On Wed, Feb 3, 2010 at 4:55 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 3:54 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>>>> Search syntax thought experiment
>>>>> --------------------------------------------------
>>>>>
>>>>> Multi-stage search-to-query compilation
>>>>>
>>>>> Given a default search class of "keyword" and a search string of:
>>>>>  ( foo "bar" -baz || gar || ti|proper:qux) && au:junk
>>>>>
>>>>>
>>>>> Stage 1: Boolean decomposition
>>>>>
>>>>> { bool : 'and',
>>>>>  query : [
>>>>>    { author : 'junk' },
>>>>>    { bool : 'or',
>>>>>      query : [
>>>>>        { keyword : 'foo "bar" -baz || gar' },
>>>>>        { 'title|proper' : 'qux' }
>>>>>      ]
>>>>>    }
>>>>>  ]
>>>>> }
>>>>>
>>>>> Lacking grouping parens, we bind pairs of atoms separated by || or &&
>>>>> (assuming && where no operator is given), working from left to
>>>>> right.  If adjacent atoms belong to the same class[|field]
>>>>> specifier, they are folded into a single leaf.  Atoms are
>>>>> whitespace-separated components not spelled '(', ')', '||' or '&&'.
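
The folding and left-to-right binding rules can be sketched in a few
lines of Python.  This is only an illustration of the paren-free
case: it does no grouping and no alias expansion (ti stays ti), it
assumes a class specifier stays in effect until the next one
appears, and decompose is a hypothetical name, not the attached
parser:

```python
import re

# class or class|field|... immediately followed by ':'
SPEC = re.compile(r"^([A-Za-z_]+(?:\|[A-Za-z_]+)*):(.*)$")

def decompose(query, default_class="keyword"):
    leaves, joins = [], []   # leaves: [spec, [words]]; joins: bool between leaves
    current, pending = default_class, None
    for token in query.split():
        if token in ("||", "&&"):
            pending = token
            continue
        found = SPEC.match(token)
        if found:
            current, token = found.group(1), found.group(2)
            if not token:    # "title: harry" style, specifier stands alone
                continue
        if leaves and leaves[-1][0] == current:
            # Same specifier as the previous atom: fold into that leaf,
            # keeping any explicit operator as part of the leaf text.
            if pending:
                leaves[-1][1].append(pending)
            leaves[-1][1].append(token)
        else:
            if leaves:
                joins.append("or" if pending == "||" else "and")
            leaves.append([current, [token]])
        pending = None
    if not leaves:
        return None
    # Bind pairs from left to right.
    tree = {leaves[0][0]: " ".join(leaves[0][1])}
    for join, (spec, words) in zip(joins, leaves[1:]):
        tree = {"bool": join, "query": [tree, {spec: " ".join(words)}]}
    return tree
```

Fed the inside of the parenthesized group from the example,
decompose('foo "bar" -baz || gar || ti|proper:qux') yields an 'or'
node whose children are { keyword : 'foo "bar" -baz || gar' } and
{ 'ti|proper' : 'qux' }.
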
>>>>>
>>>>>
>>>>>
>>>>> Stage 2: Search decomposition
>>>>>
>>>>> { bool : 'and',
>>>>>  query : [
>>>>>    { ftsquery : ['junk'],
>>>>>      phrases : [],
>>>>>      classname : 'author',
>>>>>      fields : [7, 8, 9, 10]
>>>>>    },
>>>>>    { bool : 'or',
>>>>>      query : [
>>>>>        { ftsquery : [[['foo', '&', 'bar'], '&!', 'baz'], '|', 'gar' ],
>>>>>          phrases : ['bar'],
>>>>>          classname : 'keyword',
>>>>>          fields : [15]
>>>>>        },
>>>>>        { ftsquery : ['qux'],
>>>>>          phrases : [],
>>>>>          classname : 'title',
>>>>>          fields : [6]
>>>>>        }
>>>>>      ]
>>>>>    }
>>>>>  ]
>>>>> }
>>>>>
>>>>> Then, we walk that tree (probably a plperlu stored proc) looking for
>>>>> leaf hashes (those that don't contain a key of "query") and build up
>>>>> SELECT (ranking), FROM (sourcing) and WHERE (tsquery and phrase
>>>>> matching) clauses as we see them.  We apply the union of the index
>>>>> normalizers defined for the referenced fields to the non-joiner atoms
>>>>> in the "ftsquery" structure.
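
The leaf walk itself is short.  Below is a Python sketch (the
message above suggests a plperlu stored proc, so this is only to
show the shape of the traversal) that visits every hash lacking a
"query" key.  TREE is the Stage 2 structure transcribed from above,
and leaves is an illustrative name:

```python
# Stage 2 structure from the message above, as a Python literal.
TREE = {
    "bool": "and",
    "query": [
        {"ftsquery": ["junk"], "phrases": [],
         "classname": "author", "fields": [7, 8, 9, 10]},
        {"bool": "or",
         "query": [
             {"ftsquery": [[["foo", "&", "bar"], "&!", "baz"], "|", "gar"],
              "phrases": ["bar"], "classname": "keyword", "fields": [15]},
             {"ftsquery": ["qux"], "phrases": [],
              "classname": "title", "fields": [6]},
         ]},
    ],
}

def leaves(node):
    """Yield each leaf hash, i.e. each node that has no 'query' key."""
    if "query" not in node:
        yield node
        return
    for child in node["query"]:
        yield from leaves(child)

# Each leaf is where a SELECT, FROM and WHERE fragment would be built.
summary = [(leaf["classname"], leaf["fields"]) for leaf in leaves(TREE)]
```

Here summary ends up as [('author', [7, 8, 9, 10]),
('keyword', [15]), ('title', [6])], one entry per leaf in
depth-first order.
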
>>>>>
>>>>> Eh?  I can see my way from here to there, but what am I totally overlooking?
>>>>>
>>>>
>>>> Since I didn't hear any screaming, I took a little time today to put
>>>> together a rough parser (attached).  I have not thrown horrible,
>>>> terrible, mean, angry data at it, but for good input and some bits of
>>>> unexpected input (multiple boolean operators in a row, unbalanced
>>>> parens) it works as I expect it to.
>>>>
>>>> The output format is not exactly the same as described above, but
>>>> should be recognizable.
>>>>
>>>> One of the implications of moving to this is that the "compiled query"
>>>> returned from the main search call will necessarily change.  Also note
>>>> that reconstructing the exact query that the user supplied will be
>>>> very difficult, but we can cache that data easily enough for later
>>>> use.
>>>>
>>>> The command line I've been testing with is:
>>>>
>>>> ./fts-replacement.pl 'title: (foo bar) || (-baz || (subject:"1900-1910 junk" se:stuff)) && && && au:malarky || au:gonzo && +goo' 1
>>>>
>>>> First param is the query, second is a debug flag.  || == OR, and && ==
>>>> AND, obviously.
>>>>
>>>> Any feedback would be appreciated.
>>>>
>>>> --
>>>> Mike Rylander
>>>>  | VP, Research and Design
>>>>  | Equinox Software, Inc. / The Evergreen Experts
>>>>  | phone:  1-877-OPEN-ILS (673-6457)
>>>>  | email:  miker at esilibrary.com
>>>>  | web:  http://www.esilibrary.com
>>>>




