[OPEN-ILS-DEV] Re: Napkin drawings for removing OpenILS::Application::Storage::FTS
Mike Rylander
mrylander at gmail.com
Wed Mar 31 15:07:10 EDT 2010
On Wed, Mar 31, 2010 at 11:48 AM, Mike Rylander <mrylander at gmail.com> wrote:
> OK ... so, here's an attempt at a grammar for the new
> QueryParser-based search stuff. The actual tokens used for much of
> the syntax are configurable. I'll try to point that out as best I can.
>
[snip]
> ------------------------------------------------------------------------------------------
>
> Coming soon -- lots of example queries and a list of configured values
> for the implementation class used in EG. I'll do all that on the
> wiki, and add the grammar, too.
And ... http://open-ils.org/dokuwiki/doku.php?id=documentation:technical:search_grammar
--miker
>
> --miker
>
> On Tue, Mar 9, 2010 at 2:52 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> So, here's some more updatification. No BNF yet, but everything
>> mentioned below still works as described. Also, you can now add more
>> than one field to a specific index definition search, thusly:
>>
>> title|proper|uniform|translated: harry potter
>>
>> and the query built will search all of those index definitions.
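The multi-field specifier above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in QueryParser.pm; the function name `split_specifier` is hypothetical, and the sketch assumes the first colon always ends the specifier.

```python
def split_specifier(query):
    """Split 'class|field1|field2: terms' into (class, fields, terms).

    Hypothetical helper: it assumes the first ':' ends the specifier and
    does not check the class or field names against any configuration.
    """
    spec, sep, terms = query.partition(":")
    if not sep:
        # No specifier at all: the caller would fall back to its default class.
        return None, [], query.strip()
    classname, *fields = spec.strip().split("|")
    return classname, fields, terms.strip()


print(split_specifier("title|proper|uniform|translated: harry potter"))
print(split_specifier("keyword: water"))
```

A real parser would also need to cope with colons inside the search terms, which this sketch does not attempt.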
>>
>> You'll find attached 3 files:
>>
>> * QueryParser.pm -- the core parser and query tree building code
>> * EG_QueryParser.pm -- an Evergreen-specific subclass which adds
>> tsearch2 logic, SQL generation, and code which populates the parser in
>> the right way for Evergreen by filling in classes, fields, modifiers
>> and functions that we need
>> * fts-replacement.pl -- a script to test the above modules
>>
>> My test command line is currently:
>>
>> $ time PERL5LIB=$PERL5LIB:/home/miker/svn/OpenSRF/src/perl/lib/ \
>>     ./fts-replacement.pl --query='ti:jk harry kw:rowling sort(title)#descending'
>>
>> and you can adjust the PERL5LIB and query to taste (or remove the
>> query to see the redonkulous default test query). We just need
>> OpenSRF::Utils::JSON ... hence the PERL5LIB mangling. If you have
>> OpenSRF installed, there's no need for that bit.
>>
>> Thoughts or comments welcome.
>>
>> --miker
>>
>> On Thu, Feb 11, 2010 at 2:32 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>> What's the definition of insanity, again? ;)
>>>
>>> Here's round 2 of a possible new query parser. In this version we
>>> support configurable search classes, fields, modifiers and filters.
>>> I'll try to construct a BNF for it, but by way of example the gist is:
>>>
>>> Classed search:
>>> title: harry potter
>>> keyword: water
>>>
>>> Specific field (index definition) search:
>>> author|personal: rowling
>>> subject|name: potter
>>>
>>> Search modifiers
>>> #metabib
>>> #available
>>> #descending
>>>
>>> Argumented filters:
>>> statuses(0,7,12)
>>> estimation_strategy(exclusion)
>>> item_form(d)
>>> during(2009,2010)
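To make the modifier and filter shapes above concrete, here is a rough Python sketch (not the Perl implementation) that classifies a single whitespace-separated atom. The regular expressions and the `classify` function are assumptions for illustration. In particular the sketch does not split run-together atoms, and the real parser's tokens are configurable.

```python
import re

# Assumed shapes: '#name' for a modifier, 'name(arg,arg,...)' for a filter.
MODIFIER_RE = re.compile(r"^#(\w+)$")
FILTER_RE = re.compile(r"^(\w+)\(([^()]*)\)$")


def classify(atom):
    """Return (kind, name, args) for one atom; hypothetical helper."""
    m = MODIFIER_RE.match(atom)
    if m:
        return ("modifier", m.group(1), [])
    m = FILTER_RE.match(atom)
    if m:
        # Split the argument list on commas, dropping empty entries.
        args = [a.strip() for a in m.group(2).split(",") if a.strip()]
        return ("filter", m.group(1), args)
    return ("term", atom, [])


for atom in ["#available", "statuses(0,7,12)", "item_form(d)", "potter"]:
    print(classify(atom))
```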
>>>
>>> Phrases (non-stemming matches):
>>> "multi-word phrase (exact) matches"
>>> +one +word +exact +matches
>>>
>>> Grouping and Boolean operators (chocolate and peanut butter):
>>> (foo && bar) || baz
>>>
>>> Everything but the grouping, boolean and phrase operators is
>>> configurable, so adding classes and fields, modifiers and filters is
>>> up to (available to?) the developer. Some examples:
>>>
>>> __PACKAGE__->add_search_class( 'keyword' );
>>> __PACKAGE__->add_search_class( 'title' );
>>> __PACKAGE__->add_search_class( 'author' );
>>> __PACKAGE__->add_search_class( 'subject' );
>>> __PACKAGE__->add_search_class( 'series' );
>>>
>>> __PACKAGE__->add_search_field( author => 'corporate' );
>>>
>>> __PACKAGE__->add_search_filter( 'audience' );
>>> __PACKAGE__->add_search_filter( 'vr_format' );
>>> __PACKAGE__->add_search_filter( 'format' );
>>> __PACKAGE__->add_search_filter( 'item_type' );
>>> __PACKAGE__->add_search_filter( 'item_form' );
>>> __PACKAGE__->add_search_filter( 'lit_form' );
>>> __PACKAGE__->add_search_filter( 'location' );
>>>
>>> __PACKAGE__->add_search_modifier( 'available' );
>>> __PACKAGE__->add_search_modifier( 'descending' );
>>> __PACKAGE__->add_search_modifier( 'ascending' );
>>> __PACKAGE__->add_search_modifier( 'metarecord' );
>>>
>>> This also supports both class-wide and class+field aliasing via
>>> regexp, for use in mapping CQL relations to evergreen search classes
>>> and fields:
>>>
>>> __PACKAGE__->add_search_class_alias( author => 'name' );
>>> __PACKAGE__->add_search_class_alias( author => 'dc.contributor' );
>>>
>>> __PACKAGE__->add_search_class_alias( subject => 'bib.subject(?:Title|Place|Occupation)' );
>>> __PACKAGE__->add_search_field_alias( subject => name => 'bib.subjectName' );
>>> # keyword|standard_number isn't a stock evergreen index def, jfyi
>>> __PACKAGE__->add_search_field_alias( keyword => standard_number => 'dc.identifier' );
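One way to picture the regexp aliasing is a small lookup table, sketched here in Python. The `AliasTable` class is hypothetical. Its method names echo the Perl calls above, but the matching rules (whole-string match, field aliases checked before class aliases) are assumptions, not the documented behavior of QueryParser.pm.

```python
import re


class AliasTable:
    """Hypothetical alias registry mapping CQL-style relations to classes/fields."""

    def __init__(self):
        self._class_aliases = []  # (compiled pattern, classname)
        self._field_aliases = []  # (compiled pattern, classname, field)

    def add_search_class_alias(self, classname, pattern):
        self._class_aliases.append((re.compile(pattern), classname))

    def add_search_field_alias(self, classname, field, pattern):
        self._field_aliases.append((re.compile(pattern), classname, field))

    def resolve(self, relation):
        """Return (classname, field or None), or None if nothing matches."""
        # Assumption: the more specific class+field aliases win.
        for pattern, classname, field in self._field_aliases:
            if pattern.fullmatch(relation):
                return classname, field
        for pattern, classname in self._class_aliases:
            if pattern.fullmatch(relation):
                return classname, None
        return None


aliases = AliasTable()
aliases.add_search_class_alias("author", "dc.contributor")
aliases.add_search_class_alias("subject", "bib.subject(?:Title|Place|Occupation)")
aliases.add_search_field_alias("subject", "name", "bib.subjectName")

print(aliases.resolve("bib.subjectPlace"))
print(aliases.resolve("bib.subjectName"))
```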
>>>
>>> Anyway, as before, feedback appreciated.
>>>
>>> --miker
>>>
>>> On Wed, Feb 3, 2010 at 4:55 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 3:54 PM, Mike Rylander <mrylander at gmail.com> wrote:
>>>>> Search syntax thought experiment
>>>>> --------------------------------------------------
>>>>>
>>>>> Multi-stage search-to-query compilation
>>>>>
>>>>> Given a default search class of "keyword" and a search string of:
>>>>> ( foo "bar" -baz || gar || ti|proper:qux) && au:junk
>>>>>
>>>>>
>>>>> Stage 1: Boolean decomposition
>>>>>
>>>>> { bool : 'and',
>>>>>   query : [
>>>>>     { author : 'junk' },
>>>>>     { bool : 'or',
>>>>>       query : [
>>>>>         { keyword : 'foo "bar" -baz || gar' },
>>>>>         { 'title|proper' : 'qux' }
>>>>>       ]
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> Lacking grouping parens, we bind pairs of || or && separated atoms
>>>>> (assuming &&), working from left to right. If adjacent atoms belong
>>>>> to the same class[|field] specifier, they are folded into a single
>>>>> leaf. Atoms are whitespace separated components not spelled '(', ')',
>>>>> '||' or '&&'.
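The left-to-right binding described above can be sketched roughly as follows. This is an illustrative Python sketch, not the attached parser. The names `tokenize` and `parse` and the `{'atom': ...}` leaf shape are assumptions, and the folding of adjacent same-class atoms into a single leaf is deliberately left out.

```python
import re

# '(' and ')' are their own tokens; any other run of non-space characters
# (with double-quoted phrases kept whole) is one token, including '||' and '&&'.
TOKEN_RE = re.compile(r'\(|\)|(?:[^\s()"]|"[^"]*")+')


def tokenize(query):
    return TOKEN_RE.findall(query)


def parse(tokens):
    """Bind operands pairwise, left to right, assuming '&&' between bare atoms."""
    pos = 0

    def group(depth):
        nonlocal pos
        node, op = None, "and"
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ")":
                if depth > 0:
                    break          # close the current group
                continue           # stray ')' at top level: ignore it
            if tok in ("&&", "||"):
                # With several operators in a row, the last one wins.
                op = "and" if tok == "&&" else "or"
                continue
            operand = group(depth + 1) if tok == "(" else {"atom": tok}
            if operand is None:
                continue           # empty group such as '()'
            node = operand if node is None else {"bool": op, "query": [node, operand]}
            op = "and"             # reset to the implicit joiner
        return node

    return group(0)


print(parse(tokenize("(foo || bar) && au:junk")))
```

There is no operator precedence here: each new operand is simply joined to everything parsed so far, which is one reading of "working from left to right".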
>>>>>
>>>>>
>>>>>
>>>>> Stage 2: Search decomposition
>>>>>
>>>>> { bool : 'and',
>>>>>   query : [
>>>>>     { ftsquery : ['junk'],
>>>>>       phrases : [],
>>>>>       classname : 'author',
>>>>>       fields : [7, 8, 9, 10]
>>>>>     },
>>>>>     { bool : 'or',
>>>>>       query : [
>>>>>         { ftsquery : [[['foo', '&', 'bar'], '&!', 'baz'], '|', 'gar' ],
>>>>>           phrases : ['bar'],
>>>>>           classname : 'keyword',
>>>>>           fields : [15]
>>>>>         },
>>>>>         { ftsquery : ['qux'],
>>>>>           phrases : [],
>>>>>           classname : 'title',
>>>>>           fields : [6]
>>>>>         }
>>>>>       ]
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>> Then, we walk that tree (probably a plperlu stored proc) looking for
>>>>> leaf hashes (those that don't contain a key of "query") and build up
>>>>> SELECT (ranking), FROM (sourcing) and WHERE (tsquery and phrase
>>>>> matching) clauses as we see them. We apply the union of the index
>>>>> normalizers defined for the referenced fields to the non-joiner atoms
>>>>> in the "ftsquery" structure.
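The tree walk described above might look something like this minimal Python sketch. It only finds the leaf hashes and prints their contents; building the actual SELECT, FROM and WHERE clauses and applying normalizers are not attempted. The helper names `leaves` and `render` are hypothetical, and `render` produces a string for inspection only, not a guaranteed-valid tsquery.

```python
def leaves(node):
    """Yield every leaf hash: a dict with no "query" key."""
    if "query" not in node:
        yield node
        return
    for child in node["query"]:
        yield from leaves(child)


def render(expr):
    """Flatten a nested ftsquery list into one string, for inspection only."""
    if isinstance(expr, str):
        return expr
    parts = []
    for part in expr:
        text = render(part)
        # Parenthesize nested groups so the joiner binding stays visible.
        parts.append(f"({text})" if isinstance(part, list) and len(part) > 1 else text)
    return " ".join(parts)


# The Stage 2 tree from the message, trimmed to the keys used here.
tree = {
    "bool": "and",
    "query": [
        {"ftsquery": ["junk"], "classname": "author", "fields": [7, 8, 9, 10]},
        {"bool": "or",
         "query": [
             {"ftsquery": [[["foo", "&", "bar"], "&!", "baz"], "|", "gar"],
              "classname": "keyword", "fields": [15]},
             {"ftsquery": ["qux"], "classname": "title", "fields": [6]},
         ]},
    ],
}

for leaf in leaves(tree):
    print(leaf["classname"], leaf["fields"], render(leaf["ftsquery"]))
```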
>>>>>
>>>>> Eh? I can see my way from here to there, but what am I totally overlooking?
>>>>>
>>>>
>>>> Since I didn't hear any screaming, I took a little time today to put
>>>> together a rough parser (attached). I have not thrown horrible,
>>>> terrible, mean, angry data at it, but for good input and some bits of
>>>> unexpected input (multiple boolean operators in a row, unbalanced
>>>> parens) it works as I expect it to.
>>>>
>>>> The output format is not exactly the same as described above, but
>>>> should be recognizable.
>>>>
>>>> One of the implications of moving to this is that the "compiled query"
>>>> returned from the main search call will necessarily change. Also note
>>>> that reconstructing the exact query that the user supplied will be
>>>> very difficult, but we can cache that data easily enough for later
>>>> use.
>>>>
>>>> The command line I've been testing with is:
>>>>
>>>> ./fts-replacement.pl 'title: (foo bar) || (-baz || (subject:"1900-1910 junk" se:stuff)) && && && au:malarky || au:gonzo && +goo' 1
>>>>
>>>> First param is the query, second is a debug flag. || == OR, and && ==
>>>> AND, obviously.
>>>>
>>>> Any feedback would be appreciated.
>>>>
>>>> --
>>>> Mike Rylander
>>>> | VP, Research and Design
>>>> | Equinox Software, Inc. / The Evergreen Experts
>>>> | phone: 1-877-OPEN-ILS (673-6457)
>>>> | email: miker at esilibrary.com
>>>> | web: http://www.esilibrary.com
>>>>