[OPEN-ILS-DEV] ***SPAM*** Re: Napkin drawings for removing OpenILS::Application::Storage::FTS

Mike Rylander mrylander at gmail.com
Wed Feb 3 16:55:57 EST 2010


On Thu, Jan 21, 2010 at 3:54 PM, Mike Rylander <mrylander at gmail.com> wrote:
> Search syntax thought experiment
> --------------------------------------------------
>
> Multi-stage search-to-query compilation
>
> Given a default search class of "keyword" and a search string of:
>  ( foo "bar" -baz || gar || ti|proper:qux) && au:junk
>
>
> Stage 1: Boolean decomposition
>
> { bool : 'and',
>  query : [
>    { author : 'junk' },
>    { bool : 'or',
>      query : [
>        { keyword : 'foo "bar" -baz || gar' },
>        { 'title|proper' : 'qux' }
>      ]
>    }
>  ]
> }
>
> Lacking grouping parens, we bind pairs of || or && separated atoms
> (assuming &&), working from left to right.  If adjacent atoms belong
> to the same class[|field] specifier, they are folded into a single
> leaf.  Atoms are whitespace separated components not spelled '(', ')',
> '||' or '&&'.
>
>
>
> Stage 2: Search decomposition
>
> { bool : 'and',
>  query : [
>    { ftsquery : ['junk'],
>      phrases : [],
>      classname : 'author',
>      fields : [7, 8, 9, 10]
>    },
>    { bool : 'or',
>      query : [
>        { ftsquery : [[['foo', '&', 'bar'], '&!', 'baz], '|', 'gar' ],
>          phrases : ['bar'],
>          classname : 'keyword',
>          fields : [15]
>        },
>        { ftsquery : ['qux'],
>          phrases : [],
>          classname : 'title',
>          fields : [6]
>        }
>      ]
>    }
>  ]
> }
>
> Then, we walk that tree (probably a plperlu stored proc) looking for
> leaf hashes (those that don't contain a key of "query") and build up
> SELECT (ranking), FROM (sourcing) and WHERE (tsquery and phrase
> matching) clauses as we see them.  We apply the union of the index
> normalizers defined for the referenced fields to the non-joiner atoms
> in the "ftsquery" structure.
>
> Eh?  I can see my way from here to there, but what I'm I totally overlooking?
>

Since I didn't hear any screaming, I took a little time today to put
together a rough parser (attached).  I have not thrown horrible,
terrible, mean, angry data at it, but for good input and some bits of
unexpected input (multiple boolean operators in a row, unbalanced
parens) it works as I expect it to.

The output format is not exactly the same as described above, but
should be recognizable.

One of the implications of moving to this is that the "compiled query"
returned from the main search call will necessarily change.  Also note
that reconstructing the exact query that the user supplied will be
very difficult, but we can cache that data easily enough for later
use.

The command line I've been testing with is:

./fts-replacement.pl 'title: (foo bar) || (-baz || (subject:"1900-1910
junk" se:stuff)) && && && au:malarky || au:gonzo && +goo' 1

First param is the query, second is a debug flag.  || == OR, and && ==
AND, obviously.

Any feedback would be appreciated.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fts-replacement.pl
Type: text/x-perl
Size: 5540 bytes
Desc: not available
Url : http://libmail.georgialibraries.org/pipermail/open-ils-dev/attachments/20100203/95d73ac7/attachment.pl 


More information about the Open-ils-dev mailing list