[OPEN-ILS-DEV] Questions About Query Parser

Mike Rylander mrylander at gmail.com
Wed Nov 13 20:53:37 EST 2013


Liam,

Our emails crossed in the tubes!  You're right that the joiner was
wrong, but it's much earlier than in the SQL generation.  If you have
the time, please take a look at my branch (in particular, the removal
of 'local' before most occurences of $last_type) and see if you see
what I see ... that is, the right query is generated for Case 2.

--miker


On Wed, Nov 13, 2013 at 8:49 PM, Liam Whalen <whalen.ld at gmail.com> wrote:
> I have tracked down the problem for Case 2, but I have not figured out how
> to solve it.
>
> When I conduct a query using three ISBNs, I get the following WITH
> *_keyword_xq AS clause:
>
> WITH xa474860_keyword_xq AS (SELECT
>
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')))
>
>          AS tsq,
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), ''))) AS tsq_rank ),
>              xa47d860_keyword_xq AS (SELECT
>
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')))
>
>         &&
>
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')))
>
>          AS tsq,
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), ''))) ||
>               (to_tsquery('simple', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
> || ')', '()'), ''))) AS tsq_rank )
>
> I had the /Driver/Pg/QueryParser.pm file inject extra newlines into the
> creation with the following line:
>
> From master:
>
> $self->{tsquery} .= "\n\n" . ${spc} x 3 . $atom->sql . "\n\n";
>
> The introduction of the && is what is causing the problem.
>
> If I modify Storage/QueryParser.pm at line 1205, which sets the
> $struct->joiner to &
>
> and change it to |. With this change, the query works properly.
>
> 1205                 $struct->joiner( '|' );
>
> I have added a quick commit to my user branch if someone wants to pick up
> from here:
>
> http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/ldw/query_parser_three_isbn_problem
>
> Judging from the code in Storage/QueryParser.pm, it looks like there are
> empty atoms being added when the query is being parsed when there are three
> ISBNs.  That is my best guess, but I have not looked at that to confirm it.
>
> Liam
>
> On Nov 13, 2013, at 12:06 PM, Mike Rylander <mrylander at gmail.com> wrote:
>
> I only have a moment, but I can comment on both cases, at least a little.
>
> * Case 1 is very likely a variant of the problem fixed
> (superficially) in ab37336715cb7d84ed0c30b1cd2e9e6b85933774 and is not
> fixed in QP-proper, yet.  The current workaround is "put filters and
> modifiers at the end".
> * Case 2 is something new to me.  Could you pull the full query
> generated by the working and non-working versions out of your logs,
> per chance? If you're logging long-running queries longer than a
> second (not a bad idea, IMO), they're likely in there.
>
> --miker
>
> On Wed, Nov 13, 2013 at 2:24 PM, Dan Wells <dbw2 at calvin.edu> wrote:
>
> Hello all,
>
> Having made the leap from 2.3 to 2.5, we have run into a few issues with the
> updated query parser, and rather than dig into something I know from the
> surface to be very complex, I am hoping someone might shed some light on
> what is going on.  The issues I am finding *seem* like bugs, but maybe I am
> missing something fundamental in my queries.
>
> You see, we have an external system (SFX) which does automated queries of
> the catalog to find book content.  These queries are largely based on ISBN,
> and when we upgraded to 2.5, they started to fail in strange ways.  After
> much poking and prodding, I have boiled things down to a couple simple cases
> with surprising results.  In both cases, the order of operands changes the
> result set despite using what I think is a commutative operator (||).  You
> can test these queries using our current catalog, http://ulysses.calvin.edu/
> , but I can also take some time to find similar issues using Concerto
> records if it comes to that.
>
> Case 1a:
> item_form(s) && identifier|isbn:0830837035 || identifier|isbn:1844743829 (no
> results)
> vs.
> identifier|isbn:1844743829 || identifier|isbn:0830837035 && item_form(s) (1
> result)
>
> In this case, I would have expected both to return 1 result.  I also see the
> same behavior even if the given ISBN is identical (a contrived example):
>
> Case 1b:
> item_form(s) && identifier|isbn:0830837035 || identifier|isbn:0830837035 (no
> results)
> vs.
> identifier|isbn:0830837035 || identifier|isbn:0830837035 && item_form(s) (1
> result)
>
>
>
> The next case is similar, but with slightly more nuance.  I have two of the
> same title, one print, one electronic.  If I OR the ISBNs together, it
> works:
>
> Case 2a:
> identifier|isbn:074944990X || identifier|isbn:0749452897 -- WORKS
>
> identifier|isbn:0749452897 || identifier|isbn:074944990X – WORKS
>
> However, if I add a third ISBN to the mix, I now get different results
> depending on the order of operands:
>
> Case2b:
> identifier|isbn:074944990X || identifier|isbn:0749452897 ||
> identifier|isbn:7313054289 -- DOESN'T WORK (print result)
>
> identifier|isbn:074944990X || identifier|isbn:7313054289 ||
> identifier|isbn:0749452897 -- DOESN'T WORK (print result)
>
> identifier|isbn:7313054289 || identifier|isbn:074944990X ||
> identifier|isbn:0749452897 -- DOESN'T WORK (no results)
>
> identifier|isbn:7313054289 || identifier|isbn:0749452897 ||
> identifier|isbn:074944990X -- DOESN'T WORK (no results)
>
> identifier|isbn:0749452897 || identifier|isbn:074944990X ||
> identifier|isbn:7313054289 -- DOESN'T WORK (e result)
>
> identifier|isbn:0749452897 || identifier|isbn:7313054289 ||
> identifier|isbn:074944990X -- DOESN'T WORK (e result)
>
> I do seem to get the same behavior when using a more compact query notation
> (which I believe should be identical in effect):
>
> Case 2c:
> identifier|isbn:(074944990X || 0749452897) -- WORKS
>
> identifier|isbn:(074944990X || 0749452897 || 7313054289) -- DOESN'T WORK
> (print result)
>
> identifier|isbn:(0749452897 || 074944990X || 7313054289) -- DOESN'T WORK (e
> result)
>
>
>
> Based on when development in query parsing was most active, I imagine this
> behavior has existed since 2.4.  Can anyone verify that?  Also, is there an
> explanation for this behavior which I may be missing?  If not, can anyone
> more familiar with this code at least narrow down what is causing these
> issues?  I’m willing to dive in if necessary, but given the complexity of
> this code, I may not soon have enough free time to effectively troubleshoot
> this.
>
> Finally, I am happy to move the conversation over to LP if that is a better
> venue, but I was struggling with pinpointing exactly what this bug affects
> (and therefore how to file it properly), so I thought I would first seek
> input from the list.
>
> Thanks,
> Dan
>
>
> Daniel Wells
> Library Programmer/Analyst
> Hekman Library, Calvin College
> 616.526.7133
>
>
>
>
>
>
> --
> Mike Rylander
> | Director of Research and Development
> | Equinox Software, Inc. / Your Library's Guide to Open Source
> | phone:  1-877-OPEN-ILS (673-6457)
> | email:  miker at esilibrary.com
> | web:  http://www.esilibrary.com
>
>



-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com


More information about the Open-ils-dev mailing list