[OPEN-ILS-DEV] Questions About Query Parser

Liam Whalen whalen.ld at gmail.com
Wed Nov 13 21:49:03 EST 2013


Mike,

I am not familiar with what the values pasted into decompose look like, so I am not qualified to comment on all the removals of local right now.

I am not certain if the local issue is the problem in case 2, but I do not understand the code.

From what I've seen, I think the new atom_re variable is preventing what used to be empty atoms (using the old re) from being added to the query, which also prevents the struct->joiner from being changed to '&'.

As for removing local, I see that last_type is set twice in the atom_re if statement (line 1188).  I have not looked at the decompose function much, but I do not see where a call is made that would trigger another recursion between the two assignments to last_type within that if statement.  Understanding that will help me understand a part of this recursion.

In the end, I cannot be much help with vetting this yet.  I need to look at what the data is when it is passed in before I can really understand what is going on.  I see there is a debug flag.  I will set that on the weekend and take a closer look if someone else hasn't verified your changes by then.

Liam


On Nov 13, 2013, at 5:53 PM, Mike Rylander <mrylander at gmail.com> wrote:

> Liam,
> 
> Our emails crossed in the tubes!  You're right that the joiner was
> wrong, but it's much earlier than in the SQL generation.  If you have
> the time, please take a look at my branch (in particular, the removal
> of 'local' before most occurences of $last_type) and see if you see
> what I see ... that is, the right query is generated for Case 2.
> 
> --miker
> 
> 
> On Wed, Nov 13, 2013 at 8:49 PM, Liam Whalen <whalen.ld at gmail.com> wrote:
>> I have tracked down the problem for Case 2, but I have not figured out how
>> to solve it.
>> 
>> When I conduct a query using three ISBNs, I get the following WITH
>> *_keyword_xq AS clause:
>> 
>> WITH xa474860_keyword_xq AS (SELECT
>> 
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')))
>> 
>>         AS tsq,
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'085392844722')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), ''))) AS tsq_rank ),
>>             xa47d860_keyword_xq AS (SELECT
>> 
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')))
>> 
>>        &&
>> 
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')))
>> 
>>         AS tsq,
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'1551922460')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), ''))) ||
>>              (to_tsquery('simple', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), '')) || to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
>> btrim(regexp_replace(search_normalize(split_date_range(E'9780786222742')),E'(?:\\s+|:)','&','g'),'&|')
>> || ')', '()'), ''))) AS tsq_rank )
>> 
>> I had the /Driver/Pg/QueryParser.pm file inject extra newlines into the
>> creation with the following line:
>> 
>> From master:
>> 
>> $self->{tsquery} .= "\n\n" . ${spc} x 3 . $atom->sql . "\n\n";
>> 
>> The introduction of the && is what is causing the problem.
>> 
>> If I modify Storage/QueryParser.pm at line 1205, which sets the
>> $struct->joiner to &
>> 
>> and change it to |. With this change, the query works properly.
>> 
>> 1205                 $struct->joiner( '|' );
>> 
>> I have added a quick commit to my user branch if someone wants to pick up
>> from here:
>> 
>> http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/ldw/query_parser_three_isbn_problem
>> 
>> Judging from the code in Storage/QueryParser.pm, it looks like there are
>> empty atoms being added when the query is being parsed when there are three
>> ISBNs.  That is my best guess, but I have not looked at that to confirm it.
>> 
>> Liam
>> 
>> On Nov 13, 2013, at 12:06 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> 
>> I only have a moment, but I can comment on both cases, at least a little.
>> 
>> * Case 1 is very likely a variant of the problem fixed
>> (superficially) in ab37336715cb7d84ed0c30b1cd2e9e6b85933774 and is not
>> fixed in QP-proper, yet.  The current workaround is "put filters and
>> modifiers at the end".
>> * Case 2 is something new to me.  Could you pull the full query
>> generated by the working and non-working versions out of your logs,
>> per chance? If you're logging long-running queries longer than a
>> second (not a bad idea, IMO), they're likely in there.
>> 
>> --miker
>> 
>> On Wed, Nov 13, 2013 at 2:24 PM, Dan Wells <dbw2 at calvin.edu> wrote:
>> 
>> Hello all,
>> 
>> Having made the leap from 2.3 to 2.5, we have run into a few issues with the
>> updated query parser, and rather than dig into something I know from the
>> surface to be very complex, I am hoping someone might shed some light on
>> what is going on.  The issues I am finding *seem* like bugs, but maybe I am
>> missing something fundamental in my queries.
>> 
>> You see, we have an external system (SFX) which does automated queries of
>> the catalog to find book content.  These queries are largely based on ISBN,
>> and when we upgraded to 2.5, they started to fail in strange ways.  After
>> much poking and prodding, I have boiled things down to a couple simple cases
>> with surprising results.  In both cases, the order of operands changes the
>> result set despite using what I think is a commutative operator (||).  You
>> can test these queries using our current catalog, http://ulysses.calvin.edu/
>> , but I can also take some time to find similar issues using Concerto
>> records if it comes to that.
>> 
>> Case 1a:
>> item_form(s) && identifier|isbn:0830837035 || identifier|isbn:1844743829 (no
>> results)
>> vs.
>> identifier|isbn:1844743829 || identifier|isbn:0830837035 && item_form(s) (1
>> result)
>> 
>> In this case, I would have expected both to return 1 result.  I also see the
>> same behavior even if the given ISBN is identical (a contrived example):
>> 
>> Case 1b:
>> item_form(s) && identifier|isbn:0830837035 || identifier|isbn:0830837035 (no
>> results)
>> vs.
>> identifier|isbn:0830837035 || identifier|isbn:0830837035 && item_form(s) (1
>> result)
>> 
>> 
>> 
>> The next case is similar, but with slightly more nuance.  I have two of the
>> same title, one print, one electronic.  If I OR the ISBNs together, it
>> works:
>> 
>> Case 2a:
>> identifier|isbn:074944990X || identifier|isbn:0749452897 -- WORKS
>> 
>> identifier|isbn:0749452897 || identifier|isbn:074944990X – WORKS
>> 
>> However, if I add a third ISBN to the mix, I now get different results
>> depending on the order of operands:
>> 
>> Case2b:
>> identifier|isbn:074944990X || identifier|isbn:0749452897 ||
>> identifier|isbn:7313054289 -- DOESN'T WORK (print result)
>> 
>> identifier|isbn:074944990X || identifier|isbn:7313054289 ||
>> identifier|isbn:0749452897 -- DOESN'T WORK (print result)
>> 
>> identifier|isbn:7313054289 || identifier|isbn:074944990X ||
>> identifier|isbn:0749452897 -- DOESN'T WORK (no results)
>> 
>> identifier|isbn:7313054289 || identifier|isbn:0749452897 ||
>> identifier|isbn:074944990X -- DOESN'T WORK (no results)
>> 
>> identifier|isbn:0749452897 || identifier|isbn:074944990X ||
>> identifier|isbn:7313054289 -- DOESN'T WORK (e result)
>> 
>> identifier|isbn:0749452897 || identifier|isbn:7313054289 ||
>> identifier|isbn:074944990X -- DOESN'T WORK (e result)
>> 
>> I do seem to get the same behavior when using a more compact query notation
>> (which I believe should be identical in effect):
>> 
>> Case 2c:
>> identifier|isbn:(074944990X || 0749452897) -- WORKS
>> 
>> identifier|isbn:(074944990X || 0749452897 || 7313054289) -- DOESN'T WORK
>> (print result)
>> 
>> identifier|isbn:(0749452897 || 074944990X || 7313054289) -- DOESN'T WORK (e
>> result)
>> 
>> 
>> 
>> Based on when development in query parsing was most active, I imagine this
>> behavior has existed since 2.4.  Can anyone verify that?  Also, is there an
>> explanation for this behavior which I may be missing?  If not, can anyone
>> more familiar with this code at least narrow down what is causing these
>> issues?  I’m willing to dive in if necessary, but given the complexity of
>> this code, I may not soon have enough free time to effectively troubleshoot
>> this.
>> 
>> Finally, I am happy to move the conversation over to LP if that is a better
>> venue, but I was struggling with pinpointing exactly what this bug affects
>> (and therefore how to file it properly), so I thought I would first seek
>> input from the list.
>> 
>> Thanks,
>> Dan
>> 
>> 
>> Daniel Wells
>> Library Programmer/Analyst
>> Hekman Library, Calvin College
>> 616.526.7133
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Mike Rylander
>> | Director of Research and Development
>> | Equinox Software, Inc. / Your Library's Guide to Open Source
>> | phone:  1-877-OPEN-ILS (673-6457)
>> | email:  miker at esilibrary.com
>> | web:  http://www.esilibrary.com
>> 
>> 
> 
> 
> 
> -- 
> Mike Rylander
> | Director of Research and Development
> | Equinox Software, Inc. / Your Library's Guide to Open Source
> | phone:  1-877-OPEN-ILS (673-6457)
> | email:  miker at esilibrary.com
> | web:  http://www.esilibrary.com



More information about the Open-ils-dev mailing list