[OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Linda Jansova
skolkova at chello.cz
Mon Aug 31 03:15:28 EDT 2015
Hi,
I have also tried SRU searching (using yaz-client) with some CQL
queries.
It seems to work fine (even for letters with diacritics) for dc.title,
dc.subject, dc.contributor and dc.publisher. However, dc.creator returns
no results for queries either with or without diacritics. When
dc.creator is replaced by eg.author, it works fine. Searching
without specifying an index (say "find matousek" or "find matoušek") also
works okay:
*find:*
$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> find matousek
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.708890
Z> find matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.675457
*find dc.creator:*
Z> find dc.creator=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.192852
Z> find dc.creator=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.238054
*find eg.author:*
Z> find eg.author=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.663780
Z> find eg.author=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.861588
*find dc.title:*
Z> find dc.title=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.953206
Z> find dc.title=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.509182
*find dc.subject:*
Z> find dc.subject=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 2.415498
Z> find dc.subject=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.595494
*find dc.contributor:*
Z> find dc.contributor=dvorak
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.621795
Z> find dc.contributor=dvořák
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.843226
*find dc.publisher:*
Z> find dc.publisher=portal
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.012238
Z> find dc.publisher=portál
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
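The identical hit counts for each plain/accented pair (for example `dvorak` and `dvořák`, 44 hits each) suggest that these indexes fold diacritics before matching. Below is a minimal Python sketch of that kind of folding, using Unicode NFD decomposition and dropping the combining marks. This is an assumption about the general technique, not Evergreen's actual normalizer, and `fold_diacritics` is a hypothetical helper name.

```python
import unicodedata


def fold_diacritics(term: str) -> str:
    """Lowercase a term and strip combining marks (e.g. 'š' -> 's').

    Illustrative only: assumes the server folds accents roughly like
    this; Evergreen's real normalization may differ.
    """
    # NFD splits 'š' into 's' + U+030C COMBINING CARON, 'ů' into
    # 'u' + U+030A COMBINING RING ABOVE, and so on.
    decomposed = unicodedata.normalize("NFD", term.lower())
    # Keep only the base characters, dropping the combining marks.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# The plain/accented query pairs from the transcripts above.
pairs = [
    ("matousek", "matoušek"),
    ("buh", "bůh"),
    ("dvorak", "dvořák"),
    ("portal", "portál"),
]

for plain, accented in pairs:
    # Both forms fold to the same key, so they would hit the same records.
    print(f"{accented!r} -> {fold_diacritics(accented)!r}",
          fold_diacritics(accented) == plain)
```

Under this assumption, dc.creator returning 0 hits for both `matousek` and `matoušek` suggests that the problem with that index is not (only) diacritic handling, since even the plain ASCII form finds nothing.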
I have also found that yaz-client has a charset command, which gives the
following output:
$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> charset
Negotiation character set `none'
Display character set is `UTF-8'
MARC character set is `none'
Query character set is `none'
Do you think this could be related to our Z39.50 client query encoding
issues? Should the charset be set explicitly to UTF-8 somewhere?
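Whatever the client-side charset settings are, in an SRU GET request the server ultimately receives the CQL query as percent-encoded bytes in the URL. The short Python sketch below shows the UTF-8 percent-encoding of `matoušek` and what the same bytes become if some component reads them as Latin-1. This illustrates one hypothetical failure mode, not a confirmed diagnosis of the problem above.

```python
from urllib.parse import quote, unquote

term = "matoušek"

# UTF-8 bytes of the term: 'š' becomes the two bytes C5 A1.
utf8_bytes = term.encode("utf-8")
print(utf8_bytes)  # b'matou\xc5\xa1ek'

# Percent-encoding as it would appear in an SRU GET query string;
# quote() encodes a str as UTF-8 by default.
encoded = quote(term)
print(encoded)  # matou%C5%A1ek

# Decoding the same bytes as Latin-1 yields mojibake ("matouÅ¡ek"),
# which would no longer match an index entry for "matoušek".
misread = unquote(encoded, encoding="latin-1")
print(misread)

# Decoding as UTF-8 restores the original term.
assert unquote(encoded, encoding="utf-8") == term
```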
I have also found a bug reported by Jason Stephenson about a year ago
(https://bugs.launchpad.net/evergreen/+bug/1346518) which seems to
describe the same problem, but it probably has not been looked into since
then.
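To make the dc.creator versus eg.author comparison easier to reproduce (for instance when adding details to the Launchpad bug), the index/term matrix from the transcripts could be scripted. Below is a minimal Python sketch, assuming the standard SRU 1.1 `searchRetrieve` GET parameters and the SRU 1.1 response namespace. The endpoint is the one used above; `build_url`, `hit_count` and the `maximumRecords=0` choice are illustrative, not anything Evergreen documents.

```python
import itertools
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint from the transcripts above.
SRU_BASE = "http://mojzis.jabok.cuni.cz/opac/extras/sru"
# Namespace used by SRU 1.1 searchRetrieve responses.
SRW_NS = "{http://www.loc.gov/zing/srw/}"


def build_url(index: str, term: str) -> str:
    """Build an SRU 1.1 searchRetrieve GET URL for the CQL query index=term."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f"{index}={term}",
        # Ask for the hit count only; a server is free to ignore this.
        "maximumRecords": "0",
    }
    # urlencode percent-encodes the CQL query as UTF-8.
    return f"{SRU_BASE}?{urlencode(params)}"


def hit_count(response_xml: bytes) -> int:
    """Extract numberOfRecords from an SRU 1.1 searchRetrieveResponse body."""
    root = ET.fromstring(response_xml)
    return int(root.findtext(f"{SRW_NS}numberOfRecords"))


indexes = ["dc.creator", "eg.author"]
terms = ["matousek", "matoušek"]

for index, term in itertools.product(indexes, terms):
    # Fetch each URL (e.g. with urllib.request.urlopen) and pass the
    # response body to hit_count(); network access is left out here.
    print(build_url(index, term))
```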
Thank you in advance for any clues!
Linda
On 08/25/2015 02:51 PM, Linda Jansova wrote:
> Hi again,
>
> We have installed 2.8.3 and tested the Z39.50 server output, yet the
> problem with diacritics remains the same. Again, when submitting a
> generic query with diacritics (using yaz-client), we get the expected
> results, but an author-specific search returns no
> hits.
>
> I have had a look at
> https://coffeecode.net/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html
> again and it seems to me that maybe the problem is caused by the fact
> that, with the exception of keyword, each of the other indexes
> mentioned in Dan's blog post (author, title, series, subject) is actually
> composed of more granular indexes.
>
> Maybe I am on the wrong track, but it would be
> rather a strange coincidence that the keyword index (which I suppose the
> Z39.50 client would use when no fields are specified) works fine,
> unlike those other "composed" indexes...
>
> Do you think that this could be the reason why we are experiencing the
> encoding problems? And if so, do you have any idea where to look for
> the appropriate encoding settings and change them to UTF-8?
>
> Thank you in advance for any clues to the puzzle!
>
> Linda
>
> On 08/19/2015 03:40 PM, Linda Jansova wrote:
>> Oh, I see! In that case we shall try the upgrade and see what happens
>> (we shall keep you posted :-)...
>>
>> Thank you for your help!
>>
>> Linda
>>
>> On 08/19/2015 03:34 PM, Jason Stephenson wrote:
>>> Quoting Linda Jansova <skolkova at chello.cz>:
>>>
>>>> Thank you, Jason!
>>>>
>>>> I have actually come across this bug as well but it seems that it
>>>> has already been fixed (or at least this is my understanding of
>>>> information from Launchpad) - we are currently using Evergreen
>>>> 2.8.2...
>>>
>>> The fix only went in last night. It will be in today's release of
>>> 2.8.3, so it might be worth the upgrade to see if it helps.
>>>
>>>>
>>>> And you hit the nail on the head - we also usually search other
>>>> sources and so it took quite some time to discover the problem...
>>>>
>>>> Linda
More information about the Open-ils-general
mailing list