[OPEN-ILS-GENERAL] Series index, only first entry getting indexed
Boyer, Jason A
JBoyer at library.IN.gov
Wed Mar 1 11:21:33 EST 2017
Thanks for figuring this out, Josh. I was able to modify our normalizer like so to continue removing the $v:
BEGIN;
-- Let regexp_replace accept a third parameter (the regex flags).
UPDATE config.index_normalizer SET param_count = 3 WHERE func = 'regexp_replace';
-- On field 1 (series title), remove only the ";" and the volume number, at every occurrence ("g").
UPDATE config.metabib_field_index_norm_map SET params = '["; *[0-9]*","","g"]' WHERE field = 1 AND norm IN (SELECT id FROM config.index_normalizer WHERE func = 'regexp_replace');
COMMIT;
If you have more than one normalizer that uses regexp_replace, or you use it on more than one field, you won't want to run this as-is. If you have only the one and currently use it only on your series titles, it's good to go.
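For anyone who wants to see what the new pattern does before touching the database, here is a minimal sketch that uses Python's re module as a stand-in for PostgreSQL's regexp_replace. The sample string and its volume numbers are hypothetical, and so is the assumption that the two 490s reach the normalizer joined by a single space.

```python
import re

# First element of the new params array: '["; *[0-9]*","","g"]'.
NEW_PATTERN = r"; *[0-9]*"

# Hypothetical input: two series statements joined by a space.
# The volume numbers are invented for illustration.
joined = "American Girl Beforever Felicity ; 1 Felicity classic ; 2"

# re.sub replaces every match by default, which mirrors the "g" flag.
cleaned = re.sub(NEW_PATTERN, "", joined)

# Split on whitespace to see which words are left for indexing.
words = cleaned.split()
print(words)
```

Under those assumptions both series titles survive, and only the ';' and the digits are removed. A designation such as '; v. 3' would lose only the ';' and the spaces after it, because the pattern matches digits only when they directly follow.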
Jason
--
Jason Boyer
MIS Supervisor
Indiana State Library
http://library.in.gov/
From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 10:41 AM
To: Evergreen Discussion Group <open-ils-general at list.georgialibraries.org>
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed
Removing the regex replace normalizer did take care of it; sorry I didn't try that before posting. I think my regex will have to be more selective, removing only the number and the ';' so that it doesn't clear out too much data.
Josh Stompro - LARL IT Director
From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 9:19 AM
To: open-ils-general at list.georgialibraries.org
Subject: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed
Hello, we have noticed that only the first 490 gets indexed for our series search index, but all 490s get added to the series facet entry.
For example, here is a title with two 490s, in mods32 format.
https://egcatalog.larl.org/opac/extras/unapi?id=tag::U2@bre/237592&format=mods32
The second 490 of "Felicity classic" isn't searchable.
When I look at the metabib.combined_series_field_entry I see the following for this record.
 record | metabib_field | index_vector
--------+---------------+------------------------------------------------------------
 237592 |               | 'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'
 237592 |             1 | 'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'
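metabib.series_field_entry
 id     | source | field | value                            | index_vector
--------+--------+-------+----------------------------------+--------------------------------------------------------------------------------------
 430451 | 237592 |     1 | American Girl Beforever Felicity | 'american':1A,5C 'beforev':7C 'beforever':3A 'felic':8C 'felicity':4A 'girl':2A,6C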
metabib.facet_entry
 value                            | count | bibid
----------------------------------+-------+--------
 American Girl Beforever Felicity |     1 | 237592
 Felicity classic                 |     1 | 237592
The one change I have made is adding a regex replace normalizer to strip the series numbering from the facet entry. Unfortunately, I don't remember whether this issue existed before I added the normalizer. Maybe, when it is applied to the search index version, the regex replace acts on all of the 490 data concatenated together, so by removing everything after the first ' ;' I'm also clearing the second 490's data, even though it works correctly on the facet data?
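That guess can be sketched outside the database. This is only an illustration, with Python's re module standing in for PostgreSQL's regexp_replace. The volume numbers are invented, and both the space joiner and the join-before-normalize order for the search index are assumptions, not confirmed behavior.

```python
import re

# Pattern from the regexp_replace normalizer params: [" *;.*",""].
ORIGINAL_PATTERN = r" *;.*"

# Titles are taken from the record above; the volume numbers are hypothetical.
series_490s = [
    "American Girl Beforever Felicity ; 1",
    "Felicity classic ; 2",
]

# Facet path (assumed): each 490 is normalized on its own.
# count=1 mirrors regexp_replace without the "g" flag (first match only).
facet_values = [re.sub(ORIGINAL_PATTERN, "", s, count=1) for s in series_490s]

# Search path (assumed): the 490s are joined first, then normalized once.
joined = " ".join(series_490s)
search_value = re.sub(ORIGINAL_PATTERN, "", joined, count=1)

print(facet_values)
print(search_value)
```

If the search index really is built from the joined string, the greedy '.*' swallows everything after the first ' ;', including the whole second 490. That would match what the tables above show: two facet entries, but no 'classic' in the index_vector.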
There is a note on https://wiki.evergreen-ils.org/doku.php?id=documentation:indexing#field_normalization_settings
"Note: Only normalizations with a negative pos value are applied to the facet version of indexed terms!" But that must not mean that a normalizer with a negative pos value acts only on the facet version?
This is going to be wide, but here is our normalizer setup and our series metabib field info.
config.metabib_field_index_norm_map
 id | field | norm | params       | pos
----+-------+------+--------------+-----
 51 |    32 |    2 |              |   0
  1 |     1 |    2 |              |   0
 62 |     1 |   13 | ["[",""]     |  -1
 61 |     1 |   13 | ["]",""]     |  -1
 52 |    32 |   17 |              |   0
  2 |     1 |   17 |              |   0
 64 |     1 |   18 | [" *;.*",""] |  -1

config.metabib_field, id 1
 field_class       | series
 name              | seriestitle
 label             | Series Title
 xpath             | //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")]
 weight            | 1
 format            | mods32
 search_field      | true
 facet_field       | true
 browse_field      | false
 browse_xpath      |
 browse_sort_xpath |
 facet_xpath       |
 authority_xpath   | //@xlink:href
 joiner            |
 restrict          | false

config.metabib_field, id 32
 field_class       | series
 name              | browse
 label             | Series Title (Browse)
 xpath             | //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[@type="nfi"]
 weight            | 1
 format            | mods32
 search_field      | false
 facet_field       | false
 browse_field      | true
 browse_xpath      | *[local-name() != "nonSort"]
 browse_sort_xpath |
 facet_xpath       |
 authority_xpath   | //@xlink:href
 joiner            |
 restrict          | false

config.index_normalizer
 id | name                          | func             | param_count | description
----+-------------------------------+------------------+-------------+-------------
  2 | Normalize date range          | split_date_range |           0 | Split date ranges in the form of "XXXX-YYYY" into "XXXX YYYY" for proper index.
 13 | Replace                       | replace          |           2 | Replace all occurences of first parameter in the string with the second parameter.
 17 | Search Normalize              | search_normalize |           0 | Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.
 18 | Replace by regular expression | regexp_replace   |           2 |
Thanks for any ideas you might have.
Josh
Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110