[OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Boyer, Jason A JBoyer at library.IN.gov
Wed Mar 1 11:48:13 EST 2017


That sounds like it should work; there are already other examples where essentially the same field/xpath is defined separately for each purpose. Applying the normalizer only to the facet field should keep the OPAC looking good while still keeping the $v in the search indexes.

--
Jason Boyer
MIS Supervisor
Indiana State Library
http://library.in.gov/

From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 11:33 AM
To: Evergreen Discussion Group <open-ils-general at list.georgialibraries.org>
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Jason, I would really like to leave the series index info in the search index. It would be nice if staff/customers could do a series search like "Harry Potter 1" to get all the titles for the first Harry Potter book.

It seems like the issue is that one config.metabib_field entry for Series Title is set as both a search_field and a facet_field. If I turn off the facet_field flag for that entry, create a new entry for a series title facet, and apply the normalizer only to that new field, would that do it? The facet entries would get cleaned up, but the index entries would be left alone.
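A rough Python model of this split, for illustration only. It assumes the search entry is built by joining all extracted values with a single space and that the facet normalizer runs once per value. The 490 values and function names are hypothetical, and Python's re module only stands in for PostgreSQL's regexp_replace.

```python
import re

# Hypothetical 490 values for one record (not from the LARL catalog).
SERIES_490S = ["Harry Potter ; 1", "Example series ; 4"]

# The regexp_replace params from this thread, now on the facet-only field.
FACET_PATTERN = r" *;.*"


def search_entry(values):
    """Search-only field: no regexp_replace normalizer, so numbering survives.

    Assumes values are joined with a single space before indexing.
    """
    return " ".join(values)


def facet_entries(values):
    """Facet-only field: the normalizer is applied to each value separately."""
    # count=1 mirrors regexp_replace without the 'g' flag;
    # re.DOTALL mirrors PostgreSQL's default of letting "." match newlines.
    return [re.sub(FACET_PATTERN, "", v, count=1, flags=re.DOTALL) for v in values]


print(search_entry(SERIES_490S))   # Harry Potter ; 1 Example series ; 4
print(facet_entries(SERIES_490S))  # ['Harry Potter', 'Example series']
```

Under these assumptions the search entry still carries the "1", so a search like "Harry Potter 1" has something to match, while the facet shows only the bare series titles.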

Josh Stompro - LARL IT Director

From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Boyer, Jason A
Sent: Wednesday, March 01, 2017 10:22 AM
To: Evergreen Discussion Group
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Thanks for figuring this out, Josh. I was able to modify our normalizer like so to continue removing the $v:
BEGIN;
-- Let the regexp_replace normalizer(s) take a third parameter (the regexp flags).
UPDATE config.index_normalizer SET param_count = 3 WHERE func = 'regexp_replace';
-- On field 1 (Series Title), strip "; <number>"; 'g' replaces every match, not just the first.
UPDATE config.metabib_field_index_norm_map SET params = '["; *[0-9]*","","g"]'
 WHERE field = 1 AND norm IN (SELECT id FROM config.index_normalizer WHERE func = 'regexp_replace');
COMMIT;

If you have more than one normalizer that uses regexp_replace, or are using it on more than one field, you won't want to use this as-is; but if you only have the one and are only using it on your series titles, it's good to go.
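To see why the third "g" parameter matters, here is a small Python stand-in for PostgreSQL's regexp_replace(source, pattern, replacement [, flags]) that models only the 'g' flag. It assumes Evergreen passes the params array positionally after the field text (which is what raising param_count to 3 implies) and that the series values reach the normalizer already concatenated. The helper name and sample text are hypothetical.

```python
import json
import re


def pg_regexp_replace(source, pattern, replacement, flags=""):
    """Rough stand-in for PostgreSQL regexp_replace().

    Only the 'g' flag is modelled: without it only the first match is
    replaced, with it every match is. Other flags and PostgreSQL-specific
    regex and replacement syntax are not translated.
    """
    count = 0 if "g" in flags else 1  # re.sub treats count=0 as "replace all"
    return re.sub(pattern, replacement, source, count=count)


# Hypothetical concatenated series text for a record with two 490s.
JOINED = "Harry Potter ; 1 Example series ; 4"

# Two parameters: only the first "; <number>" is removed.
print(repr(pg_regexp_replace(JOINED, "; *[0-9]*", "")))
# 'Harry Potter  Example series ; 4'

# Jason's three parameters, decoded from the params JSON.
params = json.loads('["; *[0-9]*","","g"]')
print(repr(pg_regexp_replace(JOINED, *params)))
# 'Harry Potter  Example series '
```

This pattern keeps the second series title but also strips the numbering from the search entry, so it fits the goal of removing the $v rather than keeping it searchable.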

Jason

--
Jason Boyer
MIS Supervisor
Indiana State Library
http://library.in.gov/

From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 10:41 AM
To: Evergreen Discussion Group <open-ils-general at list.georgialibraries.org>
Subject: Re: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Removing the regex replace normalizer did take care of it; sorry I didn't try that before posting. I think my regex will have to be more selective, only removing the number and the ';' so it doesn't clear out too much data.

Josh Stompro - LARL IT Director

From: Open-ils-general [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf Of Josh Stompro
Sent: Wednesday, March 01, 2017 9:19 AM
To: open-ils-general at list.georgialibraries.org
Subject: [OPEN-ILS-GENERAL] Series index, only first entry getting indexed

Hello, we have noticed that only the first 490 gets indexed in our series search index, but all 490s get added to the series facet entries.

For example, here is a title with two 490s in mods32 format:
https://egcatalog.larl.org/opac/extras/unapi?id=tag::U2@bre/237592&format=mods32

The second 490, "Felicity classic", isn't searchable.

When I look at metabib.combined_series_field_entry, I see the following for this record:

record | metabib_field | index_vector
237592 |               | 'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'
237592 | 1             | 'american' 'beforev' 'beforever' 'felic' 'felicity' 'girl'

metabib.series_field_entry:

id     | source | field | value                            | index_vector
430451 | 237592 | 1     | American Girl Beforever Felicity | 'american':1A,5C 'beforev':7C 'beforever':3A 'felic':8C 'felicity':4A 'girl':2A,6C


metabib.facet_entry:

value                            | count | bibid
American Girl Beforever Felicity | 1     | 237592
Felicity classic                 | 1     | 237592

The one thing I have done is add a normalizer to remove the series numbering from the facet entry. Unfortunately, I don't remember whether this issue came up before I added the normalizer. Maybe, when used on the index version, the regex replace is actually acting on all the 490 info concatenated together, so by removing everything after the first ' ;' I'm clearing out the second 490's data? But it does work correctly on the facet data?
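The concatenation hypothesis can be checked in a few lines of Python. This sketch assumes the search entry is normalized after all 490 values are joined into one string, while the facet normalizer sees each value on its own. The sample values are hypothetical, and re.sub with count=1 and re.DOTALL only approximates PostgreSQL's regexp_replace without the 'g' flag.

```python
import re

GREEDY = r" *;.*"  # the pos -1 regexp_replace params from this thread

# Hypothetical stand-ins for a record's two 490 fields.
VALUES = ["First series ; 1", "Second series ; 2"]


def normalize_once(text):
    # regexp_replace without 'g' replaces only the first match, and in
    # PostgreSQL "." also matches a newline, hence count=1 and re.DOTALL.
    return re.sub(GREEDY, "", text, count=1, flags=re.DOTALL)


# Facet path: each extracted value is normalized separately.
facets = [normalize_once(v) for v in VALUES]

# Search path (hypothesis): the values are joined first, then normalized.
search = normalize_once(" ".join(VALUES))

print(facets)  # ['First series', 'Second series']
print(search)  # First series
```

Because ".*" runs to the end of the joined string, everything after the first " ;", including the whole second 490, would disappear from the search text, which is consistent with what metabib.series_field_entry shows.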

There is a note at https://wiki.evergreen-ils.org/doku.php?id=documentation:indexing#field_normalization_settings:
"Note: Only normalizations with a negative pos value are applied to the facet version of indexed terms!" But apparently that doesn't mean a normalizer with a negative pos value acts only on the facet?

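One way to read the wiki note together with what happened here, sketched in Python using the field 1 (Series Title) rows from our configuration in this message: facets get only the negative-pos normalizers, while the search index appears to get all of them, including the pos -1 regexp_replace. The facet rule comes from the wiki note; the search rule is only inferred from this thread's results, not from the Evergreen source, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class NormMapRow(NamedTuple):
    """One config.metabib_field_index_norm_map row, joined to its normalizer."""
    map_id: int
    norm: int
    func: str
    pos: int


# Field 1 (Series Title) rows from the configuration in this message.
FIELD_1_ROWS = [
    NormMapRow(1, 2, "split_date_range", 0),
    NormMapRow(62, 13, "replace", -1),
    NormMapRow(61, 13, "replace", -1),
    NormMapRow(2, 17, "search_normalize", 0),
    NormMapRow(64, 18, "regexp_replace", -1),
]


def by_pos(rows):
    # Ties keep list order here; the real order within one pos value may differ.
    return sorted(rows, key=lambda r: r.pos)


def facet_normalizers(rows):
    """Wiki note: only normalizers with a negative pos touch facet values."""
    return [r.func for r in by_pos(rows) if r.pos < 0]


def search_normalizers(rows):
    """Inferred from this thread: every mapped normalizer hits the search entry."""
    return [r.func for r in by_pos(rows)]


print(facet_normalizers(FIELD_1_ROWS))
# ['replace', 'replace', 'regexp_replace']
print(search_normalizers(FIELD_1_ROWS))
# ['replace', 'replace', 'regexp_replace', 'split_date_range', 'search_normalize']
```

Under this reading, a negative pos adds a normalizer to the facet path without removing it from the search path, which would explain why the regexp_replace also cut the second 490 out of the search index.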
Here is our normalizer setup and our series metabib field info.

config.metabib_field_index_norm_map (series fields):

id | field | norm | params       | pos
51 | 32    | 2    |              | 0
1  | 1     | 2    |              | 0
62 | 1     | 13   | ["[",""]     | -1
61 | 1     | 13   | ["]",""]     | -1
52 | 32    | 17   |              | 0
2  | 1     | 17   |              | 0
64 | 1     | 18   | [" *;.*",""] | -1

config.metabib_field:

id 1: field_class series, name seriestitle, label "Series Title"
  xpath: //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[not(@type="nfi")]
  weight 1, format mods32, search_field true, facet_field true, browse_field false
  authority_xpath: //@xlink:href, restrict false

id 32: field_class series, name browse, label "Series Title (Browse)"
  xpath: //mods32:mods/mods32:relatedItem[@type="series"]/mods32:titleInfo[@type="nfi"]
  weight 1, format mods32, search_field false, facet_field false, browse_field true
  browse_xpath: *[local-name() != "nonSort"]
  authority_xpath: //@xlink:href, restrict false

(browse_sort_xpath, facet_xpath and joiner are empty for both fields; browse_xpath is empty for field 1.)

config.index_normalizer:

id 2: Normalize date range (func split_date_range, param_count 0)
  Split date ranges in the form of "XXXX-YYYY" into "XXXX YYYY" for proper index.
id 13: Replace (func replace, param_count 2)
  Replace all occurences of first parameter in the string with the second parameter.
id 17: Search Normalize (func search_normalize, param_count 0)
  Apply search normalization rules to the extracted text. A less extreme version of NACO normalization.
id 18: Replace by regular expression (func regexp_replace, param_count 2)


Thanks for any ideas you might have.
Josh

Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro     | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
