[OPEN-ILS-GENERAL] Synonym Dictionary - Numbers, &

Josh Stompro stomproj at exchange.larl.org
Thu May 25 11:05:13 EDT 2017


Hello, I've followed the steps on the following wiki page to enable a synonym dictionary, but I'm not getting the results I expect.

https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search#synonym_dictionary

Spelled-out numbers do get translated to digits (six -> 6), but digits don't get translated to words (6 -> six).

When I test the synonym dictionary with something like the following it looks like it works:
select ts_lexize('synonym_larl', '6');
ts_lexize
-----------
{six}
(1 row)

But when I look at the metabib.title_field_entry row for a record that has been reindexed, I see the following:
select * from metabib.title_field_entry where source=102449 limit 100;
   id    | source | field |                          value                           |                                                                                            index_vector
---------+--------+-------+----------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2402931 | 102449 |     6 | Little house on the prairie Season 6 [disc 2] test seven | '2':9A,13C,20C '6':7A,12C,18C '7':14C 'disc':8A,19C 'hous':13C 'house':2A 'littl':12C 'little':1A 'on':3A,14C 'prairi':16C 'prairie':5A 'season':6A,17C 'seven':11A,22C 'test':10A,21C 'the':4A,15C

'seven' gets added as both 'seven' and '7', but '2' and '6' are indexed only as digits; 'two' and 'six' are not added.

So I'm wondering if the search configuration needs to cover numeric tokens to make that work?

select * from ts_debug('synonym_larl', '6');
alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
uint  | Unsigned integer | 6     | {simple}     | simple     | {6}

\dF+ synonym_larl;
Text search configuration "public.synonym_larl"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
asciihword      | synonym_larl
asciiword       | synonym_larl
email           | simple
file            | simple
float           | simple
host            | simple
hword           | simple
hword_asciipart | synonym_larl
hword_numpart   | simple
hword_part      | simple
int             | simple
numhword        | simple
numword         | simple
sfloat          | simple
uint            | simple
url             | simple
url_path        | simple
version         | simple
word            | simple

Maybe the uint token type needs to be mapped to synonym_larl as well? I'm wondering whether that would have bad side effects.
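
For reference, a minimal and untested sketch of what that mapping change might look like. It assumes that both the text search configuration and the synonym dictionary are named synonym_larl, as the ts_debug and ts_lexize calls above suggest, and it keeps simple as a second dictionary so that integers the synonym file does not list are still handled. Records indexed before the change would presumably need to be reindexed.

```sql
-- Assumption: the configuration public.synonym_larl and a dictionary named
-- synonym_larl both exist (names taken from the queries earlier in this post).
-- Listing simple second means tokens the synonym dictionary does not
-- recognize are passed on to it.
ALTER TEXT SEARCH CONFIGURATION public.synonym_larl
    ALTER MAPPING FOR uint
    WITH synonym_larl, simple;

-- Re-check how a bare digit is handled after the change.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('public.synonym_larl', '6');
```

The \dF+ output above also shows int, numword, numhword and hword_numpart still mapped to simple, so signed integers and mixed letter/digit tokens would be unaffected by this sketch.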

Another mapping we would like to make is '&' -> 'and' and 'and' -> '&'. But it doesn't look like tsearch knows how to categorize '&' as a token.

select * from ts_debug('synonym_larl', '&');
alias |  description  | token | dictionaries | dictionary | lexemes
-------+---------------+-------+--------------+------------+---------
blank | Space symbols | &     | {}           |            |

It works fine going the other way, and the '&' ends up in the index.

select * from ts_debug('synonym_larl', 'and');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
asciiword | Word, all ASCII | and   | {synonym_larl} | synonym_larl | {&}
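
Since the default parser classifies '&' as blank, no dictionary ever sees it, so a synonym entry for '&' cannot take effect. One possible workaround, shown here only as an untested sketch, is to rewrite '&' in the text before it reaches the parser. The sample title is hypothetical, and where such a substitution would belong in the Evergreen indexing pipeline is an open question.

```sql
-- Hypothetical sample title; replace() turns '&' into ' and ' before parsing,
-- so the parser sees an ordinary asciiword instead of a blank symbol.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('public.synonym_larl',
              replace('Simon & Garfunkel', '&', ' and '));
```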

Thanks
Josh


Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro     | Office 218.233.3757 EXT-139
LARL IT Director | Cell 218.790.2110
