[OPEN-ILS-GENERAL] Synonym Dictionary - Numbers, &

Mike Rylander mrylander at gmail.com
Thu May 25 12:18:53 EDT 2017


Josh,

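To cover numbers, it looks like you just need to add dictionaries (I
probably wouldn't use just one for everything) to the mappings for uint
and the other numeric token types.  Note, you can stack dictionaries:
each token is tried against them in order until one recognizes it.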
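
For what it's worth, stacking could look something like the sketch
below.  The dictionary name synonym_numbers and its number_synonyms.syn
file are made up for illustration; the configuration is assumed to be
named synonym_larl, as in your \dF+ output.

```sql
-- Hypothetical second synonym dictionary just for numbers; it expects a
-- number_synonyms.syn file under $SHAREDIR/tsearch_data/.
CREATE TEXT SEARCH DICTIONARY synonym_numbers (
    TEMPLATE = synonym,
    SYNONYMS = number_synonyms
);

-- Stack it ahead of simple for the unsigned-integer token type.
-- Tokens the synonym dictionary does not recognize fall through to simple.
ALTER TEXT SEARCH CONFIGURATION synonym_larl
    ALTER MAPPING FOR uint
    WITH synonym_numbers, simple;

-- Check which dictionary now handles a bare digit.
SELECT * FROM ts_debug('synonym_larl', '6');
```

Other numeric token types (int, float, and so on) could be added to the
FOR list the same way, if you want them covered.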

As for & (along with |, !, and maybe parens), those are special
characters used by tsearch itself as query operators, so it may be best
to simply map them in search_normalize() to some well-known token that's
very unlikely to be used in the real world.  Perhaps some Unicode
codepoint, like ☃ and friends.
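
A rough sketch of the idea in plain SQL, not the actual
search_normalize() code: swap the character for a placeholder before the
text reaches the parser, then confirm how the parser classifies the
placeholder.  The sample string and the choice of ☃ are only
illustrative, and the ts_debug output is worth checking, since the
parser may treat a symbol codepoint as blank just like it does &.

```sql
-- Replace the operator character with a placeholder token.
SELECT replace('Simon & Garfunkel', '&', ' ☃ ');

-- See which token type (if any) the default parser assigns to it.
SELECT * FROM ts_debug('synonym_larl', '☃');
```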

HTH,
--
Mike Rylander
 | President
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at equinoxinitiative.org
 | web:  http://equinoxinitiative.org


On Thu, May 25, 2017 at 11:05 AM, Josh Stompro
<stomproj at exchange.larl.org> wrote:
> Hello, I’ve followed the steps in the following wiki pages to enable a
> synonym dictionary but I’m not getting the results I expect.
>
>
>
> https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search#synonym_dictionary
>
>
>
> Spelled-out numbers do get translated to digits (six -> 6) but digits don’t
> get translated to words (6 -> six).
>
>
>
> When I test the synonym dictionary with something like the following it
> looks like it works:
>
> select ts_lexize('synonym_larl', '6');
>  ts_lexize
> -----------
>  {six}
> (1 row)
>
>
>
> But when I look at the metabib.title_field_entry row for a record that has
> been reindexed, I see the following.
>
> select * from metabib.title_field_entry where source=102449 limit 100;
>
>    id    | source | field | value | index_vector
> ---------+--------+-------+-------+--------------
>  2402931 | 102449 |     6 | Little house on the prairie Season 6 [disc 2] test seven | '2':9A,13C,20C '6':7A,12C,18C '7':14C 'disc':8A,19C 'hous':13C 'house':2A 'littl':12C 'little':1A 'on':3A,14C 'prairi':16C 'prairie':5A 'season':6A,17C 'seven':11A,22C 'test':10A,21C 'the':4A,15C
>
>
>
> Seven gets added as ‘seven’ and ‘7’, but the ‘2’ and ‘6’ do not get added
> as ‘two’ and ‘six’.
>
>
>
> So I’m wondering if the search configuration needs to cover numeric tokens
> to make that work?
>
>
>
> select * from ts_debug('synonym_larl', '6');
>  alias |   description    | token | dictionaries | dictionary | lexemes
> -------+------------------+-------+--------------+------------+---------
>  uint  | Unsigned integer | 6     | {simple}     | simple     | {6}
>
>
>
> \dF+ synonym_larl;
> Text search configuration "public.synonym_larl"
> Parser: "pg_catalog.default"
>       Token      | Dictionaries
> -----------------+--------------
>  asciihword      | synonym_larl
>  asciiword       | synonym_larl
>  email           | simple
>  file            | simple
>  float           | simple
>  host            | simple
>  hword           | simple
>  hword_asciipart | synonym_larl
>  hword_numpart   | simple
>  hword_part      | simple
>  int             | simple
>  numhword        | simple
>  numword         | simple
>  sfloat          | simple
>  uint            | simple
>  url             | simple
>  url_path        | simple
>  version         | simple
>  word            | simple
>
>
>
> Maybe the uint token needs to be set to synonym_larl also? But I’m wondering
> if this has bad side effects?
>
>
>
> Also, another mapping we would like to make is ‘&’ -> ‘and’ , ‘and’ -> ‘&’.
> But it doesn’t look like tsearch knows how to categorize ‘&’ as a token.
>
>
>
> select * from ts_debug('synonym_larl', '&');
>  alias |  description  | token | dictionaries | dictionary | lexemes
> -------+---------------+-------+--------------+------------+---------
>  blank | Space symbols | &     | {}           |            |
>
>
>
> Works fine going the other way and the ‘&’ ends up in the index.
>
>
>
> select * from ts_debug('synonym_larl', 'and');
>
>    alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
> -----------+-----------------+-------+----------------+--------------+---------
>  asciiword | Word, all ASCII | and   | {synonym_larl} | synonym_larl | {&}
>
>
>
> Thanks
>
> Josh
>
>
>
>
>
> Lake Agassiz Regional Library - Moorhead MN larl.org
>
> Josh Stompro     | Office 218.233.3757 EXT-139
>
> LARL IT Director | Cell 218.790.2110
>
>

