[OPEN-ILS-GENERAL] Special diacritics

James Fournie james.fournie at gmail.com
Tue May 25 15:43:02 EDT 2010


Hello there,

One of our SITKA sites has noticed that some characters, such as ø, do
not behave as expected in search.  The problem came up with the author
Peter Høeg: a search for "Hoeg" generally works, but it misses records
that contain the ø unless the name also appears somewhere else in the
MARC record spelled without it.  In the database, these items are in
fact indexed with the ø.

The reason for this is that ø cannot be broken into a base character
plus a combining character.  é, è and ê are all based on e with a
combining accent, so the accent can be stripped.  ø, by contrast, is
not an o with a mark: Unicode defines no decomposition of it into o
plus a diagonal slash, so it is its own character.  Other characters
have the same problem; for example, the German ß transliterates as
'ss', and æ is like 'ae'.  All of this is standard Unicode behaviour,
but not necessarily desirable from a user perspective.
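
To illustrate the difference, here is a minimal Python sketch (not
Evergreen's actual normalization code).  It strips accents by
decomposing text and dropping the combining marks, which handles é but
leaves ø alone.  The EXTRA_FOLDS table and the function names are
hypothetical, just one way the remaining characters might be mapped.

```python
import unicodedata

# Hypothetical fold table for characters that have no Unicode
# decomposition; illustrative only, not Evergreen's actual rules.
EXTRA_FOLDS = {"ø": "o", "Ø": "O", "ß": "ss", "æ": "ae", "Æ": "AE"}

def strip_accents(text):
    """Decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fold_for_search(text):
    """Strip accents, then apply the extra fold table."""
    stripped = strip_accents(text)
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)

print(strip_accents("é è ê"))    # e e e
print(strip_accents("Høeg"))     # Høeg (unchanged, nothing to strip)
print(fold_for_search("Høeg"))   # Hoeg
```

A table like this would presumably have to be applied to both the
indexed text and the search query for "Hoeg" and "Høeg" to match.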

I'm wondering if any other sites have any experience with this or any
ideas for dealing with this situation.  This is a problem that mostly
affects Germanic languages.

Thanks!

~James Fournie
BC SITKA

