[OPEN-ILS-DEV] PATCH & RFC: Providing i18n support in OpenILS database schema (diacritics)

Wilkening, Chris Chris.Wilkening at brodart.com
Mon Jun 4 15:00:35 EDT 2007


I suspect that funny Y is creating your problem. That's the kind of thing that trips us up all the time - might want to force UTF-16 and see if that helps.

 

Christopher Wilkening

Web/Application Developer

Brodart Co.

570-326-2461 x6496

________________________________

From: open-ils-dev-bounces at list.georgialibraries.org [mailto:open-ils-dev-bounces at list.georgialibraries.org] On Behalf Of Don Hamilton
Sent: Monday, June 04, 2007 2:54 PM
To: open-ils-dev at list.georgialibraries.org
Subject: RE: [OPEN-ILS-DEV] PATCH & RFC: Providing i18n support in OpenILS database schema (diacritics)

 

Apropos the translation problems...

 

I'm running marc2bre on a gagillion real records today. During the first 20,000 or so I get "8no mapping found at position 38 in GROUNDWATER STUDIES IN THE ASSINIBOINEÝRIVER DRAINAGE BASIN - PART 1: THE EVALUATION OF A FLOW SYSTEM IN SOUTH-CENTRAL SASKATCHEWAN. g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 134." twenty or so times.
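
One way to confirm which character the error points at, independent of MARC::Charset, is to scan the title for anything outside ASCII. The sketch below is Python rather than the Perl that marc2bre uses, and find_non_ascii is a hypothetical helper, not part of any MARC library. It assumes the title has already been read into a string, with the stray byte showing up as Ý (U+00DD).

```python
def find_non_ascii(text):
    """Return (index, character, code point) for every non-ASCII character.

    Hypothetical diagnostic helper; it does not perform MARC-8 conversion.
    """
    return [(i, ch, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) > 0x7F]


# Title fragment from the error message, with the odd character written
# as an escape so the example does not depend on the file's encoding.
title = "GROUNDWATER STUDIES IN THE ASSINIBOINE\u00ddRIVER DRAINAGE BASIN"

print(find_non_ascii(title))
```

With zero-based indexing the only hit is at index 38, which matches the "position 38" in the message, so the character between ASSINIBOINE and RIVER is the one without a mapping.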

 

 

I presume, given the umpteen conversions that our MARC records have gone through (3 original homegrown systems to Geac to more homegrown systems to Voyager), that there is crap in my records.

 

Or, given that the message says ASCII_DEFAULT and EXTENDED_LATIN, have I missed setting UTF-8 (or 16) somewhere?

 

Do I care?

 

don

>>> Chris.Wilkening at brodart.com 6/4/2007 2:45 PM >>>

Does the UTF-8 encoding support diacritics? We've run into problems with
that and generally go with UTF-16 which has, so far, allowed us to
maintain diacritics in database records. We generally run into that
problem with Spanish records but, if my high-school-French-classes
memory isn't faulty, French (as well as most of the Romance languages)
has diacritics to one degree or another.
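
As a quick check of the question above: UTF-8 and UTF-16 can both represent every Unicode character, so accented letters survive a round trip through either one. A minimal Python sketch, assuming nothing about the database involved (round_trips is a hypothetical helper name):

```python
def round_trips(text, encoding):
    """Return True if text survives encode + decode in the given encoding."""
    return text.encode(encoding).decode(encoding) == text


samples = ["espa\u00f1ol", "fran\u00e7ais", "\u00e9t\u00e9", "\u00dd"]

for text in samples:
    # Both encodings preserve the diacritics; only the byte counts differ.
    print(text,
          round_trips(text, "utf-8"),
          round_trips(text, "utf-16-le"),
          len(text.encode("utf-8")),
          len(text.encode("utf-16-le")))

# Diacritics are typically lost when bytes are decoded with the wrong
# encoding, for example UTF-8 bytes read as Latin-1:
mangled = "\u00e9".encode("utf-8").decode("latin-1")
print(repr(mangled))
```

The last line shows the familiar two-character garbage that appears when UTF-8 data is read as Latin-1, which is a more likely cause of lost accents than the choice between UTF-8 and UTF-16.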

Christopher Wilkening

Web/Application Developer

Brodart Co.

570-326-2461 x6496


-----Original Message-----
From: open-ils-dev-bounces at list.georgialibraries.org
[mailto:open-ils-dev-bounces at list.georgialibraries.org] On Behalf Of Dan Scott
Sent: Monday, June 04, 2007 2:38 PM
To: open-ils-dev at list.georgialibraries.org
Subject: [OPEN-ILS-DEV] PATCH & RFC: Providing i18n support in OpenILS database schema

Hello:

This is just a small patch demonstrating the direction I'm thinking
the OpenILS database schema needs to take to provide I18N support for
various tables that currently contain hard-coded English text. I would
like to get some feedback on this direction before completing the
patch, as it will end up being a rather largish set of changes.

Some anticipated questions (with answers):
Q: Why do we need to change the database schema? Can't sites that need
to support a different language simply change the English text in
OpenILS/src/sql/Pg/*.sql to whatever language they need?

A: That approach works fine for a site that only needs to support one
language - but for bilingual (or multilingual) sites, where you have
to support users who prefer different languages, that approach won't
fly. For example, our university is a bilingual French & English
university - so we need to be able to store both the French and
English versions of a given piece of text in the database.

Q: What format should the locale names use?

A: Quick answer: ll-LL (two-char lowercase language code, hyphen,
two-char uppercase region code)

There are a number of possible formats: en_US, en-us, en_US.UTF-8,
etc. However, given that the Open-ILS database is created using the
UTF-8 encoding, I think we can assume that any language will use the
UTF-8 encoding and therefore avoid having to include the encoding as
part of the locale name. Second, the ISO639-1 standard for
two-character language codes and ISO3166-1 standard for two-character
region codes are pretty well established for use in defining browser
locales and in, er, Java (see the java.util.Locale class for some
details).
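
To make the ll-LL shape concrete, here is a minimal Python sketch that checks a locale name against it and normalizes the other formats mentioned above. The names LOCALE_RE, is_ll_LL and normalize_locale are hypothetical and not part of the patch. The check covers only the shape of the name; it does not verify that the codes are actually registered in ISO 639-1 or ISO 3166-1.

```python
import re

# Two lowercase letters, a hyphen, two uppercase letters (shape check only).
LOCALE_RE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_ll_LL(name):
    """Return True if name has the ll-LL shape, e.g. 'en-US'."""
    return LOCALE_RE.fullmatch(name) is not None


def normalize_locale(name):
    """Convert forms such as en_US, en-us or en_US.UTF-8 to ll-LL.

    Raises ValueError if the name cannot be brought into that shape.
    """
    base = name.split(".", 1)[0]          # drop an encoding suffix like .UTF-8
    parts = re.split(r"[-_]", base)       # accept either separator
    if len(parts) != 2:
        raise ValueError("expected language and region: %r" % name)
    candidate = parts[0].lower() + "-" + parts[1].upper()
    if not is_ll_LL(candidate):
        raise ValueError("not an ll-LL locale name: %r" % name)
    return candidate


for raw in ["en_US", "en-us", "en_US.UTF-8", "fr-CA"]:
    print(raw, "->", normalize_locale(raw))
```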

A free table listing ISO 639-1 and 639-2 codes is here:
http://www.loc.gov/standards/iso639-2/php/code_list.php

The ISO 3166 region codes are listed here:
http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html

Given that the canonical format of locales for Web applications
follows the 'll-LL' format
(http://www.w3.org/TR/i18n-html-tech-lang/), and given that much of
Evergreen is exposed through Web services or technologies, I suggest
that the w3c guidelines for locale names be adopted for use in the
database schema and have therefore specified 'en-US' as the locale for
the default strings in the current patch.

Q: Okay, so you're replacing the current table with a set of tables
and a view to maintain compatibility with the current OpenSRF /
Open-ILS services until they can be taught to be locale-aware. What's
the plan to make these locale-aware?

A: <hand-waving about teaching the settings server to attach client
locale to requests with a default that can hopefully be set
system-wide>

Q: Okay, so you've converted one table over. Big deal. How many more
tables are there to convert?

A: Not many, actually - but there are a few that have more significant
content. At this point there might be some breakage to the existing
config CGIs (I imagine that they would be trying to insert into the
view rather than the new underlying tables) but I wouldn't expect the
rest of the existing services to break _if_ the tables are replaced
with views that default to the en-US locale.
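
The general shape of that approach (a base table, a per-locale translation table, and a view that exposes the en-US strings under the old table name) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database instead of PostgreSQL, and the names demo_status_base, demo_status_i18n, demo_status and status_name, along with the sample strings, are hypothetical and not taken from the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; the real patch targets PostgreSQL and other names.
CREATE TABLE demo_status_base (id INTEGER PRIMARY KEY);
CREATE TABLE demo_status_i18n (
    id     INTEGER NOT NULL REFERENCES demo_status_base(id),
    locale TEXT    NOT NULL,
    name   TEXT    NOT NULL,
    PRIMARY KEY (id, locale)
);
-- The view keeps the old table's shape and defaults to en-US.
CREATE VIEW demo_status AS
    SELECT b.id, t.name
    FROM demo_status_base b
    JOIN demo_status_i18n t ON t.id = b.id
    WHERE t.locale = 'en-US';
""")
conn.execute("INSERT INTO demo_status_base (id) VALUES (1)")
conn.executemany(
    "INSERT INTO demo_status_i18n (id, locale, name) VALUES (?, ?, ?)",
    [(1, "en-US", "Available"), (1, "fr-CA", "Disponible")],
)


def status_name(conn, status_id, locale, default="en-US"):
    """Return the name in the requested locale, else in the default locale."""
    row = conn.execute(
        "SELECT name FROM demo_status_i18n "
        "WHERE id = ? AND locale IN (?, ?) "
        "ORDER BY locale = ? DESC LIMIT 1",
        (status_id, locale, default, locale),
    ).fetchone()
    return row[0] if row else None


# Locale-unaware code keeps reading the view and sees the en-US strings.
print(conn.execute("SELECT id, name FROM demo_status").fetchall())
# Locale-aware code asks for a locale and falls back to the default.
print(status_name(conn, 1, "fr-CA"), status_name(conn, 1, "de-DE"))

# Writing through the plain view fails here, which mirrors the expected
# breakage for anything that still inserts into the old table name.
try:
    conn.execute("INSERT INTO demo_status (id, name) VALUES (2, 'Lost')")
except sqlite3.Error as exc:
    print("insert into view failed:", exc)
```

Readers of the old name keep working unchanged, while writers have to be pointed at the underlying tables (or the view given insert rules), which is the breakage described above for the config CGIs.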

I just want to get general agreement that this patch is headed in the
right direction before heading much further with the other tables. And
yeah, I'm aware that Mike is on vacation now so I couldn't possibly
have worse timing :)

-- 
Dan Scott
Laurentian University
