[OPEN-ILS-DEV] ***SPAM*** Re: ***SPAM*** Enabling UTF8 data in SIP2 (patch and RFC)

David Fiander david at fiander.info
Mon Jan 4 20:15:56 EST 2010


> [page 14 of the v2.12 (April 11, 2006) document]
> This is part of the bizarre underpinnings of SIP, relying on an obsolete
> Microsoft codepage.  It offers coverage for common Western European
> languages, but in my experience it breaks on, say, Lithuanian or Arabic.
>  The spec goes on to say "If another character set is required, the SC and
> the ACS must mutually define the character set," but it doesn't say how to
> establish that.  So Evergreen isn't dumbing things down on its own account.
>  It actually is following the (dumb) spec.

Joe, when the spec says things like that, I generally
assume that the server and the terminal "mutually define" the
character set by having the humans tell them both what set to use. So
your suggestion below that it be a configuration option makes sense.
However, given that Evergreen is natively UTF8 from top to bottom, it
seems that the appropriate thing to do is to just code the SIP module
to be UTF8 aware and tell the terminals to use UTF8, or flat ASCII for
those places that can get away with it.
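
To make the "configuration option, one conversion point" idea concrete,
here is a rough sketch in Python rather than the Perl of the actual SIP
module. The function name and the idea of a per-terminal charset setting
are made up for illustration; this is not how the existing code is
structured.

```python
import codecs


def encode_outgoing(text, charset="ascii"):
    """Encode one outgoing message using the configured character set.

    Hypothetical helper: every outgoing message would pass through this
    single function, so no leaf object does its own conversion.
    """
    # Fail early on a misconfigured charset name (raises LookupError).
    codec = codecs.lookup(charset)
    # Characters the charset cannot represent become "?" instead of
    # raising, so an ASCII-only terminal still receives a usable message.
    return text.encode(codec.name, errors="replace")


if __name__ == "__main__":
    title = "Vilnius \u0161"  # contains LATIN SMALL LETTER S WITH CARON
    print(encode_outgoing(title, "utf-8"))
    print(encode_outgoing(title, "ascii"))
```

With something like this, a site whose terminals handle UTF8 sets the
option to "utf-8" and everyone else stays on the ASCII default.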

> As a result, anything we do that is not ASCII-only needs to be the
> configurable exception, in order to avoid breakage with any *other* poor
> bastards dutifully implementing the spec.  As your intuition suggested, I
> would recommend doing any character conversion in exactly one place, and not
> out in the leaf objects like Item.pm.
> A subsequent page in the spec also says "Only displayable characters (no
> control characters) should be included in print or display messages from the
> ACS".  The question of what is displayable obviously depends on the
> character set, so that seems to rule out things like the zero-width
> non-joiner used in Arabic and Hebrew.  So what character set would you use
> there?  UTF-8 minus some random pieces?  Obviously this is a point of
> failure in the design.
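
On the "displayable characters" point, one possible reading (my
assumption, the spec does not spell this out) is to strip only true
control characters and leave Unicode format characters alone. A rough
Python sketch, with a made-up function name:

```python
import unicodedata


def strip_control_chars(text):
    """Drop characters in Unicode category Cc (control characters).

    Format characters (category Cf), such as the zero-width non-joiner
    U+200C, are kept. Treating Cc as the "not displayable" set is an
    assumed interpretation, not something the spec defines.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    sample = "a\x07b\u200cc"  # BEL control character plus a ZWNJ
    print(repr(strip_control_chars(sample)))
```

Under that reading the zero-width non-joiner survives and only things
like BEL or stray tabs are removed.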

Given that 3M's modern equipment is transmitting UTF8 just fine, it
would seem that they are "ignoring" the spec as published. When I have
asked questions of 3M about things in the past, they have responded
promptly with clarifications. Perhaps they just need to be prodded to
revise their document to update it in this area.

Or we just ignore the whole thing and implement NCIP and tell every
little library that they need to update their self-check machines and
print servers if they want to support non-English languages.

- David

