[OPEN-ILS-DEV] More unicode patches for SIPServer

Mon Jun 20 15:12:25 EDT 2011

On Fri, Jun 17, 2011 at 4:06 PM, Dan Scott <dan at coffeecode.net> wrote:

> I offer two more patches (being used here in production at Laurentian
> University) for SIPServer:
>
> 1) Takes a bit more care in using decode_utf8() / encode_utf8()
> consistently; the generally recommended approach is to decode input and
> encode output.
>
> 2) Restore the old OpenNCIP checksum algorithm for handling Unicode over
> the wire. This algorithm is necessary for our 3M V-series self-check to
> work with the Unicode encoding enabled on the unit; I rather
> painstakingly worked it out in late 2009 and submitted it to OpenNCIP
> back then.
>

After spending many an hour on both the Koha and EG versions of the SIP
checksum code, I find the problems quite tricky and, afaict, dependent on the
SIPserver OS/perl setup.  Therefore saying that a given piece of code works
with 3M hardware is necessary but not sufficient.

The main problems for us are:

   - 3M specifications in the Implementer's Handbook explicitly depend on
   "ASCII values" and assume a known binary representation of those values,
   including representation depth.
   - we have an insufficient body of actual tests (examples) for known-good
   checksum calculation on strings like those in actual use: long ones, with
   Unicode.

More tests, including long lines with Unicode, are required.  I would be most
happy to have some provided or verified by 3M, if possible.  The sad thing
is that it *should* be possible to have even a webpage calculate the checksum
in javascript, but we would still not know whether the result is right enough.
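
To make the byte-versus-character ambiguity concrete, here is a minimal
sketch in Python (not the SIPServer Perl).  It assumes the commonly described
SIP2 scheme: add up the value of every character up to and including the "AZ"
field identifier, keep the low 16 bits, take the two's complement, and print
four hex digits.  The function names and both sample messages are made up for
illustration, and none of this has been checked against 3M hardware.

```python
def sip_checksum(values):
    """Two's complement of the 16-bit sum, as four uppercase hex digits.

    Assumption: this is the commonly described SIP2 checksum scheme.
    """
    total = sum(values) & 0xFFFF
    return format((-total) & 0xFFFF, "04X")


def checksum_over_bytes(message, encoding="utf-8"):
    # Reading 1: a "character" is one byte of the encoded wire data.
    return sip_checksum(message.encode(encoding))


def checksum_over_code_points(message):
    # Reading 2: a "character" is one Unicode code point of the decoded string.
    return sip_checksum(ord(ch) for ch in message)


# Hypothetical message fragments, invented for this example only.
ascii_msg = "AOmain|AEJane Doe|AY1AZ"
unicode_msg = "AOmain|AEZoë Café|AY1AZ"

for msg in (ascii_msg, unicode_msg):
    print(repr(msg), checksum_over_bytes(msg), checksum_over_code_points(msg))
```

For pure ASCII the two readings agree; as soon as a non-ASCII character
appears they diverge, and that is exactly where we lack known-good examples.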

> I have also attached "test_checksum.pl" to demonstrate the
> observable difference between the old checksum and the new (I suppose
> the right thing to do would be to roll this into the actual unit tests,
> if there is general agreement that the checksum matches the reality of
> more than just 3M V-series self-checks with Unicode encoding enabled).
>

Right, my approach would be to begin building a proper CPAN module (in a
namespace like Business::3MSIP) that would start out with just the
dependencies and the yet-to-be-established checksumming tests.  That way, we
get the benefit of *all* the CPAN Testers' different OS and perl
configurations, without having to set them up ourselves or rely on users
reporting.  (The nice thing about testing checksums is that it doesn't require
a fully running SIPserver.)

> If we discover that the Unicode checksum-handling differs between
> various self-checks, then we may need to add yet another configuration
> file option (sigh) to enable switching between the appropriate
> algorithms. But hopefully %16C just works :)
>

Certainly some old janky SIP clients do not speak Unicode.  One problem is
that they won't have a way of reporting ASCII-centricness, because they
expect it to be the default.  For EG users, I think these relics may be
negligible since EG has always used Unicode throughout.

I'm also guessing that the timing of encode/decode relative to checksum
calculation is pretty important.  Since the specs were written without
regard for combining vs. composed characters, we are sorta on our own here.
All this trouble is for a data-integrity feature designed for damn serial
cables and plainly unnecessary over already checksummed TCP.
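
A small Python sketch of the combining vs. composed problem, under the same
assumed SIP2 scheme (two's complement of the 16-bit sum of the bytes, four hex
digits).  The sample name is invented, and picking NFC as the common form is
only an illustration, not something the spec or any vendor prescribes.

```python
import unicodedata


def sip_checksum(data):
    # Assumed SIP2 scheme: two's complement of the 16-bit byte sum.
    total = sum(data) & 0xFFFF
    return format((-total) & 0xFFFF, "04X")


name = "Zoë"  # made-up sample value
composed = unicodedata.normalize("NFC", name)    # ë as one code point
decomposed = unicodedata.normalize("NFD", name)  # e + combining diaeresis

# Same text on screen, different bytes on the wire, different checksums.
sum_nfc = sip_checksum(composed.encode("utf-8"))
sum_nfd = sip_checksum(decomposed.encode("utf-8"))
print(sum_nfc, sum_nfd)

# If both ends normalize to the same form before encoding, they agree again.
renormalized = unicodedata.normalize("NFC", decomposed)
print(sip_checksum(renormalized.encode("utf-8")) == sum_nfc)
```

So the checksum depends not only on when we encode, but on which normal form
the string happens to be in when we do it.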

In short, Dan, I'm not arguing these changes are wrong.  I trust they are
right *on your systems*.  But I'm certain we don't have enough test coverage
to conclude they are right for all systems currently in production, let
alone all systems we intend to support.

--joe