[OPEN-ILS-DEV] marc2bre mapping errors (was: PATCH & RFC: Providing i18n support in OpenILS database schema (diacritics))

Mon Jun 4 15:35:05 EDT 2007

On 04/06/07, Don Hamilton <dhamilton at wlu.ca> wrote:
>
>
> Apropos the translation problems...
>
> I'm running marc2bre on a gazillion real records today. During the first
> 20,000 or so I get "no mapping found at position 38 in GROUNDWATER STUDIES
> IN THE ASSINIBOINEÝRIVER DRAINAGE BASIN - PART 1: THE EVALUATION OF A FLOW
> SYSTEM IN SOUTH-CENTRAL SASKATCHEWAN. g0=ASCII_DEFAULT g1=EXTENDED_LATIN at
> /usr/share/perl5/MARC/Charset.pm line 134." twenty or so times.
>
>
> I presume, given the umpteen conversions that our MARC records have gone
> through (3 original homegrown systems to Geac to more homegrown systems to
> Voyager), that there is crap in my records.
>
> Or, given that the message says ASCII_DEFAULT and EXTENDED_LATIN, have I
> missed setting UTF-8 (or UTF-16) somewhere?
>
> Do I care?
>
> don

Hi Don:

The default output format of marc2bre.pl is UTF-8. You don't need to
specify that anywhere.

My first suggestion is to ensure that your MARC::Charset Perl module
is up to date (version 0.96). Sometime within the past month it was
updated to fix a few MARC8 -> UTF8 conversion errors; these might be
biting you if you haven't upgraded recently.

If that doesn't help, and you suspect that your records contain 'crap'
using a different encoding like "CP850", you could try passing the
--encoding parameter to marc2bre.pl. By default, marc2bre.pl assumes
that your records are encoded in MARC8 (because hey - the
only valid encodings for MARC21 are MARC8 and UTF8), but you can
override that with the --encoding parameter (e.g. marc2bre.pl
--encoding CP850 blah.mrc). I haven't played with that setting very
much, however, so your mileage may vary.
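
Before guessing at an --encoding value, it may help to look at the raw
byte sitting at the reported position. Here is a minimal Python sketch of
that idea. It is not part of marc2bre.pl or MARC::Charset. The sample
bytes, the assumption that the stray byte is 0xDD (how "Ý" is stored in
Latin-1), and the helper names are all made up for illustration; the real
byte in your file may well be something else.

```python
# Hypothetical sample: the start of the title from the warning, with the
# stray character ASSUMED to be the single byte 0xDD.
raw = b"GROUNDWATER STUDIES IN THE ASSINIBOINE\xddRIVER DRAINAGE BASIN"


def non_ascii_offsets(data):
    """Return the 0-based offsets of every byte outside the ASCII range."""
    return [i for i, b in enumerate(data) if b >= 0x80]


def decode_candidates(byte, encodings=("latin-1", "cp850", "cp437")):
    """Show how a single byte would render under each candidate code page."""
    return {enc: bytes([byte]).decode(enc) for enc in encodings}


offsets = non_ascii_offsets(raw)
print(offsets)  # -> [38]

# For each suspicious byte, print its offset, hex value, and candidate readings.
for off in offsets:
    print(off, hex(raw[off]), decode_candidates(raw[off]))
```

In this sample the only non-ASCII byte is at offset 38, the same position
the warning reports. If one of the candidate readings turns a stray byte
into something sensible for the title, that code page is a reasonable
first thing to try with --encoding.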

-- 
Dan Scott
Laurentian University