[OPEN-ILS-GENERAL] Import issues

Dibyendra Hyoju dibyendra at gmail.com
Tue Aug 18 07:49:19 EDT 2009


Hello Dan,
Thank you very much once again for your kind response and for the detailed
explanations, which I really appreciate. I am sorry that I couldn't get
back to this issue promptly; I was on leave for a few days due to some
urgent tasks.

After reading your email, I understood why our records were producing
duplicate TCNs during conversion to BRE format. More than 50% of our
library records don't have ISBNs. Currently, there are around 23,000
records, and so far we have validated only 12,000 MARC records. I plan to
migrate all the validated MARC records into Evergreen this week if the
import process goes without problems.

Following the alternative approach you described, converting the MARC
records into BRE with the "--used_tcn_file" option, I ran 'marc2bre.pl' in
the following ways:

perl marc2bre.pl --db_user postgres --db_host localhost --db_pw evergreen \
  --db_name evergreen --encoding UTF8 --used_tcn_file tcns.txt > 8500.bre

and

perl marc2bre.pl --db_user postgres --db_host localhost --db_pw evergreen \
  --db_name evergreen --encoding UTF8 --idfield 852 --idsubfield x \
  --used_tcn_file tcns.txt > 8500.bre

and

perl marc2bre.pl --db_user postgres --db_host localhost --db_pw evergreen \
  --db_name evergreen --encoding UTF8 --idfield 852 --idsubfield x \
  --tcnfield 852 --tcnsubfield x 8500.mrc --used_tcn_file tcns.txt > 8500.bre



However, the SQL generated from the BRE and ingest files produced by each
of the above commands still cannot be executed; it fails with the same error
as before. The file 'tcns.txt' contains just the two lines you specified.
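
For reference, here is a minimal sketch of how such a file can be created
and checked, assuming a POSIX shell and the file name 'tcns.txt' used above.
It only covers the used-TCN file itself, not the marc2bre.pl run:

```shell
# Write the two duplicate TCN values from Dan's message, one per line.
printf '%s\n' NEW i > tcns.txt

# Count the lines; the file should hold exactly two.
wc -l < tcns.txt

# Print each line with non-printing characters made visible and a '$' at
# the end of the line, so stray spaces or carriage returns stand out.
sed -n l tcns.txt
```

If a value shows trailing characters before the '$', it will not match the
TCN exactly as it appears in the records.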

Looking forward to hearing from you.

Thank you once again.

With kind regards,
Dibyendra

On Fri, Aug 14, 2009 at 9:04 AM, Dan Scott <denials at gmail.com> wrote:

> A quick peek suggests that the duplicate TCN values are "NEW" and "i",
> after which they resort to just s + record ID (for example, "s8602").
>
> This seems strange to me, it looks like the --tcnfield and
> --tcnsubfield options are being completely ignored, as "NEW" is found
> in the 001 field of each of your records, and "i" comes from the 020
> field (which doesn't have an "a" subfield, so no number gets
> assigned). This is the behaviour that results when a TCN value isn't
> found; however, all of the records you previously sent most definitely
> have a value in 852$x (and in fact the ID field is being correctly set
> to that numeric value). When I ran the
> marc2bre/direct_ingest/pg_loader process, I saw TCNs with values like
> "Accession number: 8600" as you would expect from your records. Very
> strange, it's like the BRE files come from a previous run of
> marc2bre.pl that didn't have the --tcnfield/--tcnsubfield options.
>
> Ah well. One way around this would be to create a "used TCN file";
> just a text file containing the following lines:
>
> NEW
> i
>
> and then point at it in marc2bre.pl using the --used_tcn_file option;
> for example, "--used_tcn_file tcns.txt". This will force it to use the
> s + record ID approach to deriving a TCN.
>
> Dan
>
> 2009/8/13 Dibyendra Hyoju <dibyendra at gmail.com>:
> > Hi Dan,
> >
> > Thank you for your prompt response.
> >
> > Please find the attachment herewith. The attachment includes the output
> > for three MARC files, each containing 250-300 bibliographic records.
> >
> > Thank you.
> >
> > With kind regards,
> > Dibyendra
> >
> > On Thu, Aug 13, 2009 at 11:11 PM, Dan Scott <denials at gmail.com> wrote:
> >>
> >> Hi Dibyendra:
> >>
> >> Can you please attach some zipped output (BRE, ingest, and SQL) files
> >> as well as the errors to help us work out where the duplication is
> >> occurring?
> >>
> >> Dan
> >>
> >> 2009/8/13 Dibyendra Hyoju <dibyendra at gmail.com>:
> >> > Hello Dan,
> >> >
> >> > Thanks for your prompt response. I tried those options with
> >> > marc2bre.pl, and now 'marc2bre.pl' produces no error. But the
> >> > duplication problem still persists when executing the SQL. I rebuilt
> >> > the database several times and tried 'marc2bre.pl' with all the
> >> > options you suggested.
> >> >
> >> > I ran the following commands during the import process, and they
> >> > produced SQL for each MARC file:
> >> >
> >> > #perl marc2bre.pl --db_user postgres --db_host localhost \
> >> >   --db_pw evergreen --db_name evergreen --encoding UTF8 \
> >> >   --idfield 852 --idsubfield x --tcnfield 852 --tcnsubfield x \
> >> >   8500.mrc > 8500.bre
> >> > #perl direct_ingest.pl ~/8500.bre > ~/8500.ingest
> >> > #perl pg_loader.pl -or bre -or mrd -or mfr -or mtfe -or mafe \
> >> >   -or msfe -or mkfe -or msefe -a mrd -a mfr -a mtfe -a mafe \
> >> >   -a msfe -a mkfe -a msefe --output=8500.sql < ~/8500.ingest
> >> > #cp 8500.sql ~
> >> > #psql -U evergreen evergreen
> >> > evergreen=# \i ~/8500.sql
> >> >
> >> > After rebuilding the database, the first generated SQL file executes
> >> > successfully, whichever one it is. But after that first successful
> >> > execution, no other generated SQL file can be executed. I've attached
> >> > the resulting SQL errors herewith.
> >> >
> >> > I look forward to hearing from you.
> >> >
> >> > Thank you.
> >> >
> >> > --
> >> > Dibyendra Hyoju
> >> > Madan Puraskar Pustakalaya
> >> > Patan Dhoka, Lalitpur
> >> > Nepal
> >> >
> >> > On Thu, Aug 13, 2009 at 12:14 PM, Dan Scott <denials at gmail.com> wrote:
> >> >>
> >> >> Sorry Dibyendra, those options should have been --tcnfield and
> >> >> --tcnsubfield
> >> >>
> >> >> Try those, they should work in 1.4.
> >> >>
> >> >> Dan
> >> >>
> >> >> 2009/8/13 Dibyendra Hyoju <dibyendra at gmail.com>:
> >> >> > Hello Dan,
> >> >> >
> >> >> > Thank you very much for your response.
> >> >> >
> >> >> > I used the options 'tcn_field' and 'tcn_subfield', but they are not
> >> >> > recognized by marc2bre.pl. I executed marc2bre.pl in the following
> >> >> > way, and the output is as follows:
> >> >> >
> >> >> > dibyendra-laptop:/home/opensrf/Evergreen-ILS-1.4.0.4/Open-ILS/src/extras/import#
> >> >> > perl marc2bre.pl --db_user postgres --db_host localhost --db_pw
> >> >> > evergreen --db_name evergreen --encoding UTF8 --idfield 852
> >> >> > --idsubfield x --tcn_field 852 --tcn_subfield x 8500.mrc > 8500.bre
> >> >> > Unknown option: tcn_field
> >> >> > Unknown option: tcn_subfield
> >> >> >
> >> >> > I guess these options, 'tcn_field' and 'tcn_subfield', are not
> >> >> > implemented in EG 1.4 yet. Is that so? I couldn't find EG 1.6 on the
> >> >> > download page. We're planning to migrate our 10,000 validated MARC
> >> >> > records into Evergreen.
> >> >> >
> >> >> > Thank you.
> >> >> >
> >> >> > --
> >> >> > Dibyendra Hyoju
> >> >> > Madan Puraskar Pustakalaya
> >> >> > Patan Dhoka, Lalitpur
> >> >> > Nepal
> >> >> >
> >> >> > On Tue, Aug 11, 2009 at 10:15 AM, Dan Scott <denials at gmail.com> wrote:
> >> >> >>
> >> >> >> 2009/8/8 Dibyendra Hyoju <dibyendra at gmail.com>:
> >> >> >> > Hello all,
> >> >> >> > I imported the sample records again on another machine running
> >> >> >> > Evergreen 1.4, and after importing that file, I couldn't import
> >> >> >> > any records from other MARC files. I have attached the files
> >> >> >> > '8500.mrc' and '8400.mrc' herewith. I first executed the SQL
> >> >> >> > generated from '8500.mrc' successfully. Then, I couldn't execute
> >> >> >> > the SQL generated from '8400.mrc'. The error is the same as
> >> >> >> > before: 'ERROR:  duplicate key violates unique constraint
> >> >> >> > "biblio_record_unique_tcn"'. I only used the option "--encoding
> >> >> >> > UTF8" this time. I tried a few other records, but I got the same
> >> >> >> > error. A few of the tested records are attached herewith if
> >> >> >> > somebody wants to volunteer to test. If anyone has faced this
> >> >> >> > problem before and has found a solution, please help. Any help
> >> >> >> > will be highly appreciated.
> >> >> >>
> >> >> >> Sorry for the delayed reply, I'm on leave at the moment and not
> >> >> >> connected very often.
> >> >> >>
> >> >> >> marc2bre.pl doesn't really deal with automatically generated TCNs
> >> >> >> all that well, so you're best off explicitly identifying a source
> >> >> >> for the TCN. To avoid getting duplicate TCN values and record ID
> >> >> >> values if all of your records follow the same pattern with the
> >> >> >> 852 $x field / subfield accession number identifier, you should
> >> >> >> also use the --tcn_field / --tcn_subfield options for marc2bre.pl:
> >> >> >>
> >> >> >> perl marc2bre.pl --db_user postgres --db_host localhost --db_pw
> >> >> >> evergreen --db_name evergreen --encoding UTF8 --idfield 852
> >> >> >> --idsubfield x --tcn_field 852 --tcn_subfield x 8400.mrc > 8400.bre
> >> >> >>
> >> >> >> I tried importing both your 8000.mrc and 8400.mrc files with
> >> >> >> these options, and they worked fine for me with no duplicate value
> >> >> >> warnings. This is with Evergreen rel_1_6_0 on Ubuntu 8.04, but
> >> >> >> marc2bre.pl (the most critical script for parsing out TCN and
> >> >> >> record number) shows no significant differences between 1.4 and
> >> >> >> rel_1_6_0.
> >> >> >>
> >> >> >> --
> >> >> >> Dan Scott
> >> >> >> Laurentian University