[OPEN-ILS-DEV] Working around an error

Dan Scott denials at gmail.com
Tue Sep 23 15:58:35 EDT 2008


Hi Mark:

2008/9/22 Andrews, Mark J. <MarkAndrews at creighton.edu>:
<snip>
> I am experimenting with one of the Evergreen VMWare images (v1.2.1.4).  It

Augh! That's... ancient! But anyway...

> seems to run well so far, but I'm stuck at the step of actually loading data
> into Evergreen.  I am running the SQL script which uses the COPY command,
> copying a flat file from disk into the corresponding PostgreSQL table
> Evergreen uses.  I am getting a duplicate primary key error:
>
>
>
> evergreen-admin at eg-server:~/clic$ sudo psql -U evergreen -f
> pg_loader-output.sql evergreen
> [sudo] password for evergreen-admin:
> Password for user evergreen:
> SET
> BEGIN
> psql:pg_loader-output.sql:3: ERROR:  duplicate key violates unique
> constraint "record_entry_pkey"
> CONTEXT:  COPY record_entry, line 23740: "t     now     1       f
> now     1       hugogrotiuslee  23743   IMPORT-1221663443.06135 <record
> xmlns:xsi="http://www.w3.org..."
> psql:pg_loader-output.sql:4: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:5: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:6: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:7: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:8: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:9: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> psql:pg_loader-output.sql:10: ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
> ROLLBACK
> evergreen-admin at eg-server:~/clic$
>
>
>
> Now I am not a grep god.  I am pretty handy with flat files where each line
> is a record, and each record is terminated by a CR-LF pair.  But these
> JSONed records are a different kind of beast.  So, if I wanted to rip this
> thing out of the input file (a potentially useful skill), I'd have to move
> back to the beginning-of-record-mark, and forward to the end-of-record-mark,
> in the midst of a 2.8 GB input file.  If it were just a small text file, I'd
> back it up, and use a text editor to modify the backup file, and just delete
> the offending record.  Of course doing this will probably screw up some
> other relation, but at the moment I'm just trying to push data in this
> thing.

Well, actually, these are just one record per line: each row in the
COPY data ends with a single linefeed, so the usual line-based tools
work fine on it. What about something like:

# Count the total number of lines in pg_loader-output.sql
TOTAL=$(wc -l < pg_loader-output.sql)
# Dump the first 23739 lines (everything before the offending row)
# into alternate.sql
head -n 23739 pg_loader-output.sql > alternate.sql
# Skip line 23740 and append the remaining (TOTAL - 23740) lines
tail -n $((TOTAL - 23740)) pg_loader-output.sql >> alternate.sql
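
One caveat before running that: the line number in the CONTEXT message
is counted within the COPY data, so if anything precedes the COPY
statement in the file (the SET and BEGIN at the top, for instance), the
duplicate row may actually sit a few lines further down. Here's a quick
sanity check, plus a one-pass sed equivalent of the head/tail approach
(a sketch; adjust 23740 if the check shows an offset):

# Print the start of file line 23740 to confirm it holds the
# duplicate record, quitting as soon as it's found
sed -n '23740{p;q}' pg_loader-output.sql | cut -c1-120
# Equivalent one-liner: write everything except line 23740 to alternate.sql
sed '23740d' pg_loader-output.sql > alternate.sql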

That would generate a file "alternate.sql" that's missing the
duplicate record and get you over this one particular hump. But my
concern is that if you're getting one duplicate record, it's likely
that there will be others. Are you certain that the id_field /
id_subfield values you passed to marc2bre.pl identify unique values
for your entire set of records? Unicorn, for example, has this
charming (and by charming, I mean incredibly annoying) habit of
allowing duplicate values in its 001 field.
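
If you want to check for more duplicates before reloading, something
like this should surface them: it pulls the record id out of each data
row and prints any id that occurs more than once (a sketch; it assumes
the id is the eighth tab-separated column, which is what the CONTEXT
output above suggests, so adjust -f8 if your COPY column order differs):

# Extract the id column from every tab-delimited data row (-s skips
# the SET/BEGIN/COPY lines, which contain no tabs) and report any
# ids that appear more than once
cut -s -f8 pg_loader-output.sql | sort -n | uniq -d

If that turns up more than the one id, it's probably easier to fix the
id_field / id_subfield choice (or deduplicate on the Unicorn side) and
regenerate the load file than to hand-edit each collision.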

-- 
Dan Scott
Laurentian University

