[OPEN-ILS-DEV] Problem Importing MARC records

Mike Rylander mrylander at gmail.com
Thu Aug 2 15:36:44 EDT 2007


On 8/2/07, Don Hamilton <dhamilton at wlu.ca> wrote:
> You're welcome, Mike...  geez, you are a fast adapter....
>
> While I have your attention though, might I ask about alternate
> approaches to pg-loader.pl. Because it keeps EVERYTHING in memory until it's
> done reading the complete input file, I get into severe swapping after
> 30,000 records or so... (on a system with 1 GB of memory). I've seen others
> mention similar problems. Yes I could break my input into chunks,
> but...
>
> In the past, I've used a technique where I wrote separate files for
> each table of data, piped those to a 'sort unique', and then did a bulk
> copy to load the individual tables. Is that something that you (or
> others) would find useful? I'd still like to get to a place where I

We won't turn down an alternate script.  I think the way I'd like to
see it (and can do it if you're not feeling saucy) is to specify an
output directory as a command line param, instead of a file, and write
a file per table plus one master script to pull the table scripts in,
ordered correctly, within one transaction.
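
A minimal sketch of that layout, written in Python rather than as a
patch to pg-loader.pl. The function name, the "load_all.sql" file name,
and the table names in the usage note are assumptions made up for
illustration; the only real feature relied on is psql's \i include
command. Rows are written out as they arrive, so nothing accumulates in
memory.

```python
import os


def write_load_scripts(records, out_dir, table_order):
    """Stream (table, sql_line) pairs into one file per table,
    then write a master script that includes them in order.

    records     -- any iterable of (table_name, sql_statement) pairs
    out_dir     -- output directory (created if missing)
    table_order -- table names in the order they must be loaded
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        for table, sql_line in records:
            if table not in handles:
                path = os.path.join(out_dir, table + ".sql")
                handles[table] = open(path, "w", encoding="utf-8")
            # Write immediately instead of buffering the whole input.
            handles[table].write(sql_line.rstrip("\n") + "\n")
    finally:
        for fh in handles.values():
            fh.close()

    # The master script wraps every per-table script in one transaction.
    master_path = os.path.join(out_dir, "load_all.sql")
    with open(master_path, "w", encoding="utf-8") as master:
        master.write("BEGIN;\n")
        for table in table_order:
            if table in handles:  # skip tables that received no rows
                master.write("\\i " + table + ".sql\n")
        master.write("COMMIT;\n")
    return master_path
```

Because \i resolves relative paths against psql's working directory,
the master script in this sketch would be run from inside the output
directory, for example "cd outdir && psql -f load_all.sql" plus
whatever connection options apply.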

Another option would be to treat those table scripts as temp files and
just cat them together at the end of the process, which I think is
approximately what you were suggesting.
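
A rough sketch of the concatenation variant, in Python for
illustration only. It assumes per-table temp files named <table>.sql
already exist in a temp directory; that naming convention and the
function name are hypothetical, not anything pg-loader.pl does today.

```python
import os
import shutil


def concat_table_scripts(tmp_dir, table_order, dest_path):
    """Join per-table temp files into one script, in load order,
    wrapped in a single transaction. Temp files are removed once copied.
    """
    with open(dest_path, "wb") as dest:
        dest.write(b"BEGIN;\n")
        for table in table_order:
            part = os.path.join(tmp_dir, table + ".sql")
            if not os.path.exists(part):
                continue  # no rows were produced for this table
            with open(part, "rb") as src:
                # Copies in chunks, so large files never sit in memory.
                shutil.copyfileobj(src, dest)
            os.remove(part)
        dest.write(b"COMMIT;\n")
    return dest_path
```

The result is the same single-file output as before, but the loader
only ever holds one chunk of one table in memory at a time.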

I personally favor the script-per-table approach because it makes
editing one table's worth of data manageable -- 7 or 8 100 MB files
instead of a single 1 GB file.

> could load a fresh evergreen d/b every week or so from a dump of my 5
> million bibliographic records. I do do that now (in 3 or 4 hours,
> elapsed) with my Simple OPAC Backup at http://library.wlu.ca/searchme,
> and don't see why I can't accomplish the same feat with evergreen.
>

I have reservations about promoting EG as an OPAC alternative.  It's
far from optimized for that, and there is a ton of overhead that just
isn't needed in order to do simple searches but cannot be turned off.
It's just not meant for that purpose and there are many good projects
out there that fill that niche ...

That being said, what you want is possible, but it will require a good
bit of extra scripting (inside and outside the database) to automate.
(Using GIN indexes, removing/re-adding indexes during reloads, etc.)
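
As one illustration of the "removing/re-adding indexes" part, here is
a sketch in Python that generates a reload script: drop the listed
indexes, pull in the bulk load, then rebuild the indexes. The index,
table, and column names in the usage note are placeholders, not
Evergreen's actual schema, and a real reload would need more than
this (constraints, triggers, statistics).

```python
def build_reload_script(load_script, indexes):
    """Return psql script text that drops indexes, runs the load,
    and re-creates the indexes, all inside one transaction.

    load_script -- path of the SQL file to include with \\i
    indexes     -- list of (index_name, table, method, expression)
    """
    lines = ["BEGIN;"]
    # Dropping indexes first avoids maintaining them row by row
    # while the bulk load runs.
    for name, _table, _method, _expr in indexes:
        lines.append("DROP INDEX IF EXISTS %s;" % name)
    lines.append("\\i " + load_script)
    # Rebuilding once at the end is a single pass over each table.
    for name, table, method, expr in indexes:
        lines.append(
            "CREATE INDEX %s ON %s USING %s (%s);" % (name, table, method, expr)
        )
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"
```

For example, build_reload_script("load_all.sql", [("title_idx",
"title_entry", "gin", "index_vector")]) yields a script with one DROP
INDEX, the \i include, and one CREATE INDEX ... USING gin statement.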

Sorry if I'm being a downer. ;)  I don't mean to be a wet blanket, but
I don't want to set EG up to get a reputation for "failing" at
something for which it wasn't really meant, if that makes sense.

--miker

> don
>
>
>
>
> >>> mrylander at gmail.com 02-Aug-2007 1:37 PM >>>
>
> [snip]
>
> Hope that helps in the future, and thanks for the idea!
>
> --miker
>
>
>


-- 
Mike Rylander
Equinox Software, Inc
miker at esilibrary.com
http://esilibrary.com/

