[Evergreen-dev] Record Loading Speed

Jason Stephenson jason at sigio.com
Fri Aug 6 20:14:50 EDT 2021


Hi, Michele!

My replies to specific points of yours appear under the relevant text below.

On 8/6/21 6:03 PM, Morgan, Michele via Evergreen-dev wrote:
> We've been looking at record loading speed, specifically records with 
> located uri's. We do a big business in electronic resources, and expect 
> it will only increase.
> 
> We currently load the records using Vandelay via the client. I know 
> there are scripts available that others have kindly shared and we're 
> looking at scripting the loads.

Most of our loads are done by staff using Vandelay.  We do have a script 
to load large batches of records with located URIs that we use when 
staff request it.  I have shared a link to it in IRC several times.  I 
will include it here for the sake of anyone who missed it and is 
interested in having a look:

https://pastebin.com/g4RGDJLr

> 
> We've been monitoring the vandelay session tracker in the database and 
> have found that records with 1 uri take on average 2 seconds to load, 3 
> uris take on average 4 seconds, 10 uris take on average 12 seconds. We 
> are running postgres 9.6.

I have found that, depending on your database hardware and 
configuration, it can take two seconds to update a bib record regardless 
of whether or not it has located URIs.  This seemed to improve in 2018 
when we bought super fast hardware.

Authority updates, for authorities with many linked bibs, can take even 
longer.

In general, upgrading PostgreSQL to Pg 11 or later improves the 
performance of authority updates markedly.  Upgrading PostgreSQL seems 
to have less impact on bibliographic record updates and appears to have 
a detrimental effect on updating or inserting bib records with located URIs.

PostgreSQL 10 seems to perform at a level comparable to PostgreSQL 
version 9.6 for most things.  In some areas, it seems to make a slight 
improvement.

> 
> I know Jason Stephenson is testing newer versions of postgres and 
> finding the loading even slower.
> 
> There are so many database triggers that run when a marc is updated, and 
> we are wondering if efficiency could be improved in some of the 
> functions. Does anyone have ideas about avenues to make the processing 
> more efficient?

At CW MARS, we are using a modified version of the script at the link 
that I shared above to test the speed of updating and inserting records 
with located URIs on the various releases of PostgreSQL.  This is being 
done in conjunction with a  PostgreSQL database extension called 
plprofiler (https://github.com/bigsql/plprofiler).  The extension will 
gather statistics on how long the process spends in each database 
function, and a report can then be generated to pinpoint those areas 
that require attention.

The actual profiling of a load began yesterday morning with PostgreSQL 
version 10.  As it takes days to load 12,567 records on the hardware 
that I have for testing, I anticipate that I will be able to do 1 or 2 
test loads per week, at most.

We tried PostgreSQL 12 on our training server briefly toward the 
beginning of the year.  We found additional places where Evergreen 
performance seemed to be negatively impacted by the newer database 
version, with at least one of these being noticeable in the web staff 
client.  (I am sorry to say that I do not remember the specific 
interface that was affected at this time, and I've had a difficult time 
finding it in my emails, tickets, or notes.)  Suffice it to say that 
more areas of Evergreen functionality should be tested with more recent 
releases of PostgreSQL before we can recommend anyone use a version more 
recent than Pg 10.  As I have pointed out elsewhere, there is some 
urgency for this work since PostgreSQL community support for 9.6 ends in 
November 2021 and support for 10 end in November 2022.

See: 
http://list.evergreen-ils.org/pipermail/evergreen-dev/2021-July/000172.html

I endeavor to document and coordinate work with others in a shared 
Google folder:

https://drive.google.com/drive/folders/1sRZ8P1RHCOcZx42DxUehvOpoNkJJnfqn

I highly encourage anyone who wants to see Evergreen's database 
performance improve over the next year to get involved.

Cheers,
Jason



More information about the Evergreen-dev mailing list