[Evergreen-dev] Record Loading Speed
Jason Stephenson
jason at sigio.com
Fri Aug 6 20:14:50 EDT 2021
Hi, Michele!
My replies to specific points of yours appear under the relevant text below.
On 8/6/21 6:03 PM, Morgan, Michele via Evergreen-dev wrote:
> We've been looking at record loading speed, specifically records with
> located uri's. We do a big business in electronic resources, and expect
> it will only increase.
>
> We currently load the records using Vandelay via the client. I know
> there are scripts available that others have kindly shared and we're
> looking at scripting the loads.
Most of our loads are done by staff using Vandelay. We do have a script
to load large batches of records with located URIs that we use when
staff request it. I have shared a link to it in IRC several times. I
will include it here for the sake of anyone who missed it and is
interested in having a look:
https://pastebin.com/g4RGDJLr
>
> We've been monitoring the vandelay session tracker in the database and
> have found that records with 1 uri take on average 2 seconds to load, 3
> uris take on average 4 seconds, 10 uris take on average 12 seconds. We
> are running postgres 9.6.
I have found that, depending on your database hardware and
configuration, it can take two seconds to update a bib record regardless
of whether or not it has located URIs. This seemed to improve in 2018
when we bought super fast hardware.
Authority updates, for authorities with many linked bibs, can take even
longer.
In general, upgrading PostgreSQL to Pg 11 or later improves the
performance of authority updates markedly. Upgrading PostgreSQL seems
to have less impact on bibliographic record updates and appears to have
a detrimental effect on updating or inserting bib records with located URIs.
PostgreSQL 10 seems to perform at a level comparable to PostgreSQL
version 9.6 for most things. In some areas, it seems to make a slight
improvement.
>
> I know Jason Stephenson is testing newer versions of postgres and
> finding the loading even slower.
>
> There are so many database triggers that run when a marc is updated, and
> we are wondering if efficiency could be improved in some of the
> functions. Does anyone have ideas about avenues to make the processing
> more efficient?
At CW MARS, we are using a modified version of the script at the link
that I shared above to test the speed of updating and inserting records
with located URIs on the various releases of PostgreSQL. This is being
done in conjunction with a PostgreSQL database extension called
plprofiler (https://github.com/bigsql/plprofiler). The extension will
gather statistics on how long the process spends in each database
function, and a report can then be generated to pinpoint those areas
that require attention.
The actual profiling of a load began yesterday morning with PostgreSQL
version 10. As it takes days to load 12,567 records on the hardware
that I have for testing, I anticipate that I will be able to do 1 or 2
test loads per week, at most.
We tried PostgreSQL 12 on our training server briefly toward the
beginning of the year. We found additional places where Evergreen
performance seemed to be negatively impacted by the newer database
version, with at least one of these being noticeable in the web staff
client. (I am sorry to say that I do not remember the specific
interface that was affected at this time, and I've had a difficult time
finding it in my emails, tickets, or notes.) Suffice it to say that
more areas of Evergreen functionality should be tested with more recent
releases of PostgreSQL before we can recommend anyone use a version more
recent than Pg 10. As I have pointed out elsewhere, there is some
urgency for this work since PostgreSQL community support for 9.6 ends in
November 2021 and support for 10 end in November 2022.
See:
http://list.evergreen-ils.org/pipermail/evergreen-dev/2021-July/000172.html
I endeavor to document and coordinate work with others in a shared
Google folder:
https://drive.google.com/drive/folders/1sRZ8P1RHCOcZx42DxUehvOpoNkJJnfqn
I highly encourage anyone who wants to see Evergreen's database
performance improve over the next year to get involved.
Cheers,
Jason
More information about the Evergreen-dev
mailing list