[OPEN-ILS-GENERAL] Evergreen 2.9 and PostgreSQL 9.4 - A Cautionary Tale
Chris Sharp
csharp at georgialibraries.org
Wed Feb 10 16:03:31 EST 2016
Thanks, Mike - I (and hopefully future generations of Evergreen admins) appreciate the clarifications!
Chris
----- Original Message -----
> From: "Mike Rylander" <mrylander at gmail.com>
> To: "Evergreen Discussion Group" <open-ils-general at list.georgialibraries.org>
> Sent: Wednesday, February 10, 2016 3:56:01 PM
> Subject: Re: [OPEN-ILS-GENERAL] Evergreen 2.9 and PostgreSQL 9.4 - A Cautionary Tale
>
> Chris,
>
> I was happy to be able to perform that emergency diagnosis and
> immediately implement several of the changes you listed, and glad you
> were able to move quickly to implement my recommendations requiring
> restarts. I know it was a stressful time for your team, and it's
> frustrating when the things you've tried in the past don't help in new
> situations.
>
> I want to provide some context regarding the specific changes I made,
> and had you make, just so folks can make the best use of the
> information out there. Though I'm sure you know this, the details of
> tuning any given Postgres instance are extremely dependent on the
> particular data set, even when you're using the same application
> (Evergreen in this case).
>
> With regard to the join collapse limit, there are many other data sets
> that do not show the same sensitivity to the setting as PINES. Most
> are well-served by the bug fix on
> https://bugs.launchpad.net/evergreen/+bug/1527731 . There is a
> not-insignificant cost to increasing this value, so there is a
> trade-off to be weighed. In PINES' case, the increase in large-query
> planning time was far outweighed by the cost of the very-slow queries
> being seen, of course. You'll notice that a much smaller increase
> described in the bug was all that was needed in Dan Wells' case,
> though even that was not necessary with the patch. An overly-high
> setting here can be detrimental to performance in general -- IOW,
> setting the join collapse limit as high as I've had you set it for
> PINES is definitely not a blanket optimization opportunity.
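>
> For anyone who wants to experiment with this on their own instance,
> here's a sketch of checking and raising the limit (ALTER SYSTEM is
> new in 9.4; the value of 12 is just the one from Dan's bug, not a
> universal recommendation):
>
>   -- current value (the default is 8)
>   SHOW join_collapse_limit;
>   -- raise it cluster-wide; this writes to postgresql.auto.conf
>   ALTER SYSTEM SET join_collapse_limit = 12;
>   -- reload so new sessions pick up the change (no restart needed)
>   SELECT pg_reload_conf();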
>
> As for the ROWS adjustment, it can have a significant effect on plan
> creation (positive or negative) and is directly dependent on the data
> set. It was also critical in combination with the join collapse limit
> change, as otherwise row estimates in the PINES data set were still
> pushing plans in a sub-optimal direction. The specific setting of 300
> is only applicable to PINES. It feeds into the calculus that Postgres
> uses when constructing query plans, and must be tuned to the data set
> if it needs to be tuned at all. In many cases it does not need to be
> adjusted, and a high setting here can cause bad plans. While it does
> look like the value is essentially the number of org units in the
> instance, identifying the best value is not simply a matter of setting
> ROWS to approximately the number of org units you have -- there are
> situations (and configurations) where that isn't the correct thing to
> do, and it requires instance-specific evaluation to determine what's
> best.
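>
> The estimate lives in pg_proc, so you can see what a function
> currently advertises to the planner before touching it. A sketch
> (the 300 is the PINES-specific value from Chris's message below, not
> a recommendation):
>
>   -- what the planner currently assumes per call
>   SELECT proname, prorows FROM pg_proc
>    WHERE proname = 'org_unit_descendants';
>   -- per-signature adjustment; tune to your own data set
>   ALTER FUNCTION actor.org_unit_descendants(int) ROWS 300;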
>
> Indexes are also extremely data set dependent, of course, and their
> cost comes from ongoing index maintenance. Obviously there's no need
> for multiple, identical indexes, so removing those is always a good
> recommendation if they are, indeed, duplicates. Adding an index
> is (relative to other tuning measures) pretty cheap if you have the
> drive space, but action.circulation is a pretty hot table (it changes
> a lot, and quickly), so unless there is a problem with timeouts to a
> site's collections vendor, I wouldn't recommend adding that specific
> index.
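>
> One way to spot exact duplicates is to group pg_index entries by
> everything except the index name. This is adapted from the
> "Duplicate indexes" query on the PostgreSQL wiki, so treat it as a
> starting point rather than gospel:
>
>   SELECT array_agg(indexrelid::regclass) AS duplicate_indexes
>     FROM pg_index
>    GROUP BY indrelid, indkey, indclass,
>             coalesce(indexprs::text, ''), coalesce(indpred::text, '')
>   HAVING count(*) > 1;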
>
> For shared buffers, I can't think of a situation where I would
> recommend more than 32G, and even for PINES 16G is probably enough.
> However, since 32G is working well for you I wouldn't change it at
> this point. A large shared buffers setting causes a lot of waste in
> terms of RAM (pages are loaded twice, once in shared buffers and once
> in the OS page cache; wasted bookkeeping memory for the page map) and
> CPU time (locking stalls, CHECKPOINT timeout I/O storms, buffer
> eviction clock sweep work, load on the background writer, just to name
> a few). This is definitely one of those "every instance is different"
> things.
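>
> If you're curious how much of a large shared buffers allocation is
> actually earning its keep, the pg_buffercache contrib extension
> gives a rough view. A sketch (scanning it takes buffer locks, so
> don't hammer a busy server with it):
>
>   CREATE EXTENSION IF NOT EXISTS pg_buffercache;
>   -- buffers actually in use, and how many are dirty, vs. allocated
>   SELECT count(*) FILTER (WHERE relfilenode IS NOT NULL) AS used,
>          count(*) FILTER (WHERE isdirty) AS dirty,
>          count(*) AS total
>     FROM pg_buffercache;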
>
> Having said all that, disabling transparent huge pages is pretty much
> critical -- it's as important as disabling zone reclaim on any NUMA
> system. This was a non-issue on 9.2, became important in some
> situations on 9.3, and is a huge issue for 9.4. The reasons are
> complicated, but the cause is Postgres' change from System V to POSIX
> shared memory. Basically, just do it. :)
>
> Thanks for sharing, Chris, and again, I was glad to help when needed.
>
> --
> Mike Rylander
> | President
> | Equinox Software, Inc. / The Open Source Experts
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: miker at esilibrary.com
> | web: http://www.esilibrary.com
>
>
>
> On Wed, Feb 10, 2016 at 2:41 PM, Chris Sharp
> <csharp at georgialibraries.org> wrote:
> > Hello all,
> >
> > As watchers of the Evergreen IRC channel know, the PINES upgrade from
> > Evergreen 2.7.2/PostgreSQL 9.3 to Evergreen 2.9.1/PostgreSQL 9.4 was a bit
> > bumpy. Our system slowed to a crawl on go-live day, and we were quick to
> > determine that the culprit was the database. After consulting with Mike
> > Rylander of Equinox Software, we implemented several tweaks to our system
> > that I'll share here in hopes that they may help others in the future:
> >
> > The slowness appears to have been due to two problem areas:
> >
> > Bad Query Plans:
> >
> > - we upped the join_collapse_limit from the default value of 8 to 12 - Dan
> > Wells discusses this possibility in his bug report about slow copy load in
> > the OPAC (https://bugs.launchpad.net/evergreen/+bug/1527731); a sketch for
> > testing this change on your own data follows this list.
> > - we altered the actor.org_unit_descendants functions, adding a realistic
> > estimate of the number of rows they are expected to return:
> > ALTER FUNCTION actor.org_unit_descendants(int) ROWS 300;
> > ALTER FUNCTION actor.org_unit_descendants(int,int) ROWS 300;
> > - we dropped a duplicate index on the action.hold_request table that was
> > apparently added to address an issue in the past
> > (ahr_current_copy_open_idx)
> > - we added an index to action.circulation to address an issue with slow
> > collections queries:
> > CREATE INDEX CONCURRENTLY circ_circ_lib_open_idx ON action.circulation
> > (usr,circ_lib) WHERE xact_finish IS NULL;
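> >
> > For anyone wanting to verify the join_collapse_limit effect on their
> > own data before making it permanent, something like the following
> > works session-locally. The specific join is just illustrative
> > (it assumes stock Evergreen schema names); substitute a query your
> > own catalog actually struggles with:
> >
> >   SET join_collapse_limit = 12;  -- session-local, safe to test
> >   EXPLAIN (ANALYZE, BUFFERS)
> >     SELECT count(*)
> >       FROM asset.copy cp  -- illustrative join against stock tables
> >       JOIN actor.org_unit_descendants(1) d ON (cp.circ_lib = d.id);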
> >
> > PostgreSQL Memory Usage:
> >
> > - we reduced our shared_buffers size to 32GB from 128GB - we had calculated
> > the higher value from the PostgreSQL documentation
> > (http://www.postgresql.org/docs/9.4/static/runtime-config-resource.html#GUC-SHARED-BUFFERS)
> > recommendation of 1/4 of installed RAM (512GB in our case). We
> > immediately saw a reduction of resource consumption on the server
> > (see the ALTER SYSTEM note after this list).
> > - on the underlying OS, we turned off transparent huge pages (THP - see
> > http://unix.stackexchange.com/questions/99154/disable-transparent-hugepages
> > for details on how one might do this). Further research revealed that
> > turning off THP is generally recommended by Oracle and PostgreSQL.
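> >
> > As a side note, on 9.4 the shared_buffers change can be made with
> > ALTER SYSTEM rather than editing postgresql.conf by hand, though
> > either way it only takes effect after a full restart:
> >
> >   ALTER SYSTEM SET shared_buffers = '32GB';
> >   -- written to postgresql.auto.conf; restart the cluster to apply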
> >
> > We also followed Mike's recommendation to run the hold targeter cron job
> > nightly rather than every 15 minutes, but that probably didn't have much
> > to do with the root cause (aside from adding queries to the pile-up when
> > slowness struck).
> >
> > Since we made the above changes, our database servers have sped up
> > considerably and are performing very happily. I just wanted to share our experience
> > with the Evergreen Community in hopes that others can benefit.
> >
> > --
> > Chris Sharp
> > PINES System Administrator
> > Georgia Public Library Service
> > 1800 Century Place, Suite 150
> > Atlanta, Georgia 30345
> > (404) 235-7147
> > csharp at georgialibraries.org
> > http://pines.georgialibraries.org/
>
--
Chris Sharp
PINES System Administrator
Georgia Public Library Service
1800 Century Place, Suite 150
Atlanta, Georgia 30345
(404) 235-7147
csharp at georgialibraries.org
http://pines.georgialibraries.org/