[OPEN-ILS-GENERAL] IMPORTANT: Postgres bug affecting Streaming Replication

Mike Rylander mrylander at gmail.com
Thu Nov 21 13:35:07 EST 2013


If your site uses Postgres' streaming replication to create a
read-only hot-standby, such as for reporting or to support search
capacity needs, this message is for you.

On Nov. 18, a bug was reported to the Postgres hackers list that can
cause extreme corruption of hot-standby secondary servers.  Known
affected versions include 9.2.5 and 9.3.1.  It is likely that the bug
goes as far back as 9.1.
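
As a rough aid for triage, the version information above can be
sketched as a small Python check.  The version lists come from this
message only; the function names and the three labels are hypothetical
and for illustration.  It reflects what is known as of this message
and says nothing about later point releases.

```python
# Versions named in this message as known to be affected.
KNOWN_AFFECTED = {(9, 2, 5), (9, 3, 1)}

# Release branches the message says are likely affected ("as far back as 9.1").
# Assumption: every release on these branches is treated as suspect.
LIKELY_AFFECTED_BRANCHES = {(9, 1), (9, 2), (9, 3)}


def parse_version(text):
    """Turn a version string such as '9.2.5' into the tuple (9, 2, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version_text):
    """Return a hypothetical triage label for a Postgres version string."""
    version = parse_version(version_text)
    if version in KNOWN_AFFECTED:
        return "known affected"
    if version[:2] in LIKELY_AFFECTED_BRANCHES:
        return "possibly affected"
    return "not listed"


if __name__ == "__main__":
    for candidate in ("9.2.5", "9.3.1", "9.1.9", "8.4.18"):
        print(candidate, "->", classify(candidate))
```

The version string can be read from a running server with
"SHOW server_version;", and "SELECT pg_is_in_recovery();" returns true
on a server that is running as a standby.  A "not listed" result only
means this message does not mention that version.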

The nature of the bug is such that it is somewhat hard to provoke.
However, it can occur at any standby startup if the secondary is in
hot-standby mode, and not only during the initial secondary creation.
In other words, every restart of a hot-standby secondary instance of
Postgres is another chance to run afoul of this bug.  It is most
likely to occur under high write load or situations where there is
significant lag during standby startup.

Symptoms include: missing data; apparently-duplicated data; mismatch
between index and heap data; "impossible" constraint violations
(unique and fkey).

The underlying cause of these symptoms is a loss of proper transaction
visibility information on the hot-standby secondary.


There is an apparent fix being tested now by several large Postgres
entities.  It is expected that a point release for all affected
versions will be made available ASAP.  Additionally, if you have
already been bitten by the corruption bug, there is a straightforward
procedure for recovery, which I am sure will be detailed in the release
notes for the upcoming point releases.

You can read the full thread from the pgsql-hackers mailing list here:

Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com
