[OPEN-ILS-GENERAL] IMPORTANT: Postgres bug affecting Streaming Replication

Mike Rylander mrylander at gmail.com
Sat Nov 23 16:46:48 EST 2013


UPDATE THE SECOND:

The Postgres team is planning to wrap releases to fix this issue on
Dec 2, for release on Dec 5.  The PGDG apt repos should have the
package at that time, and I expect the Debian and Ubuntu repositories
to follow shortly thereafter.  If you're affected by this bug, please
mark your calendars and plan downtime to upgrade your servers ASAP
after the release.  For the announcement to the pgsql-hackers list,
see: http://markmail.org/thread/qnye6rk37ug52zox

--miker


On Thu, Nov 21, 2013 at 3:55 PM, Mike Rylander <mrylander at gmail.com> wrote:
> UPDATE
>
> I need to clarify and correct the information below: this is not
> restricted to streaming-replication-based hot-standby replicas; it
> affects all hot_standby-enabled replicas (regardless of the mechanism
> used to ship the WAL data) that have a recovery.conf in place. This is
> per: http://markmail.org/message/p6bcdo5h2ll4kk3k
>
>
> On Thu, Nov 21, 2013 at 1:35 PM, Mike Rylander <mrylander at gmail.com> wrote:
>> THE BAD NEWS:
>>
>> If your site uses Postgres' streaming replication to create a
>> read-only hot-standby, such as for reporting or to support search
>> capacity needs, this message is for you.
>>
>> On Nov 18, a bug was reported to the Postgres hackers list that can
>> cause extreme corruption of hot-standby secondary servers.  Known
>> affected versions include 9.2.5 and 9.3.1.  It is likely that this
>> goes as far back as 9.1.
>>
>> The nature of the bug is such that it is somewhat hard to provoke.
>> However, it can occur at any standby startup if the secondary is in
>> hot-standby mode, and not only during the initial secondary creation.
>> In other words, every restart of a hot-standby secondary instance of
>> Postgres is another chance to run afoul of this bug.  It is most
>> likely to occur under high write load or situations where there is
>> significant lag during standby startup.
>>
>> Symptoms include: missing data; apparently-duplicated data; mismatch
>> between index and heap data; "impossible" constraint violations
>> (unique and fkey).
>>
>> The underlying cause of these symptoms is a loss of proper transaction
>> visibility information on the hot-standby secondary.
>>
>> THE GOOD NEWS:
>>
>> There is an apparent fix being tested now by several large Postgres
>> entities.  It is expected that a point release for all affected
>> versions will be made available ASAP.  Additionally, if you have
>> already been bitten by the corruption bug, there is a straightforward
>> procedure for recovery, which I am sure will be detailed in the release
>> notes for the upcoming point releases.
>>
>> You can read the full thread from the pgsql-hackers mailing list here:
>> http://markmail.org/message/p54pqkykjtwvog3h
>>
>> --
>> Mike Rylander
>>  | Director of Research and Development
>>  | Equinox Software, Inc. / Your Library's Guide to Open Source
>>  | phone:  1-877-OPEN-ILS (673-6457)
>>  | email:  miker at esilibrary.com
>>  | web:  http://www.esilibrary.com

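As a rough self-check against the conditions in the first update
(hot_standby enabled and a recovery.conf in place), here is a minimal
Python sketch. It is not an official tool: the function names are
hypothetical, the config parser is deliberately simplified (it only
understands "name = value" lines, the last assignment wins, and include
directives are ignored), and it assumes you pass both paths yourself,
since on Debian and Ubuntu postgresql.conf usually lives under
/etc/postgresql/ while recovery.conf sits in the data directory.

```python
from pathlib import Path
from typing import Optional

# Full spellings PostgreSQL accepts for boolean true. The server also
# accepts unambiguous prefixes (e.g. "tr"); this sketch ignores those.
TRUTHY = {"on", "true", "yes", "1"}


def read_hot_standby(conf_path: Path) -> Optional[bool]:
    """Return the effective hot_standby value from a postgresql.conf file.

    Hypothetical helper. Later assignments override earlier ones, as in
    PostgreSQL itself. Returns None when the setting never appears (the
    9.x default is off). Only "name = value" lines are understood, and
    include directives are not followed.
    """
    value: Optional[bool] = None
    for raw in conf_path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, val = line.partition("=")
        if sep and key.strip().lower() == "hot_standby":
            value = val.strip().strip("'").lower() in TRUTHY
    return value


def matches_bug_conditions(conf_path: Path, data_dir: Path) -> bool:
    """Hypothetical check: hot_standby is on AND data_dir has recovery.conf."""
    hot = read_hot_standby(conf_path) is True
    return hot and (data_dir / "recovery.conf").is_file()
```

For example, on a typical Debian-style 9.3 cluster you might call
matches_bug_conditions(Path("/etc/postgresql/9.3/main/postgresql.conf"),
Path("/var/lib/postgresql/9.3/main")); adjust both paths for your setup.
A False result only means these two conditions were not found by this
simplified parser, not that the server is safe.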

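Since the bug is described as most likely under high write load or
significant lag during standby startup, it may help to see how far
behind a standby is before restarting it. A minimal sketch, assuming you
have already fetched SELECT pg_current_xlog_location() on the primary
and SELECT pg_last_xlog_replay_location() on the standby. The helper
names are hypothetical, and the arithmetic assumes the 9.3 layout, where
a location "X/Y" is the 64-bit value (X << 32) | Y. On 9.2 and earlier
the layout differs slightly, so treat the result as approximate there,
or use the server-side pg_xlog_location_diff() where it is available.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location like '16/B374D848' to a byte position.

    Assumes the 9.3+ layout: high 32 bits before the slash, low 32 after,
    both in hex. Pre-9.3 positions are not exactly linear in this form.
    """
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_lag_bytes(primary_lsn: str, standby_replay_lsn: str) -> int:
    """Bytes of WAL the standby has not yet replayed (clamped at zero).

    The two locations are sampled at slightly different moments, so a
    standby can briefly appear ahead; that is reported as 0.
    """
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(standby_replay_lsn))
```

A small number here is not a guarantee: the bug can still occur at any
hot-standby startup, so the upcoming point release remains the real fix.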

