[OPEN-ILS-GENERAL] Testing and Evergreen's quality (was: Database schema deprecation/supersedes stuff)

Mon Jun 6 01:34:45 EDT 2011

On Sun, Jun 5, 2011 at 11:17 AM, Mike Rylander <mrylander at gmail.com> wrote:
> On Sun, Jun 5, 2011 at 1:04 AM, Dan Scott <dan at coffeecode.net> wrote:

>>  * Are we ready to start making use of pgTAP? Changing the database
>> schema seems like a perfect use case for unit tests, to ensure that
>> expected behaviour is maintained through the upgrade, and to
>> demonstrate that buggy behaviour is fixed or non-existent behaviour
>> comes into existence via the upgrade.
>
> Ready? Sure. Tuit-ful? Not I...

I'm not sure how to respond to this tactfully, so I won't try to be
clever or cute; I'll just be blunt. The alternative to putting in time
upfront on quality is to spend more time addressing quality problems
later after a release, and we've done a lot of the latter. We've had
trouble publishing high-quality initial releases. Patrons and staff at
production sites have been running into too many problems, and
it's not good for the Evergreen name. My hands are far from clean on
this front (hello, sites who upgraded from 1.6 -> 2.0 and ran into
problems with authorities), which is one of the reasons that I have
invested much of my own time in getting the continuous integration
server running again and creating a skeleton set of unit tests (and
thanks to Kevin for his efforts in that area too). It's also why I've
been a proponent of getting sign-off on branches from another
contributor instead of committing your own work directly to a core
branch.

I believe that we can begin to address some of these quality issues
via more unit test coverage. I don't think that we're going to get
very far, though, if we just have one or two people trying to add unit
tests to other people's work - and those people are likely to have
their own areas of new functionality that they want to contribute to
Evergreen, rather than spending all of their time writing tests for
other people's code. The people creating new functionality or
modifying existing functionality are the ones in the best position
to know what inputs and outputs to expect from a given chunk of code,
and therefore to create basic unit tests demonstrating
those expectations - which helps other contributors weeks, months, or
years later know whether their own changes will break expectations.
But we need to adopt the approach as a team, not as individuals.
Tackling the database schema via pgTAP as modifications happen seems
like a small, reasonable step to take in this direction. It's not
trying to boil the ocean by saying that we need unit tests for every
function and every table in the database immediately; it's suggesting
that, when you modify the schema, you commit tests at the same time
that demonstrate that your changes do what you say they do (and
maintain existing behaviour). And eventually, I bet we would get a lot
of the database schema covered with this gradual approach.
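
As a rough illustration of the "commit tests with the schema change"
idea, here is a minimal sketch. It is not pgTAP and not real Evergreen
schema: it uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL, and the patron table, the active column, and all function
names are hypothetical. In pgTAP the analogous checks would be
assertions such as has_column() run against the upgraded database.

```python
import sqlite3


def create_old_schema(conn):
    # Hypothetical pre-upgrade schema, with one row of existing data.
    conn.execute(
        "CREATE TABLE patron (id INTEGER PRIMARY KEY, barcode TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO patron (barcode) VALUES ('A100')")


def upgrade_schema(conn):
    # Hypothetical upgrade script: add an 'active' flag that defaults to 1.
    conn.execute(
        "ALTER TABLE patron ADD COLUMN active INTEGER NOT NULL DEFAULT 1"
    )


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def run_upgrade_checks():
    conn = sqlite3.connect(":memory:")
    create_old_schema(conn)
    upgrade_schema(conn)
    results = {
        # The change does what it says: the new column exists.
        "has_active_column": "active" in column_names(conn, "patron"),
        # Existing behaviour is maintained: old data survives the upgrade.
        "existing_row_preserved": conn.execute(
            "SELECT barcode FROM patron"
        ).fetchall() == [("A100",)],
        # Migrated rows pick up the documented default.
        "default_applied": conn.execute(
            "SELECT active FROM patron"
        ).fetchall() == [(1,)],
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, passed in run_upgrade_checks().items():
        print(("ok" if passed else "not ok"), "-", name)
```

The point of the sketch is the shape rather than the tooling: the
upgrade and its checks live side by side, and the checks cover both
the new behaviour and the data that existed before the upgrade.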

Unit tests alone won't prevent all of the problems that we've run into
with new releases, of course. I've been guilty of introducing new
functionality that performed poorly at scale until indexes were added,
or that had problems which only showed up when data was migrated from
a previous release rather than loaded directly into the new release.
Bug #788379 ("broad searches are slow") is an example of a serious
performance regression in 2.0 that has yet to be addressed.
Constrictor gives us some great tools on the performance testing
front, but it takes time to set up a clean environment loaded up with
sufficient data to trigger noticeable performance problems (let alone
tracking performance over time) or to run that environment through an
upgrade process and put the resulting environment through its paces.
We need repeatable upgrade tests and performance tests - maybe a
community environment that runs a standard set of system tests on a
regular basis and tracks those results over time?

In summary, I don't think I'm the only person who feels that we've had
quality problems. There are probably ways to address these problems
that I haven't raised here, and I'd be happy to hear about
alternatives from people who are prepared to adopt them. I just don't
want to see a 2.1.0 release that isn't really ready for prime time
until 2.1.6, and a 3.0 release that isn't ready for adoption until
3.0.6, and I don't want libraries playing a game of chicken to see who
is willing to be the early adopter of a new release. I want libraries
confident that they can adopt a *.*.0 release, and I want them to be
proud of Evergreen's quality and able to recommend it without
reservations to other libraries.