[Evergreen-dev] Expanded Concerto Dataset Management

Bill Erickson berickxx at gmail.com
Mon Oct 25 10:39:37 EDT 2021


Hi,

We have a number of bugs open regarding expanding our Concerto test
dataset.  Here's the general bug for reference:
https://bugs.launchpad.net/evergreen/+bug/1901932

As part of the project, a group of intrepid Evergreeners has been adding
data to a demo system.  I'm writing today to discuss options for using the
data we're collecting so that we can build EG systems with a nice pile of
sample data.

This is mostly a brain dump so we can discuss more in the hackaway.

The challenges:

The current concerto data is managed as a collection of SQL INSERTs, etc.
These inserts sit atop the baseline EG seed data (org units, user groups,
permissions, print templates, etc.).  This new dataset is created by hand
via the UI.  Naturally, this allows more people to contribute, since any
number of people can be logged in adding data and you don't have to arrange
it all into SQL files.  The downside is we have a bunch of data in a
database, but no SQL to manage it.

We can export the data with pg_dump.  However, we can't just export the
data from the demo system, then import the data into a stock system,
because it includes additions and modifications to the stock data (e.g. org
units, user groups, etc.).  A pg_restore of the exported data will fail
when it collides with data added by the stock database install (e.g. you
can't have two org units with ID 1).
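
To make the collision concrete, here is a minimal sketch of the failure
mode.  It uses an in-memory SQLite database as a stand-in for PostgreSQL,
and the table, columns, and shortnames are simplified placeholders rather
than the real Evergreen schema, so treat it as an illustration only.

```python
import sqlite3


def restore_over_seed():
    """Return the error raised when an imported row reuses a seeded ID."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for an org unit table (not the real EG schema).
    conn.execute(
        "CREATE TABLE org_unit (id INTEGER PRIMARY KEY, shortname TEXT NOT NULL)"
    )
    # The stock install already seeds an org unit with ID 1.
    conn.execute("INSERT INTO org_unit (id, shortname) VALUES (1, 'STOCK')")
    try:
        # A data-only import from the demo system brings its own ID 1.
        conn.execute("INSERT INTO org_unit (id, shortname) VALUES (1, 'DEMO')")
    except sqlite3.IntegrityError as err:
        return str(err)
    finally:
        conn.close()
    return None


print(restore_over_seed())
```

The second insert is rejected by the primary key constraint, which is the
same class of error a restore over the stock seed data would hit.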

Here's what I'm doing now to create / test backups of the work in progress:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=blob;f=NOTES;h=93b72733031aa32eaa6df863faa331436c1cb72f;hb=97dedc3c5f39b56ee6295a1934089e5d5bd0e92a

In essence, export then import the full data set and the schema into a
blank Evergreen database.  This works, but it means including the database
schema in the export file.  If the schema is included in the seed data
file, it means we need an export file per database upgrade or at least per
Evergreen version.  (Note that once the data+schema is installed, it can be
upgraded just like any EG database, so creating an export per EG change is
not strictly required.)
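
For readers who don't want to open the notes, the schema+data round trip
could look roughly like the sketch below.  It only builds and prints the
command lines; the database and file names are hypothetical, and the flags
are ordinary pg_dump / pg_restore options, not necessarily the exact ones
used in the linked NOTES file.

```python
import shlex


def full_export_commands(source_db, target_db, dump_file):
    """Build pg_dump / pg_restore commands for a schema+data round trip."""
    # Custom-format dump of the whole database (schema and data).
    dump = [
        "pg_dump", "--format=custom", "--no-owner",
        "--file", dump_file, source_db,
    ]
    # Restore into an existing, empty target database.
    restore = [
        "pg_restore", "--no-owner",
        "--dbname", target_db, dump_file,
    ]
    return dump, restore


# Hypothetical names, for illustration only.
for cmd in full_export_commands("demo_evergreen", "blank_evergreen",
                                "concerto-full.dump"):
    print(shlex.join(cmd))
```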

Thoughts?  Maybe there's another way to export / import this data?

We could create SQL files from the sample data, similar to the existing
concerto data.  That would be a bit of a project.

Another option is extracting all the stock seed data into separate files
and only installing them when requested.  Then we could export the sample
data as data only and in theory it will be installable over the EG schema
in cases where the stock seed data is not installed.

Or do a pg_dump of the data as SQL inserts, then hack away at it until it
works?
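
As a starting point for the data-only ideas, a dump without the schema and
with one INSERT per row could be requested as in this sketch.  The database
and file names are hypothetical, and the output would still contain the
stock seed rows, so this covers only the export half of the problem.

```python
import shlex


def data_only_dump_command(source_db, out_file):
    """Build a pg_dump command that emits data only, as INSERT statements."""
    return [
        "pg_dump",
        "--data-only",       # skip CREATE TABLE and other schema objects
        "--column-inserts",  # INSERTs with explicit column names
        "--file", out_file,
        source_db,
    ]


# Hypothetical names, for illustration only.
print(shlex.join(data_only_dump_command("demo_evergreen", "concerto-data.sql")))
```

Explicit column names make the resulting SQL easier to edit by hand, at the
cost of a larger file and a slower load than the default COPY format.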

Ideally, whatever we choose will allow us to continue adding seed data
without it being super complicated for those adding the data.

If we choose the data+schema export/import option, where would we keep the
files?  Keeping one file per version/change in every EG branch would add a
lot of bulk.  We could keep one export file per branch / version within
its matching branch (e.g. rel_3_6_2 has a rel_3_6_2 export file, but no
others).  Or we could keep all export files per version/change in a
separate repository so they all live in one place.

To be clear, I'm not talking about changing how we do Evergreen database
installs for typical installations.  These changes would only affect how we
build test systems with expanded data.

Let the discussion commence...

-b