[OPEN-ILS-DEV] Preventing duplicate entries in bookbags / record buckets

Dan Scott dan at coffeecode.net
Fri Dec 18 10:58:54 EST 2009


We've been using bookbags for our course reserves lists, and it's mostly
been going well - particularly using the option to expose the bookbags
in a given skin and locale. I plan to add the ability to set the search
scope as well, so that we're not showing holdings across every library
in the consortium. But that's an aside...

In the meantime, we discovered that some of our reserves bookbags had a
tremendous number of records in them. We don't have too many sadistic
profs who assign insane amounts of reading, though; in most cases it was
because there were duplicate records. Usually there were just two copies
of a given record, but in the extreme case we had a bookbag with 57
duplicate copies of a single record. Bookbags are stored in the
container.biblio_record_entry_bucket_item table, in case you don't have
that at your fingertips. This is the same table used to implement record
buckets.
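
If you want to check whether your own system has the same problem, a
query along these lines will list the affected bucket/record pairs
(just a quick sketch; adjust to taste):

-- Find bucket/record pairs that appear more than once
SELECT bucket, target_biblio_record_entry, COUNT(*) AS copies
  FROM container.biblio_record_entry_bucket_item
  GROUP BY bucket, target_biblio_record_entry
  HAVING COUNT(*) > 1
  ORDER BY copies DESC
;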

(My assumption is that there is, or was, a bad interaction with the
browser "Back" button that causes records to be added and re-added to a
given bookbag, but I haven't confirmed that yet. To be determined!)

I checked on IRC and Bill Erickson said that some sites use bookbags as
a "books I've read in a given order" list, and that duplicates therefore
need to be allowed for the occasions when a person reads the same book
twice. He also mentioned the possibility of using a shared bookbag as a
kind of ratings infrastructure - something like "38 people added this
record to the 'Like' bookbag" - to generate ratings. I can understand
the former rationale, even though it seems like a bit of an edge case,
but the latter sounds a bit insane to me. (Sorry, Bill!)

Bill and Jason thought that having an "allow duplicates" checkbox in the
bookbag interface, while keeping the existing database table, would be a
reasonable solution. In effect, the application would manage the data
constraints. Me, I'm an old-school database guy who wants the database
to enforce constraints, so I would prefer to see a unique constraint
added to the container.biblio_record_entry_bucket_item table; if there's
a requirement for a reading list that allows duplicate entries, we could
shunt that off to a separate table (see the sketch below) rather than
trying to keep everything as generic as possible at the database schema
level.
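
To make that concrete, here's a rough sketch of what a dedicated
reading-list table might look like - entirely hypothetical; the table
name and column types are my invention rather than anything that exists
in Evergreen:

-- Hypothetical: a dedicated reading-list table that deliberately
-- allows duplicates, so the generic bucket item table can stay unique
CREATE TABLE container.biblio_record_entry_reading_list_item (
    id          BIGSERIAL PRIMARY KEY,
    bucket      INT NOT NULL
                REFERENCES container.biblio_record_entry_bucket (id),
    target_biblio_record_entry BIGINT NOT NULL,
    create_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
    -- note: no UNIQUE (bucket, target_biblio_record_entry) here
);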

In the short term, I don't think anyone's working on enhancing the
bookbag interface to include a dupes/no-dupes option, and I don't plan
on reworking the database schema and associated services to fit my
prejudices. No time. So there was no interest in adding the unique
constraint to Evergreen core, but I thought I would share it with the
list in case other sites want to enforce a "no duplicate records in
bookbags / record buckets" policy at the database level.

First, I removed all of the duplicate records from the bookbags with the
following SQL:

-- Keep only the most recently added copy (the one with the highest id)
-- of each bucket/record pair
DELETE FROM container.biblio_record_entry_bucket_item
  WHERE id NOT IN (
    SELECT MAX(id)
      FROM container.biblio_record_entry_bucket_item AS dup
      GROUP BY dup.bucket, dup.target_biblio_record_entry
  )
;
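
If you'd rather eyeball the damage before making it permanent, the same
DELETE can be run inside a transaction:

BEGIN;

DELETE FROM container.biblio_record_entry_bucket_item
  WHERE id NOT IN (
    SELECT MAX(id)
      FROM container.biblio_record_entry_bucket_item AS dup
      GROUP BY dup.bucket, dup.target_biblio_record_entry
  )
;

-- psql reports "DELETE n"; if n matches the number of duplicates found
-- by the earlier GROUP BY query, make it permanent with COMMIT;
-- otherwise, ROLLBACK; and investigate.
COMMIT;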

I then created a unique constraint across the combination of bucket +
record to prevent this from occurring in the future:

-- Allow each record to appear only once per bucket
ALTER TABLE container.biblio_record_entry_bucket_item
  ADD CONSTRAINT item_once_per_bucket
    UNIQUE (bucket, target_biblio_record_entry)
;
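
With the constraint in place, trying to add the same record to the same
bucket twice now fails at the database level. A contrived example - the
IDs are made up, and I'm assuming the remaining columns take their
defaults:

INSERT INTO container.biblio_record_entry_bucket_item
  (bucket, target_biblio_record_entry)
  VALUES (1, 12345)
;
-- Running the same INSERT a second time now produces:
-- ERROR:  duplicate key value violates unique constraint "item_once_per_bucket"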

Note, of course, that since this isn't part of Evergreen core, there's
no guarantee it won't cause problems for you down the road if some
social capabilities do end up being implemented on top of the existing
bookbag infrastructure. Caveat emptor, or so I've heard :)
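
If that day comes, backing the constraint out is a one-liner:

ALTER TABLE container.biblio_record_entry_bucket_item
  DROP CONSTRAINT item_once_per_bucket
;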



