[OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified / Deleted

Dan Scott dan at coffeecode.net
Wed Oct 6 15:21:53 EDT 2010


Hey George:

On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
> Hello,
> 
> Context: we are feeding our catalogue records to our institutional search engine (Autonomy) using the Dublin Core XSL transformation to supply the indexable record content. We first supply a list of record ids that constitute the scope of records we want to supply to the search engine, but the next step is to automate indexing so that we're not always re-crawling the entire database. The idea is that we'd re-crawl the entire catalogue when required for re-optimizing the Autonomy indexes (say monthly?), but supply a new / modified / deleted updated record listing by RSS or separate txt file posted somewhere crawlable (the latter of which I can do now).
> 
> Anybody have any thoughts on structuring an RSS feed for this purpose (that provides some generalized capabilities for other possible consumers of this type of feed)?

Yep, the wiki has documented this (if I understand what you want
correctly) for quite some time now:
http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records

> For example, would you see a separate feed for updated vs. modified vs. new records or single feed with some fields like "status" (updated / deleted / added, etc.) and date status changed, etc.?  And how would you suggest we can hook into the arbitrary variable 'benchmark date' (in this case, the last full crawl) from which to determine the relative status changes like updated / deleted / added etc.?

For the most recent 10 records in MARCXML format which have been edited
since 2010-10-01:

http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01

Similarly, for the most recent 10 records in MARCXML format which have
been created since 2010-10-01:

http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01

Substitute "rss2" for "marcxml" and you'll have a generalized feed,
albeit with far less metadata.
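
To make the URL pattern explicit, here is a minimal Python sketch that
assembles these feed URLs from their parts. The function name, its
parameters, and the restriction to "edit" and "import" are my own
assumptions drawn from the two example URLs above, not anything
SuperCat itself defines:

```python
from datetime import date

# Base path taken from the example URLs in this message.
FEED_BASE = "http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat"


def freshmeat_url(fmt, kind, limit, since, base=FEED_BASE):
    """Build a freshmeat feed URL (hypothetical helper).

    fmt   -- feed format segment, e.g. "marcxml" or "rss2"
    kind  -- "edit" (recently edited) or "import" (recently created)
    limit -- maximum number of records to return
    since -- datetime.date; rendered as YYYY-MM-DD
    """
    if kind not in ("edit", "import"):
        # Only these two feed types appear in the examples above.
        raise ValueError(f"unsupported feed type: {kind!r}")
    return f"{base}/{fmt}/biblio/{kind}/{limit}/{since.isoformat()}"


if __name__ == "__main__":
    print(freshmeat_url("marcxml", "edit", 10, date(2010, 10, 1)))
    print(freshmeat_url("rss2", "import", 10, date(2010, 10, 1)))
```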

We're missing a feed for deleted records, though that should be pretty
easy to create.

So assuming your indexer can remember the last time it crawled the feed,
it should be able to supply the date to these feeds and gobble up data
accordingly.
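
As a rough sketch of that "remember the last crawl" loop in Python: the
state file, its JSON layout, and every function name below are
hypothetical, and I'm assuming a plain HTTP GET is all the indexer needs.
The feed only returns the N most recent records, so the limit would need
to be generous enough to cover everything changed between crawls.

```python
import json
from datetime import date
from pathlib import Path
from urllib.request import urlopen

# Base path taken from the example URLs in this message.
FEED_BASE = "http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat"


def load_last_crawl(state_file, default):
    """Return the stored crawl date, or `default` if none is recorded."""
    path = Path(state_file)
    if not path.exists():
        return default
    return date.fromisoformat(json.loads(path.read_text())["last_crawl"])


def save_last_crawl(state_file, when):
    """Record `when` (a datetime.date) as the most recent crawl date."""
    Path(state_file).write_text(json.dumps({"last_crawl": when.isoformat()}))


def fetch_url(url):
    """Fetch a URL and return the raw response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def crawl_changes(state_file, today, full_crawl_date, limit=10, fetch=fetch_url):
    """Fetch the edited and created feeds since the last recorded crawl.

    Falls back to `full_crawl_date` (the benchmark date of the last full
    crawl) when no state has been saved yet. Returns a dict mapping
    "edit" and "import" to the raw feed bodies.
    """
    since = load_last_crawl(state_file, full_crawl_date)
    feeds = {}
    for kind in ("edit", "import"):
        url = f"{FEED_BASE}/marcxml/biblio/{kind}/{limit}/{since.isoformat()}"
        feeds[kind] = fetch(url)
    # Only advance the stored date once both feeds were fetched.
    save_last_crawl(state_file, today)
    return feeds
```

Because the dates have day granularity, a record changed later on the
day of a crawl may show up again in the next run, so the indexer should
treat re-delivered records as harmless updates.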
 
> I can see another general use case for this type of feed being relevant to those who contribute records to external union catalogues - giving our union catalogue partners an additional option for scooping up our records with their own automation.

One must be very optimistic that the union catalogue partners would
actually adopt this approach, as welcome as it would be!

