[OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified / Deleted

Mike Rylander mrylander at gmail.com
Wed Oct 6 16:01:18 EDT 2010


On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <dan at coffeecode.net> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search engine (Autonomy) using the Dublin Core XSL transformation to supply the indexable record content. We first supply a list of record ids that constitute the scope of records we want to supply to the search engine, but the next step is to automate indexing so that we're not always re-crawling the entire database. The idea is that we'd re-crawl the entire catalogue when required for re-optimizing the Autonomy indexes (say monthly?), but supply a listing of new / modified / deleted records by RSS or as a separate txt file posted somewhere crawlable (the latter of which I can do now).
>>
>> Anybody have any thoughts on structuring an RSS feed for this purpose (that provides some generalized capabilities for other possible consumers of this type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see a separate feed for updated vs. modified vs. new records or single feed with some fields like "status" (updated / deleted / added, etc.) and date status changed, etc.?  And how would you suggest we can hook into the arbitrary variable 'benchmark date' (in this case, the last full crawl) from which to determine the relative status changes like updated / deleted / added etc.?
>
> For the most recent 10 records in MARCXML format which have been edited
> since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed,
> albeit with far less metadata.
>
> We're missing a deleted feed, though that should be pretty easy to
> create.
>
> So assuming your indexer can remember the last time it crawled the feed,
> it should be able to supply the date to these feeds and gobble up data
> accordingly.
>
>> I can see another general use case for this type of feed being relevant to those who contribute records to external union catalogues - giving our union catalogue partners an additional option for scooping up our records with their own automation.
>
> One must be very optimistic that the union catalogue partners would
> actually adopt this approach, as welcome as it would be!
>

There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to
scope to a specific location, and "html" can be replaced by any unAPI
format that's valid for bib records.
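
As a rough sketch of how those two substitutions fit together, here is
a small Python helper. The function name and the "BR1" shortname are
made up for illustration, and the path layout is simply copied from the
URL above, so treat it as an assumption rather than a documented API.

```python
BASE = "http://catalogue.nrcan.gc.ca"


def item_age_browse_url(fmt="html", org_unit="-", base=BASE):
    """Build an item-age browse URL.

    fmt      -- any unAPI format valid for bib records ("html" by default)
    org_unit -- Org Unit shortname, or "-" for no location scoping
    """
    return f"{base}/opac/extras/browse/{fmt}/item-age/{org_unit}/"


# Unscoped HTML view, same as the URL above
print(item_age_browse_url())

# MARCXML scoped to a hypothetical Org Unit with shortname "BR1"
print(item_age_browse_url(fmt="marcxml", org_unit="BR1"))
```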

This is a little different from the bib feeds that Dan mentioned, but
will allow you to crawl back through item additions as well as bib
edits.
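
For the bib feeds, an indexer that remembers its last crawl date could
build both URLs along these lines. This is only a sketch: the helper
name is hypothetical, and the path layout is inferred from Dan's two
example URLs rather than from any formal spec.

```python
from datetime import date

BASE = "http://catalogue.nrcan.gc.ca"


def freshmeat_url(kind, since, limit=10, fmt="marcxml", base=BASE):
    """Build a freshmeat feed URL for bib records.

    kind  -- "edit" (recently edited) or "import" (recently created)
    since -- datetime.date of the last crawl
    limit -- maximum number of records to return
    fmt   -- feed format, e.g. "marcxml" or "rss2"
    """
    if kind not in ("edit", "import"):
        raise ValueError("kind must be 'edit' or 'import'")
    return (
        f"{base}/opac/extras/feed/freshmeat/"
        f"{fmt}/biblio/{kind}/{limit}/{since.isoformat()}"
    )


# Date of the last crawl; a real indexer would load this from wherever
# it stores its own state.
last_crawl = date(2010, 10, 1)

for kind in ("edit", "import"):
    print(freshmeat_url(kind, last_crawl))
```

There is no deleted feed to point this at yet, so deletions would still
have to be picked up some other way (e.g. the periodic full re-crawl).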

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

