[OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified /Deleted

Duimovich, George George.Duimovich at NRCan-RNCan.gc.ca
Wed Oct 6 17:12:41 EDT 2010


Thanks Mike & Dan.

This looks great. 

The one possible glitch I foresee: we currently index all our content, but in future we may exclude some records from the crawl and from any subsequent indexing. So we may need a way to ensure that records we intend to exclude don't re-enter the index through the feeds whenever they are edited.

I'm not too worried about this, as the periodic full re-crawls would 'reset' the scope to the intended set of records. If I recall correctly, there may be some shelf locations and/or call number ranges that our cataloguers would prefer not to include in the Autonomy crawl.
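One possible way to handle the exclusion concern between full re-crawls, assuming each feed entry can be reduced to a bib record ID, is to check every incoming ID against the same list of record IDs that defines the crawl scope. This is a minimal Python sketch; the function name and inputs are hypothetical, not part of Evergreen or Autonomy.

```python
from typing import Iterable, List, Set


def filter_feed_ids(feed_ids: Iterable[int], scope_ids: Set[int]) -> List[int]:
    """Keep only feed record IDs that belong to the current crawl scope.

    Hypothetical helper:
      feed_ids  -- record IDs pulled from an edit/import feed
      scope_ids -- record IDs supplied for the last full crawl
    Duplicates are dropped; the original feed order is preserved.
    """
    seen: Set[int] = set()
    kept: List[int] = []
    for rid in feed_ids:
        # Excluded records (e.g. from unwanted shelf locations) are never
        # in scope_ids, so editing them cannot pull them back into the index.
        if rid in scope_ids and rid not in seen:
            seen.add(rid)
            kept.append(rid)
    return kept


if __name__ == "__main__":
    print(filter_feed_ids([5, 3, 9, 5], {3, 5}))
```

This errs on the side of exclusion: a genuinely new, in-scope record would wait for the next full re-crawl unless the scope list is regenerated between crawls.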

Cheers,
George

 

-----Original Message-----
From: open-ils-dev-bounces at list.georgialibraries.org [mailto:open-ils-dev-bounces at list.georgialibraries.org] On Behalf Of Mike Rylander
Sent: October 6, 2010 16:01
To: Evergreen Development Discussion List
Subject: Re: [OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified /Deleted

On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <dan at coffeecode.net> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search engine (Autonomy), using the Dublin Core XSL transformation to supply the indexable record content. We first supply a list of record IDs that constitutes the scope of records we want to send to the search engine; the next step is to automate indexing so that we're not always re-crawling the entire database. The idea is that we'd re-crawl the entire catalogue when required to re-optimize the Autonomy indexes (say, monthly?), but in between supply a listing of new / modified / deleted records by RSS or as a separate text file posted somewhere crawlable (the latter of which I can do now).
>>
>> Does anybody have any thoughts on structuring an RSS feed for this purpose (one that provides some generalized capabilities for other possible consumers of this type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see a separate feed for updated vs. modified vs. new records, or a single feed with fields like "status" (updated / deleted / added, etc.) and the date the status changed? And how would you suggest we hook into an arbitrary 'benchmark date' variable (in this case, the date of the last full crawl) from which to determine relative status changes such as updated / deleted / added?
>
> For the most recent 10 records in MARCXML format which have been 
> edited since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have 
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed, 
> albeit with far less metadata.
>
> We're missing a deleted feed, though, but that should be pretty easy 
> to create.
>
> So assuming your indexer can remember the last time it crawled the 
> feed, it should be able to supply the date to these feeds and gobble 
> up data accordingly.
>
>> I can see another general use case for this type of feed being relevant to those who contribute records to external union catalogues - giving our union catalogue partners an additional option for scooping up our records with their own automation.
>
> One must be very optimistic that the union catalogue partners would 
> actually adopt this approach, as welcome as it would be!
>
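The freshmeat URL pattern in the examples above can be sketched as a small Python helper that an incremental indexer might call with the date of its last crawl. This is a minimal sketch, assuming the path layout seen in those URLs (format / biblio / edit-or-import / count / date) holds for other formats such as rss2; the function and constant names are hypothetical. Only the edit and import axes are covered, since there is no deleted feed yet.

```python
from datetime import date
from typing import List

# Base path taken from the example URLs in this thread; adjust for another catalogue.
FRESHMEAT_BASE = "http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat"

# Axes mentioned in the thread; a "deleted" feed does not exist yet.
AXES = ("edit", "import")


def freshmeat_url(axis: str, since: date, count: int = 10,
                  fmt: str = "marcxml", base: str = FRESHMEAT_BASE) -> str:
    """Build a feed URL for the `count` most recent records changed since `since`.

    Hypothetical helper assuming the layout
    <base>/<fmt>/biblio/<axis>/<count>/<YYYY-MM-DD>.
    """
    if axis not in AXES:
        raise ValueError(f"unknown axis {axis!r}; expected one of {AXES}")
    if count < 1:
        raise ValueError("count must be positive")
    return f"{base}/{fmt}/biblio/{axis}/{count}/{since.isoformat()}"


def incremental_urls(last_crawl: date, count: int = 10,
                     fmt: str = "marcxml") -> List[str]:
    """Return the edit and import feed URLs for one incremental crawl."""
    return [freshmeat_url(axis, last_crawl, count, fmt) for axis in AXES]


if __name__ == "__main__":
    for url in incremental_urls(date(2010, 10, 1)):
        print(url)
```

Because each feed returns only the most recent `count` records, a crawler would presumably need a count larger than the number of changes expected between runs, or it could miss some.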

There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to scope to a specific location, and "html" can be replaced by any unAPI format that's valid for bib records.

This is a little different from the bib feeds that Dan mentioned, but will allow you to crawl back through item additions as well as bib edits.
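The item-age browse URL can be built the same way. This is a minimal sketch, assuming the path layout of the example URL above (format / item-age / org unit or "-" / trailing slash); the helper name and the "BR1" shortname in the usage are hypothetical, and any paging parameters the browse axis may accept are not covered here.

```python
from typing import Optional
from urllib.parse import quote

# Base path taken from the example browse URL in this thread.
BROWSE_BASE = "http://catalogue.nrcan.gc.ca/opac/extras/browse"


def item_age_url(fmt: str = "html", org_unit: Optional[str] = None,
                 base: str = BROWSE_BASE) -> str:
    """Build an item-age browse URL.

    Hypothetical helper: `fmt` should be an unAPI format valid for bib
    records; `org_unit` is an Org Unit shortname, or None for "-" (no scope).
    """
    scope = org_unit if org_unit else "-"
    # Percent-encode the scope so an unusual shortname cannot break the path.
    return f"{base}/{fmt}/item-age/{quote(scope, safe='')}/"


if __name__ == "__main__":
    print(item_age_url())
    print(item_age_url("marcxml", "BR1"))
```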

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com

