[OPEN-ILS-DEV] Crawling Evergreen via SuperCat + Performance

Duimovich, George George.Duimovich at NRCan-RNCan.gc.ca
Thu Feb 18 10:33:34 EST 2010


Hello Jason,

Yes, some very, very interesting feedback from our search engine developer. As a recap, we are doing a 'proof of concept' test of having our department's Autonomy search engine crawl our Evergreen records, incorporate them into our general search of internal webpages, blogs, wikis, etc., and serve them up in a faceted interface. They'd like to add our EG Catalogue as another search source (with a link back to rdetails in their results pages), so we began testing this week. It's a great opportunity for us to expose our records in another search environment.

I didn't have any monitoring in place for the first crawl, but I'm told that it took about 5 hours for 450,000 records (SuperCat dc format). This used a single-threaded, no-delay crawl, and the only 'bottleneck' reported to me was whether Autonomy could index the records as fast as they could be served up (i.e. the Autonomy robot client was not spending any time waiting around for Evergreen). I'll post more details later, but the developer is **very pleased** with the ease of integration thus far, including the crawl performance. Also, we're happy to start off with dc record indexing in this test, as Autonomy is Dublin Core enabled 'off the shelf'.
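
For a rough sense of scale, the figures above can be turned into a throughput number. This is only a back-of-the-envelope sketch in Python: the inputs are the approximate values quoted above ("about 5 hours", 450,000 records), and the function names are made up for illustration.

```python
def crawl_rate(records, hours):
    """Average records served per second over the whole crawl."""
    return records / (hours * 3600)


def ms_per_record(records, hours):
    """Average wall-clock time per record, in milliseconds."""
    return 1000 / crawl_rate(records, hours)


if __name__ == "__main__":
    # Approximate figures from the first crawl (single-threaded, no delay).
    print(crawl_rate(450_000, 5))      # records per second
    print(ms_per_record(450_000, 5))   # milliseconds per record
```

With those inputs this works out to roughly 25 records per second, or about 40 ms per record, keeping in mind that the 5-hour figure is itself approximate.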

I plan to run "iostat" and "vmstat" before and during the crawl to get a snapshot of each time period. There are some other tools out there for fancy graphs, but I don't have time to look into them much right now (e.g. http://mmonit.com/monit/ or http://sourceforge.net/projects/munin/). We don't have much experience with this kind of thing yet, so I really just want to get some basic indication of how much impact the crawl has under our normal business load. Also, I want at least one full-blown crawl to run during business hours so we can get a more 'real-world' perspective on things. I think that'll happen as early as tomorrow afternoon. More later...
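
One possible way to do the before/during comparison is sketched below in Python. It assumes the usual Linux (procps) "vmstat" layout, where a header row beginning with "r  b" names the columns and each following row is one sample; the helper names (parse_vmstat, mean, sample_vmstat) are hypothetical, not part of any existing tool.

```python
import subprocess


def parse_vmstat(text):
    """Parse vmstat text output into a list of {column name: int} dicts."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    # The column-name row is the one that starts with "r" and "b".
    header_idx = next(i for i, f in enumerate(lines) if f[:2] == ["r", "b"])
    names = lines[header_idx]
    rows = []
    for fields in lines[header_idx + 1:]:
        # Keep only all-numeric sample rows; repeated headers are skipped.
        if len(fields) == len(names) and all(
            f.lstrip("-").isdigit() for f in fields
        ):
            rows.append(dict(zip(names, map(int, fields))))
    return rows


def mean(rows, column):
    """Average of one vmstat column (e.g. "wa" or "bi") over all samples."""
    return sum(r[column] for r in rows) / len(rows)


def sample_vmstat(interval=5, count=12):
    """Run `vmstat <interval> <count>` and parse it (needs vmstat on PATH)."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return parse_vmstat(result.stdout)
```

The idea would be to call sample_vmstat() once before the crawl and once while it is running, then compare columns such as mean(rows, "wa") (I/O wait) or mean(rows, "us") (user CPU) between the two sets of samples. Note that the first sample row vmstat prints reports averages since boot, so it may be worth discarding.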

Thanks

George

George Duimovich
NRCan Library / Bibliothèque de RNCan




-----Original Message-----
From: open-ils-dev-bounces at list.georgialibraries.org [mailto:open-ils-dev-bounces at list.georgialibraries.org] On Behalf Of Jason Etheridge
Sent: February 18, 2010 09:14
To: Evergreen Development Discussion List
Subject: Re: [OPEN-ILS-DEV] Crawling Evergreen via SuperCat + Performance

On Tue, Feb 2, 2010 at 11:30 AM, Duimovich, George <George.Duimovich at nrcan-rncan.gc.ca> wrote:
> My question: I'll ask our search guy to keep some stats but also 
> wonder what are the best or recommended ways for monitoring 
> performance metrics under crawl conditions from the EG server side of 
> things. We can get some start/end times from the logs and draw some 
> conclusions that way, but any other advice would be helpful too.

George, did you learn anything worth sharing?  I know some work was being put into Constrictor in the Contrib repository for external stress testing, etc.

--
Jason Etheridge
 | VP, Tactical Development
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  jason at esilibrary.com
 | web:  http://www.esilibrary.com

Please join us for the Evergreen 2010 International Conference!
It is being held April 20 - 23, 2010 at the Amway Grand Hotel and Convention Center, Grand Rapids, Michigan.
http://www.evergreen2010.org/

