[OPEN-ILS-GENERAL] Evergreen access via Google?

Ben Shum bshum at biblio.org
Thu Apr 9 08:52:00 EDT 2015


Hi Don,

Starting with Evergreen 2.6 (see the "structured data" section of the
Evergreen 2.6 release notes -
http://evergreen-ils.org/documentation/release/RELEASE_NOTES_2_6.html),
developers like Dan Scott have been adding structured data elements to
Evergreen's catalog to make its records more discoverable.  This work
has continued through newer Evergreen releases, and I'd say that the
work by Dan and others has been essential to keeping Evergreen's
catalog friendly to search engines like Google.

Evergreen 2.8's release notes list many more discoverability
enhancements added in that release:
http://evergreen-ils.org/documentation/release/RELEASE_NOTES_2_8.html#_opac

Since your site does not include a manually configured robots.txt
file, I'll point you at the example set by the catalog at Dan's
library, Laurentian University:  https://laurentian.concat.ca/robots.txt
(we based many of our own changes on their example).

That robots.txt file tends to guide search engine bots that arrive at
the catalog towards indexing the appropriate content, and tells them
to skip over certain undesirable pages.
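
As a rough illustration of how a well-behaved crawler interprets such
a file, here is a minimal Python sketch using the standard library's
urllib.robotparser.  The rules, the bot name "ExampleBot", and the
catalog.example.org host are all made up for the example; they are not
copied from Laurentian's file, so treat the paths as assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules (NOT Laurentian's actual file): keep bots out of
# search result and patron account pages, leave record pages open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /eg/opac/results
Disallow: /eg/opac/myopac
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network access is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# Assumed example URLs on a placeholder host.
record_url = "https://catalog.example.org/eg/opac/record/123"
search_url = "https://catalog.example.org/eg/opac/results?query=star+trek"

record_allowed = parser.can_fetch("ExampleBot", record_url)
search_allowed = parser.can_fetch("ExampleBot", search_url)

print("record page allowed:", record_allowed)
print("search page allowed:", search_allowed)
```

The idea is that individual record pages stay crawlable while the
effectively endless space of search result pages is left out.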

By default, if you do not have anything set, search engine bots will
likely attempt to index everything in your catalog that they can
publicly access.

Doing an example search like
https://www.google.com/#q=asbury+catalog+Star+Trek (i.e., the keywords
"asbury catalog Star Trek" in Google), I can already see a couple of
results that come from your Evergreen catalog records.  So at least
Google's search engine bots are already working to grab your catalog's
contents.

That all said, I suppose one potential "danger" of having bots freely
scan your site is that if they get too busy indexing its contents,
they can overwhelm the server and interrupt your ability to use
Evergreen.  This happened to us at least once before, when some
indexer in China scanned our whole catalog and tried to index every
page, causing us to run out of system resources trying to serve up
all the content it was requesting.
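
One way to express limits for this kind of crawler in robots.txt is a
per-bot block plus a Crawl-delay line.  The sketch below is only an
illustration under assumptions: "GreedyBot" and "PoliteBot" are
invented names, the rules are hypothetical, and honoring robots.txt
(Crawl-delay especially) is entirely up to each crawler, so this would
not stop a bot that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one aggressive bot entirely and ask
# everyone else to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: GreedyBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /eg/opac/results
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Assumed example URL on a placeholder host.
record_url = "https://catalog.example.org/eg/opac/record/123"

greedy_allowed = parser.can_fetch("GreedyBot", record_url)
polite_allowed = parser.can_fetch("PoliteBot", record_url)
polite_delay = parser.crawl_delay("PoliteBot")  # seconds, or None if unset

print("GreedyBot may fetch record:", greedy_allowed)
print("PoliteBot may fetch record:", polite_allowed)
print("PoliteBot crawl delay:", polite_delay)
```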

For Bibliomation's catalog, I've been experimenting with modifying our
robots.txt file and continually upgrading our Evergreen catalog to
pick up the latest structured data enhancements, to try to make the
most of what's possible in Evergreen.  I've also done some small
experiments in creating Google Custom Search Engines to search against
our indexed online catalog (and requesting scheduled indexing from
Google's bots) as an alternative means of discovering the content
contained in our systems.

Moving forward, I expect improving the discoverability of Evergreen's
content to remain an exciting area to explore.

-- Ben

On Thu, Apr 9, 2015 at 8:15 AM, Donald Butterworth
<don.butterworth at asburyseminary.edu> wrote:
> Hi everyone,
>
> I was asked to toss these questions out and get some perspectives.
>
> "What would it take to make the Evergreen catalog holdings available to
> generic search engines like Google, Bing, Yahoo and DuckDuckGo?" "Even if it
> is doable, is it a good idea?"
>
> The motivation behind these questions is a perception that the first attempt
> many students make to do research is through a general web search.
>
> Anybody have a comment?
>
> Don
>
> --
> Don Butterworth
> Faculty Associate / Librarian III
> B.L. Fisher Library
> Asbury Theological Seminary
> don.butterworth at asburyseminary.edu
> (859) 858-2227



-- 
Benjamin Shum
Evergreen Systems Manager
Bibliomation, Inc.
24 Wooster Ave.
Waterbury, CT 06708
203-577-4070, ext. 113

