[OPEN-ILS-DEV] Improving Evergreen's OPAC Search

Tue Jan 12 13:45:38 EST 2010

On Tue, 2010-01-12 at 08:52 -0700, Chad G. Hansen wrote:
> Is this the current development community of Evergreen? Is this even
> the mailing list for those who are actively working on Evergreen?

Hello. Yes, here be code and coders.

> My group is submitting a proposal for a grant that will hopefully
> allow us to implement our new search technology with one of the
> existing open source ILS that are being used around the world. We
> would like to work with Evergreen to see if we can improve its OPAC
> search with ideas or features from our own system. We are just in the
> proposal phase right now. Attached is a portion of a previously
> rejected proposal. The section attached is our previous design/plan
> section for what we would like to accomplish.

I would suggest that assumptions such as the outlandish MySQL / InnoDB
performance claims, which the proposal seems to accept at face value,
should be tested as part of the research project, or called out as a
potential weakness of the results if you don't benchmark against
alternatives like PostgreSQL.

I think the proposal would be stronger if it showed an understanding of
the current Evergreen search and relevancy algorithms, so that it would
be clear where the proposed system differs from the current Evergreen
system. Using Unicorn's relevancy as a benchmark is, well... it's just
cruel to the poor beast, as no site that I know of expected anything good
from Unicorn's relevancy rankings (at least, not ca. GL3.1, which was my
last experience with it).

As a quick and very incomplete overview, Evergreen's search algorithm
uses PostgreSQL's full-text indexes, including Porter stemming. It does
not use stopwords, because real users search for titles like "It" or "To
Be Or Not To Be"; also, in French "or" and "thé" are quite meaningful.
The relevancy ranking is based on IDF but includes boosts for matching
keywords in the title or author, exact matches, etc. Searches can be
scoped to include results only from a single library or a subset of
libraries, and can be filtered to only return results with available
items of a specific material type / intended audience.
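
To make the description above a little more concrete, here is a toy
sketch in Python of IDF-based ranking with no stopword list and with
boosts for title/author matches and exact title matches. It is not
Evergreen's code (Evergreen does this work inside PostgreSQL); the
sample records, the function names, and the boost values are all
invented for illustration, and stemming, library scoping, and
availability filtering are left out entirely.

```python
import math

# Hypothetical sample records, invented for this example.
RECORDS = [
    {"title": "It", "author": "Stephen King", "summary": "A horror novel"},
    {"title": "Hamlet", "author": "William Shakespeare",
     "summary": "To be or not to be is its famous line"},
    {"title": "Cooking It Right", "author": "Jane Doe", "summary": "A cookbook"},
]


def tokenize(text):
    """Lowercase and split on non-alphanumerics. No stopword list is applied."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def idf(term, token_sets):
    """Smoothed inverse document frequency: rarer terms score higher."""
    df = sum(1 for tokens in token_sets if term in tokens)
    return math.log((len(token_sets) + 1) / (df + 1)) + 1.0


def search(query, records, field_boost=2.0, exact_boost=3.0):
    """Return (score, title) pairs for matching records, best first.

    field_boost and exact_boost are assumed values, not Evergreen's.
    """
    token_sets = [set(tokenize(" ".join(r.values()))) for r in records]
    query_terms = tokenize(query)
    results = []
    for record, tokens in zip(records, token_sets):
        boosted = set(tokenize(record["title"] + " " + record["author"]))
        score = 0.0
        for term in set(query_terms):
            if term in tokens:
                # Matches in the title or author count for more.
                weight = field_boost if term in boosted else 1.0
                score += idf(term, token_sets) * weight
        # A query that is exactly the title gets a further boost.
        if query_terms == tokenize(record["title"]):
            score *= exact_boost
        if score > 0:
            results.append((score, record["title"]))
    return sorted(results, reverse=True)


for score, title in search("it", RECORDS):
    print(f"{score:.2f}  {title}")
```

Because nothing is discarded as a stopword, a one-word query for "it"
still matches, and the exact-title boost is what lifts the record
titled "It" above a longer title that merely contains the word.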

> What we are looking for is an endorsement by the Evergreen development
> community of the idea. A willingness for someone in the development
> community to possibly work with us in our attempt to potentially
> improve Evergreen's search.

I'll add my name to the informal endorsements. I don't have the time to
personally commit to a long-term project like this, though.

> Our goal is to work with the development community to implement any
> improvements we can and then give what we have done to the community
> (we have no desire to maintain after the 3 year grant project is up).

That could be a problem. The approach mentioned by Galen and Joe of
incremental improvements along the way would be far easier for us to
adopt and maintain, versus a complete (and it would have to be complete)
alternative that would have no support from the originating developers
in the future.

> Is this the right venue for this request?

There's no better one.

> What is the consensus from this group?
> Do you need more information before committing to endorsing the idea to
> possibly work with us in the future (assuming our grant is granted)?

As others have said, you don't need our endorsement to work on this. If
you ask questions on the mailing list or IRC channel, you'll probably
get help finding the answers.