[OPEN-ILS-DEV] Open-ils-dev Digest, Vol 60, Issue 22

Dan Scott dan at coffeecode.net
Wed Mar 23 17:10:14 EDT 2011


Hi Ashish:

On 23 March 2011 16:12, Ashish Mukherjee <ashish.mukherjee at gmail.com> wrote:
> Hello Everyone,
>
> I came across the Google Summer of Code project ideas and was considering
> taking up the task of providing Full-Text Search. Though I am new to the
> project, I would be glad to contribute to it.

Thanks for your interest in the Evergreen project!

> I have a bunch of related questions -
>
> 1) What's the codebase we'll be developing on - 1.6 or 2.x?

We don't add new features to released versions of the code, so you'll
be looking at developing against Evergreen trunk (sometimes called
HEAD in other version control systems) for a new feature like this.

> 2) Do we have a deployment architecture diagram somewhere? For instance, do
> we have info somewhere of how many servers we have, distribution etc.?

There isn't just one deployment of Evergreen. Some sites run
everything on a single server, others run Evergreen on clusters of
servers with replicated database servers, load balancing against
multiple Web servers, etc. Linux distributions used in production range
from Debian to Ubuntu to Red Hat (I develop primarily on Fedora).

> 3) Do we have metrics related to usage/load and the like, based on which we
> can test scalability of our full-text search feature?

Mmm, not so much. Again, it depends on where Evergreen has been
deployed: a public library environment puts significantly higher
load on it than a small special library does.

> 4) The GSoC wiki page mentions Solr and Sphinx as examples of FTS engines.
> However, has any preliminary analysis already been done to determine what
> may be most suitable or are we leaning toward any particular choice for any
> reason at this point?

We currently use PostgreSQL's integrated full-text search; that would
be the baseline for any external search engine that might be brought
into play. Solr and Sphinx are mentioned because they have been used
in a number of other projects; Solr in particular is heavily used in
library contexts.

> 5) I see client and library code provided in different languages
> (java,perl,python) for the ILS. Is development of client code in all these
> languages within the scope of this project, or is it just the server which
> sends out a Web Service response?

Most of the current Evergreen code is written in Perl, with some C
used for performance-critical operations. Some PL/Perl (plperlu) and
PL/pgSQL are also used to implement functionality in the PostgreSQL
database.
Whether Java or Python or any other language would be suitable for the
project depends on what the rationale would be for choosing that
language, I think.

> 6) Are there any other design considerations or constraints, or any other
> traps and pitfalls the developer should know about before going ahead?

Heh, undoubtedly! It might be difficult to tease apart some of the
implicit dependencies on the current search-within-PostgreSQL
implementation without introducing significant regressions. And there
are the ever-present challenges of synchronizing changes in the
database with the external search engine. I'm sure you can think of
some other pitfalls that can arise when searches start to span both
metadata-only records and full text digital objects, if that's the
direction you're thinking of proposing.
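
To make the synchronization challenge a little more concrete, here is a
minimal sketch in Python. It is not Evergreen code: ChangeQueue,
ExternalIndex, and drain are hypothetical names, and both classes are
in-memory stand-ins (for a table of pending changes in the database and for
an engine such as Solr or Sphinx). The idea it illustrates is to record each
record change in a queue and replay the queue against the external index in
order, removing an entry only after the index has accepted it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChangeQueue:
    """Hypothetical stand-in for a database table of pending record changes."""
    pending: deque = field(default_factory=deque)

    def record_change(self, record_id, action, text=None):
        # Assumption: in a real system this row would be written in the same
        # database transaction as the record change itself.
        self.pending.append((record_id, action, text))


class ExternalIndex:
    """Hypothetical stand-in for an external search engine."""

    def __init__(self):
        self.docs = {}

    def upsert(self, record_id, text):
        self.docs[record_id] = text

    def delete(self, record_id):
        self.docs.pop(record_id, None)


def drain(queue, index):
    """Apply queued changes to the index in order; return the number applied."""
    applied = 0
    while queue.pending:
        record_id, action, text = queue.pending[0]
        if action == "delete":
            index.delete(record_id)
        else:
            index.upsert(record_id, text)
        # Dequeue only after the index call returned, so a failure leaves
        # the change in the queue for the next run.
        queue.pending.popleft()
        applied += 1
    return applied


queue = ChangeQueue()
index = ExternalIndex()
queue.record_change(1, "upsert", "Moby Dick")
queue.record_change(2, "upsert", "Walden")
queue.record_change(1, "delete")
applied = drain(queue, index)
```

Replaying changes in order matters here: the delete for record 1 has to
land after its earlier upsert, or the index would keep a record the
database no longer has. A real design would also have to decide what
happens when the engine is unreachable for a long time and how to rebuild
the index from scratch.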

