[OPEN-ILS-DEV] Feature Proposal: Additional Language Search Capability

Mon Nov 5 13:07:14 EST 2012

On Mon, Nov 5, 2012 at 12:06 PM, David Boyle <dandd.db at gmail.com> wrote:
> Proposed feature is to add an additional MARC tag and subfields for language
> searching.
> Currently, Evergreen only allows the language specified in MARC 008 position
> 35-37 to be the target of a language search.
> There is no current mechanism to allow for additional languages to be
> included as search targets.
> The proposed change would allow a configurable string to be set in the
> database, specifying which subfields in MARC 041 records would be allowed as
> additional search targets
>
> Blueprint:
> https://blueprints.launchpad.net/evergreen/+spec/additional-language-search
>
> Evergreen Wiki:
> http://www.open-ils.org/dokuwiki/doku.php?id=dev:proposal:additional_search_languages

Thanks for posting the idea!

I ran across it over the weekend via the RSS feed for the wiki, so I
had a chance to think about it before today ...

First, I wonder if you've considered extending some of the more modern
parts of the search/indexing machinery instead of using
metabib.full_rec (MFR).  MFR has some big-ish drawbacks that make it
less than optimal for general use: non-configurable normalization,
huge size, loss of MARC-level field granularity and, most of all, poor
performance.  It's also extremely MARC-centric, of course.  I'd
personally lobby to see it go away entirely, if possible, were it not
for its aid in troubleshooting and low-level data analysis.

In particular, for alternate ideas, I'd point you towards the Single
Value Fields (config.record_attr_definition, metabib.record_attr, and
friends) infrastructure as inspiration for a Multi-Value Fields
implementation, to which the current language() (and item_lang())
filter could be moved.  There would certainly be differences, of
course (data might be stored in something other than HSTORE to avoid
indexing and query complications, or maybe not), but the setup,
extraction and normalization bits could be essentially the same as
for SVF.  IOW, you'd benefit from code reuse.
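
To make the multi-value idea concrete, here is a rough sketch of the
extraction step for languages.  It is not Evergreen code: the function
name, the (code, value) tuple layout for 041 subfields, and the
"subfields" configuration string are all made up for illustration.  It
also assumes that a 041 subfield may hold several three-letter codes
run together, as some older records do.

```python
from typing import Iterable, List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value); hypothetical layout


def extract_languages(field_008: str,
                      fields_041: Iterable[Iterable[Subfield]],
                      subfields: str = "a") -> List[str]:
    """Collect every language code for a record, primary language first.

    field_008  -- the raw 008 control field; the language lives at 35-37
    fields_041 -- each 041 field as a sequence of (code, value) pairs
    subfields  -- configurable string naming the 041 subfields to index
    """
    wanted = set(subfields)
    codes: List[str] = []

    # Single-valued source: 008 positions 35-37 (zero-based slice 35:38).
    primary = field_008[35:38].strip().lower() if len(field_008) >= 38 else ""
    if primary:
        codes.append(primary)

    # Multi-valued source: every configured subfield of every 041 field.
    for field in fields_041:
        for code, value in field:
            if code not in wanted:
                continue
            value = value.strip().lower()
            # Assumption: a value whose length is a multiple of three
            # is a run of packed three-letter codes ("engfre").
            if value and len(value) % 3 == 0:
                chunks = [value[i:i + 3] for i in range(0, len(value), 3)]
            else:
                chunks = [value]
            for chunk in chunks:
                if chunk and chunk not in codes:
                    codes.append(chunk)
    return codes
```

Called with subfields "a" the sketch only picks up 041$a; called with
"ah" it also picks up 041$h, which is the kind of site-level choice the
proposal's configurable string is meant to allow.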

However, the bigger benefit to this route, IMO, is that it would be
generalized and could help solve other outstanding issues.  For
instance, targeted indexing of other fields that are usually singular,
except when they're not (and that's when you care most about them, it
seems), such as ISxN, would be covered by this.  That would make record
import matching much faster (see the discussion of multi-value fields
on the Launchpad bug at
https://bugs.launchpad.net/evergreen/+bug/1024095 for some
background), and is but one example of what a more generalized
solution might be able to do.
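
As a toy illustration of why a multi-value index helps import
matching, the sketch below keeps a map from each normalized ISxN value
to the records that carry it, so an incoming record is matched by
direct lookups rather than by scanning full records.  Everything here
(function names, the in-memory dict, the normalization rule) is a
stand-in for illustration, not how Evergreen does or would do it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def normalize_isxn(raw: str) -> str:
    """Keep only digits and X from the first token, dropping hyphens
    and trailing qualifiers such as "(pbk.)"."""
    tokens = raw.strip().split()
    if not tokens:
        return ""
    return "".join(ch for ch in tokens[0].upper() if ch.isdigit() or ch == "X")


def build_index(records: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Map every normalized ISxN value to the set of record ids holding it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec_id, values in records.items():
        for value in values:
            key = normalize_isxn(value)
            if key:
                index[key].add(rec_id)
    return index


def match_incoming(index: Dict[str, Set[int]],
                   incoming_values: Iterable[str]) -> Set[int]:
    """Return ids of existing records sharing any ISxN with the incoming one."""
    matches: Set[int] = set()
    for value in incoming_values:
        matches |= index.get(normalize_isxn(value), set())
    return matches
```

The point is only that each of a record's several ISxN values gets its
own index entry, so the second or third ISBN on a record matches just
as quickly as the first.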

Thoughts?

-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker at esilibrary.com
 | web:  http://www.esilibrary.com