[OPEN-ILS-GENERAL] Awesome Box Integration

Sat Sep 27 13:32:54 EDT 2014

Hello Everyone

It's very encouraging to see all your input, and I'd like to thank you for
it.

As Rogan mentioned, all the processing of data could take place in the
back end, where only sysadmins (or other authorized personnel) have
access to the data. The data can be given to the classifier in its raw
form, or mapped to anonymous user IDs. Once the classifier returns the
result, the user IDs can be mapped back to the users, along with a list of
their preferences. In the front end, each user sees only their own
preferences (without any information about other patrons).
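
To make the ID mapping concrete, here is a minimal Python sketch. It is only
an illustration under assumptions: the function names, the integer patron IDs
and the (patron, item) tag pairs are all hypothetical, and nothing here is
existing Evergreen or Awesome Box code.

```python
import secrets


def pseudonymize(patron_ids):
    """Assign each real patron ID a random token; return both lookup tables."""
    forward = {}  # real patron ID -> anonymous token
    reverse = {}  # anonymous token -> real patron ID
    for pid in patron_ids:
        if pid not in forward:
            token = secrets.token_hex(8)
            forward[pid] = token
            reverse[token] = pid
    return forward, reverse


def anonymize_tags(tags, forward):
    """Replace patron IDs in (patron_id, item_id) pairs with their tokens."""
    return [(forward[pid], item) for pid, item in tags]


def reidentify(results, reverse):
    """Map classifier output keyed by token back to real patron IDs."""
    return {reverse[token]: prefs for token, prefs in results.items()}
```

Only the back end would hold the `reverse` table; the classifier would see
tokens, never patron IDs.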

Also, I agree with Tim and Kathy that, initially, the Awesome Box project
should be focussed on collecting awesome tags, and only then should any
additional artificial intelligence algorithms be implemented. Also, as
Elaine Hardy said, the previous data can be worked on and updated/cleaned
(in case genre data is required), and the classifier can be trained (on
test data) while the initial foundation of the Evergreen-AwesomeBox
integration is being laid.

Maybe, if we manage to figure out a feasible workflow for the project, we
can achieve it within the OPW time frame. A rough idea that I had in mind
was:
1. Understanding how Evergreen and AwesomeBox can be integrated efficiently.
2. Collecting awesome tags from AwesomeBox.
3. Working out the weightage of each user's awesome tags and adjusting the
subsequent counts to be more interpretive.
4. Cleaning/updating the previous data, in order to train the classifier to
generate recommendations.
5. Extending the same to genres and giving administrators the flexibility
to choose the recommendation layer that they prefer.
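
As a rough illustration of step 3, the weightage of a user's tags could be
computed along these lines. This is a sketch under assumptions: averaging the
per-item agreement ratios is just one possible scoring rule, and the function
name and the shape of the `stats` data are hypothetical.

```python
def tag_weight(tagged_items, stats):
    """Average agreement between one user's awesome tags and other patrons.

    tagged_items: items this user tagged as awesome.
    stats: maps item -> (other_checkins, other_awesome_tags), i.e. how many
           other patrons checked the item in and how many of them tagged it.
    """
    ratios = []
    for item in tagged_items:
        checkins, awesome = stats[item]
        if checkins > 0:  # skip items nobody else has returned yet
            ratios.append(awesome / checkins)
    # With no usable items there is no evidence either way, so return 0.0.
    return sum(ratios) / len(ratios) if ratios else 0.0


# The two cases from the original proposal: 80 of 100 vs. 5 of 100.
stats = {"book_a": (100, 80), "book_b": (100, 5)}
high = tag_weight(["book_a"], stats)
low = tag_weight(["book_b"], stats)
```

A per-genre weightage would be the same calculation restricted to the items
in one genre.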

What do you think?

On Sat, Sep 27, 2014 at 1:09 AM, Galen Charlton <gmc at esilibrary.com> wrote:

> Hi,
>
> I lean toward Tim's preference myself, but I think one of the things
> to keep in mind with the potential OPW projects is that they're all
> still in the brainstorming phase -- no idea should be off the table.
>
> Of course, it's unlikely that we'll have time to implement ALL of the
> good ideas during the course of the program, but that's exactly one of
> the things that the mentors will work with the candidates on --
> ensuring that the final plan is of realistic scope.
>
> Regards,
>
> Galen
>
> On Fri, Sep 26, 2014 at 12:28 PM, Tim Spindler <tjspindler at gmail.com>
> wrote:
> > Kathy,
> >
> > I think that's a good point.  I think Rogan and others have cautioned
> > about feature creep also.  I think in the end I would be happy first to
> > see integration with Awesome Box and then, as a second phase, some of the
> > other issues.
> >
> > On Fri, Sep 26, 2014 at 3:05 PM, Kathy Lussier <klussier at masslnc.org>
> wrote:
> >>
> >> Hi all,
> >>
> >>> Basically, I wouldn't let the quality of genre headings in your catalog
> >>> determine whether Awesome Box uses genre headings. Too much in the
> >>> history of genre use makes clean headings difficult. I would, however,
> >>> begin considering how to clean up those headings so Awesome Box could be
> >>> fully implemented.
> >>
> >>
> >> I just want to throw out a reminder that full implementation of "Awesome
> >> Box" is really collecting the data for items that have been returned to
> >> an awesome box in the library and sending that information along to
> >> http://awesomebox.io/. I think Vanya has some good ideas to then use
> >> that same data in Evergreen in other ways, which is great and may start a
> >> foundation for even more development. But, in my mind, these other
> >> components are gravy. Exciting gravy, but gravy nonetheless.
> >>
> >> Kathy
> >>
> >> Kathy Lussier
> >> Project Coordinator
> >> Massachusetts Library Network Cooperative
> >> (508) 343-0128
> >> klussier at masslnc.org
> >> Twitter: http://www.twitter.com/kmlussier
> >> #evergreen IRC: kmlussier
> >>
> >> On 9/26/2014 2:22 PM, Hardy, Elaine wrote:
> >>>
> >>> Genre headings can be corrected so that they are current to the
> >>> thesauri your library uses. LCGFT and GSAFD authority records are
> >>> available, for example. However, authorities for genre headings are
> >>> relatively recent and, as a result, many libraries did not retain or add
> >>> genre headings to bib records in the past. Of course, adding subject
> >>> headings to fiction is relatively recent as well. Some older fiction
> >>> titles may just have genre headings, if anything at all.
> >>>
> >>> Copy cataloging should not make a difference in whether headings are
> >>> used correctly or whether your library chooses to use genre headings,
> >>> although I suppose your bibliographic utility will. If you obtain most of
> >>> your records from LC or OCLC, then certainly newer titles will have
> >>> extensive genre headings. With the advent of LCGFT, more catalogers do
> >>> add genre headings to bib records. GSAFD use was spotty but has
> >>> increased. What could make the difference is whether you use vendor
> >>> cataloging, since your library might have to pay extra for use and
> >>> maintenance of genre headings, particularly if you use the vendor as a
> >>> source for your title records.
> >>>
> >>> If your catalogers are afforded the time to correct and add genre
> >>> headings, then whether they copy catalog or create all title records
> >>> originally won't matter. What their process and procedures are does.
> >>>
> >>> If your genre headings have not been kept up to date (which is likely
> >>> true of all of us), then I suggest cleaning them up as much as possible
> >>> if Awesome Box ratings will include them, and approaching cataloging
> >>> staff to see if including use and maintenance of genre headings can
> >>> become part of their workflow. Keep in mind that not only could it
> >>> increase the time it takes for items to get to the shelf but, if you
> >>> outsource, it might increase costs. If you use a vendor authority
> >>> service, genre heading maintenance may already be a part of the service.
> >>>
> >>> I'm not sure that beginning with broad categories would solve any
> >>> problems, since anything other than literary form (fiction, nonfiction,
> >>> poetry, drama, etc.) is going to be in, or not, a 655. Again, whether
> >>> LitF in the fixed field is coded properly depends on the quality of your
> >>> bib records. Some of the prePINES records have very little coding of any
> >>> kind in the fixed fields -- about 200,000 out of 1.7 million or so bib
> >>> records.
> >>>
> >>> Basically, I wouldn't let the quality of genre headings in your catalog
> >>> determine whether Awesome Box uses genre headings. Too much in the
> >>> history of genre use makes clean headings difficult. I would, however,
> >>> begin considering how to clean up those headings so Awesome Box could be
> >>> fully implemented.
> >>>
> >>>
> >>> Elaine
> >>>
> >>> J. Elaine Hardy
> >>> PINES & Collaborative Projects Manager
> >>> Georgia Public Library Service
> >>> 1800 Century Place, Ste 150
> >>> Atlanta, Ga. 30345-4304
> >>>
> >>> 404.235-7128
> >>> 404.235-7201, fax
> >>> ehardy at georgialibraries.org
> >>> www.georgialibraries.org
> >>> www.georgialibraries.org/pines
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: open-ils-general-bounces at list.georgialibraries.org
> >>> [mailto:open-ils-general-bounces at list.georgialibraries.org] On Behalf
> >>> Of McCanna, Terran
> >>> Sent: Thursday, September 25, 2014 4:33 PM
> >>> To: Evergreen Discussion Group
> >>> Subject: Re: [OPEN-ILS-GENERAL] Awesome Box Integration
> >>>
> >>> This relies on the circulation and rating data still being tied to the
> >>> patron in the system, though - yes, it'd be on the database side and
> >>> not on public view, but it's still creating a picture of a patron's
> >>> reading history that has privacy implications. Of course, this feature
> >>> should be set for systems to enable or disable, so that systems that are
> >>> concerned about privacy simply won't turn it on. (PINES, for example,
> >>> limits the retention of circulation history in the system as much as we
> >>> can because of our privacy policies, so any feature that is linked to a
> >>> patron's history would be unusable for us.)
> >>>
> >>> If ranking data were stored completely independently of the patron,
> >>> then library systems would be able to use it without privacy concerns,
> >>> and patrons wouldn't even need to be logged in to use it - but then it
> >>> wouldn't be able to give completely customized recommendations to a
> >>> specific patron, either. It's a definite tradeoff.
> >>>
> >>>
> >>> Terran McCanna
> >>> PINES Program Manager
> >>> Georgia Public Library Service
> >>> 1800 Century Place, Suite 150
> >>> Atlanta, GA 30345
> >>> 404-235-7138
> >>> tmccanna at georgialibraries.org
> >>>
> >>> ----- Original Message -----
> >>> From: "Vanya Jauhal" <vanyajauhal at gmail.com>
> >>> To: "Evergreen Discussion Group"
> >>> <open-ils-general at list.georgialibraries.org>
> >>> Sent: Thursday, September 25, 2014 3:41:02 PM
> >>> Subject: Re: [OPEN-ILS-GENERAL] Awesome Box Integration
> >>>
> >>>
> >>>
> >>> Hello Rogan
> >>>
> >>> This is exactly what I had in mind. All the recommendation processing
> >>> will take place in the background, and all the user will see is a
> >>> recommendation and not the information of any other patron. This way
> >>> their experience with Awesome Box will be enhanced.
> >>>
> >>>
> >>> And yes, we can maybe start off with some broad-level genres like, as
> >>> you mentioned, fiction, non-fiction, documentaries, etc. Then, depending
> >>> upon the infrastructure of the system and the response to that
> >>> categorization, we can build upon the algorithm accordingly.
> >>>
> >>>
> >>> You are right - it would be a big task in itself, but since the
> >>> parameters involved are few and explicit, it gets simplified to an
> >>> extent.
> >>>
> >>> On Fri, Sep 26, 2014 at 12:50 AM, Rogan Hamby
> >>> <rogan.hamby at yclibrary.net> wrote:
> >>>
> >>>
> >>>
> >>> I don't see an issue with doing analysis of circulation patterns on the
> >>> backend so long as nothing identifying is exposed.
> >>>
> >>>
> >>> For example, if all I saw as a patron was a tab in my OPAC that said
> >>> "you thought The Yiddish Policemen's Union was Awesome! Some others who
> >>> did also thought this was Awesome .... " I don't see that as different
> >>> from doing the same thing with circulations. It's not telling patrons
> >>> even what the points of comparison were unless they only had a single
> >>> item in their circulation history, and even then it doesn't tell them
> >>> how many other patrons, how much, etc....
> >>>
> >>>
> >>> I'm dubious about subject headings also but wouldn't want to dismiss it
> >>> out of hand. It might work. Without doing some experimenting I could
> >>> see it going either way. Some fixed fields I could see working, like
> >>> fiction and non-fiction. Age groups? Well, at least I can tell you I
> >>> can't rely on those in my catalog. :)
> >>>
> >>>
> >>> However, I also worry that reading recommendations based on circulation
> >>> history could easily grow into a much more complicated task, especially
> >>> depending on how we deliver those recommendations. Looking at a single
> >>> boolean value tied to the user and item (circ table?) could still be
> >>> quite a project by itself, especially once all the useful bits and
> >>> pieces are built in.
> >>>
> >>> On Thu, Sep 25, 2014 at 2:37 PM, McCanna, Terran <
> >>> tmccanna at georgialibraries.org > wrote:
> >>>
> >>>
> >>> Agreed - it's a great idea in theory, but I'm not sure how well it
> >>> would work in actual practice. Even in a single library, genre subject
> >>> headings are usually pretty inconsistent in the MARC records because of
> >>> copy cataloging, and that usually gets even more inconsistent in a
> >>> consortium of libraries. Perhaps it could be partially weighted on genre
> >>> subject headings, but not overly reliant on them? It might be worth
> >>> considering the fixed field values for fiction vs. non-fiction and for
> >>> age groups, too.
> >>>
> >>> I love the idea of providing recommendations based on other people that
> >>> have similar taste ("other people that liked this book also liked these
> >>> books...") but if the data is tied to actual patrons (and I'm not sure
> >>> how it couldn't be) then quite a few library systems would face legal
> >>> privacy issues and wouldn't be able to use it. We're currently using a
> >>> commercial service to pull in reading recommendations because the
> >>> recommendations can't be tied back to any of our patrons.
> >>>
> >>>
> >>> Terran McCanna
> >>> PINES Program Manager
> >>> Georgia Public Library Service
> >>> 1800 Century Place, Suite 150
> >>> Atlanta, GA 30345
> >>> 404-235-7138
> >>> tmccanna at georgialibraries.org
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: "Rogan Hamby" < rogan.hamby at yclibrary.net >
> >>> To: "Evergreen Discussion Group" <
> >>> open-ils-general at list.georgialibraries.org >
> >>> Sent: Thursday, September 25, 2014 2:02:58 PM
> >>> Subject: Re: [OPEN-ILS-GENERAL] Awesome Box Integration
> >>>
> >>>
> >>> I can see some challenges to tracking genre and I'd be hesitant to put
> >>> too much value on it. There are ways to catalog it but in my experience
> >>> actually relying on it being in records (much less being consistent) is
> >>> very unreliable in organizations that do a lot of copy cataloging /
> >>> don't have centralized and controlled cataloging, and there are quite a
> >>> few in that boat.
> >>>
> >>>
> >>> That concern aside, I've always thought this would be a fun and
> >>> potentially valuable thing to add.
> >>>
> >>>
> >>> On Thu, Sep 25, 2014 at 1:44 PM, Vanya Jauhal <vanyajauhal at gmail.com>
> >>> wrote:
> >>>
> >>> Hello everyone
> >>>
> >>> I'm Vanya, from India. I'm a candidate for the OPW Round 9 internship
> >>> with Evergreen.
> >>>
> >>> While discussing the idea of Awesome Box integration with Evergreen,
> >>> Kathy and I discussed the possibility of making the Evergreen support
> >>> for Awesome Box more interpretive using Artificial Intelligence.
> >>>
> >>> What if we could train the system to give weightage to people's
> >>> "awesome" tags on items, depending upon how much their previous tags are
> >>> appreciated by other people?
> >>>
> >>> For example: Let's say you tag a book as awesome. Now, if 100 other
> >>> people check that book in, and (let's say) 80 of them also tag it as
> >>> awesome, it will mean that your opinion matches that of the majority.
> >>> On the other hand, if 100 other people check that book in and (say) only
> >>> 5 of them tag it as awesome, this would mean that your awesome tag is
> >>> not in agreement with the majority.
> >>> So, in the former case, your awesome tag can be given more weightage
> >>> than in the latter.
> >>>
> >>> Also, the weightage may vary according to genre. So you may have good
> >>> taste in mystery books, but your taste in classical literature might
> >>> not be the same as the majority's. In that case, the weightage of your
> >>> awesome tag in mystery would be higher than in classical literature.
> >>>
> >>> We can even extend it to provide recommendations to users depending on
> >>> their agreement with other users with similar taste.
> >>>
> >>> I am looking forward to your suggestions and feedback on this.
> >>>
> >>> Thank you for your time
> >>>
> >>> Vanya
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Tim Spindler
> > tjspindler at gmail.com
> >
> > Go Green - Save a tree! Please don't print this e-mail unless it's
> > really necessary.
> >
> >
>
>
>
> --
> Galen Charlton
> Manager of Implementation
> Equinox Software, Inc. / The Open Source Experts
> email:  gmc at esilibrary.com
> direct: +1 770-709-5581
> cell:   +1 404-984-4366
> skype:  gmcharlt
> web:    http://www.esilibrary.com/
> Supporting Koha and Evergreen: http://koha-community.org &
> http://evergreen-ils.org
>