[Evergreen-reports] [Sales] Publication year average

Blake Graham-Henderson blake at mobiusconsortium.org
Mon Feb 19 12:18:06 EST 2024


Mike,

A new column is what I was thinking. I figured that converting the
existing string column into an integer would break something. However,
I didn't think it would require a new metabib field definition, because
the output of the existing definition could be converted to a number
closer to the "Simple Record Extracts" end of the chain. Perhaps the new
column could be introduced in one of the views further up the DB view
chain, still referencing the original extraction def?
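The "original extraction def" is the stock pubdate XPath quoted further down in this thread. As a rough, non-authoritative illustration of what that expression selects, here is a Python sketch using the standard library's ElementTree, which cannot evaluate the union expression directly, so the two branches are emulated by hand. It assumes the mods33 prefix maps to the MODS v3 namespace (http://www.loc.gov/mods/v3) and that the record root is mods:mods; the function name pubdate_candidates is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the "mods33" prefix in the stock XPath maps to the MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"
ORIGIN_INFO = f"{{{MODS_NS}}}originInfo"
DATE_ISSUED = f"{{{MODS_NS}}}dateIssued"


def pubdate_candidates(mods_xml: str) -> list[str]:
    """Approximate the stock pubdate XPath for one MODS record (hypothetical helper).

    A dateIssued under an originInfo is kept if it has encoding="marc"
    or is the first dateIssued child of its parent; results come back in
    document order, as an XPath union would return them.
    """
    root = ET.fromstring(mods_xml)
    # Assumption: the root element is <mods>, so originInfo is a direct child.
    origins = root.findall(ORIGIN_INFO)
    selected = set()
    for origin in origins:
        # iter() yields originInfo itself and every descendant, mirroring "//".
        for parent in origin.iter():
            children = parent.findall(DATE_ISSUED)
            if children:
                selected.add(children[0])  # dateIssued[1]: first among its siblings
            selected.update(c for c in children if c.get("encoding") == "marc")
    # Second pass restores document order for the selected elements.
    return [
        (el.text or "").strip()
        for origin in origins
        for el in origin.iter(DATE_ISSUED)
        if el in selected
    ]


if __name__ == "__main__":
    sample = (
        '<mods xmlns="http://www.loc.gov/mods/v3"><originInfo>'
        "<dateIssued>c1999</dateIssued>"
        '<dateIssued encoding="marc">1999</dateIssued>'
        "<dateIssued>2001 printing</dateIssued>"
        "</originInfo></mods>"
    )
    print(pubdate_candidates(sample))  # ['c1999', '1999']
```

If the display entries preserve this result order (an assumption, not verified here), the first element of the list is what a "first occurrence" rule would pick up.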

New column name: "Publication Year (numeric normalized)"
pubyear_int
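A minimal sketch of the downstream conversion idea, written in Python rather than the SQL a real patch would use. It assumes, hypothetically, that the pubdate value reaches this point as JSON holding either one string or an array of strings; pubyear_int here is only an illustrative function named after the proposed column.

```python
import json
import re
from typing import Optional


def pubyear_int(pubdate_json: Optional[str]) -> Optional[int]:
    """Turn a pubdate display value into an integer year, or None.

    Hypothetical input shape: JSON holding either a single string or an
    array of strings, one per occurrence found in the record.
    """
    if pubdate_json is None:
        return None
    value = json.loads(pubdate_json)
    if isinstance(value, list):
        value = value[0] if value else None  # keep the first occurrence only
    if not isinstance(value, str):
        return None
    digits = re.sub(r"[^0-9]", "", value)  # strip every non-numeric character
    return int(digits) if digits else None


print(pubyear_int('["c1999.", "1999"]'))  # 1999
print(pubyear_int('"n.d."'))  # None
```

Note that stripping every non-digit turns a range such as "1999-2001" into 19992001, so a real patch might prefer to keep only the first run of digits.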

-Blake-
Conducting Magic
Will consume any data format
MOBIUS

On 2/19/2024 11:06 AM, Mike Rylander wrote:
> Hrm... I traced back the date1 record attribute definition, actually,
> rather than the pubdate metabib field. It's important to note that 
> record attributes and metabib fields have /very/ different use cases, 
> ingest performance profiles, and configuration shapes.  What's most 
> important here is that metabib fields are primarily meant to support 
> search, and record attributes are primarily meant to support discrete 
> value display and sorting.  We should try to use a single value 
> (multi=false in the config table) record attribute here, rather than a 
> metabib field.
>
> The drawback with Date1 (as in, the data coming from the 008) is that
> really thin records may not have an 008 at all. However, I don't
> think the risk is really high there -- the record attribute version of
> pubdate comes from the 008 as well, and that is what we use as the 
> data for the publication date sort axis.  Oh! And, looking closer, 
> the pubdate attribute uses the "Number or NULL Normalize" index 
> normalizer (id=18), which is the second half of what I described 
> before -- I'd just forgotten it existed.  Adding index normalizer 19 
> in a position before number-or-null, and then setting up the view 
> stack to use that record attribute, could be all that's needed.
>
> So, I think the record attribute version of pubdate is actually the 
> best data source.
>
> One thing to consider is the existing uses of whatever extant field we
> end up wanting to make use of.  So, the Real Plan, IMO, should do all
> that -^ as a /new/ record attribute rather than hijacking an existing
> one.  It also should not use an existing metabib field (recall, those
> are about searching rather than exposing data for other things to
> use), and it should land in a completely new column on the Simple
> Record Extracts materialized view.  Then there's no chance of breaking
> existing reports with a column datatype change.
>
> Thoughts on that?
>
> --
> Mike Rylander
> Research and Development Manager
> Equinox Open Library Initiative
> 1-877-OPEN-ILS (673-6457)
> miker at equinoxOLI.org
> https://equinoxOLI.org
>
>
> On Fri, Feb 16, 2024 at 2:23 PM Blake Graham-Henderson 
> <blake at mobiusconsortium.org> wrote:
>
>     All,
>
>     Thanks for your considerate responses. What Mike said is the
>     conclusion I had come to, and I was wondering if anyone else needs
>     the publication year to be an actual number so that the reporter can
>     do things like average, min, max, etc. From the sounds of it, no one
>     is currently using the Evergreen reporter to produce such a thing (I
>     don't see how you could). I suppose no one is using an external
>     program to make it happen either (to meet collection reporting needs
>     from the higher-ups)?
>
>     I agree with Mike that the best place to get the publication year
>     (right now) is the Simple Record Extracts, because it hunts it down
>     from several places in the bib record. Walking it backwards:
>
>     reporter.materialized_simple_record
>     -> reporter.old_super_simple_record
>     -> metabib.wide_display_entry
>     -> metabib.compressed_display_entry
>     -> metabib.flat_display_entry
>     -> metabib.display_entry
>
>     The last of those, metabib.display_entry, is a table populated by
>     triggers based on the index definitions found in
>     config.metabib_field.
>
>     One of those views is hard-coded to expect a "pubdate" entry in the
>     metabib_field definitions. That entry ships with the stock Evergreen
>     definitions, and it is:
>
>     "//mods33:mods/mods33:originInfo//mods33:dateIssued[@encoding="marc"]|//mods33:mods/mods33:originInfo//mods33:dateIssued[1]"
>
>     Decoding that is fun. Suffice it to say: the pubyear can come from
>     several places in the record, and I like that better than looking
>     in only one place.
>
>     So, in conclusion, if a patch were written, I think it would be
>     smart to piggyback on this logic. It might be fairly straightforward
>     to get the first occurrence from the JSON string and cast it to an
>     integer (stripping out non-numeric characters first). That's where
>     my thoughts are right now. I don't think we're going to be writing
>     the patch any time soon; we're just thinking it through with
>     everyone.
>
>     If everyone agrees that this is something that Evergreen should have,
>     and we agree on the method, I might champion the bug and patch for
>     future meetings and releases!
>
>     -Blake-
>
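Mike's record-attribute suggestion depends on normalizer order: assuming "Number or NULL Normalize" (id=18) returns NULL for anything that is not purely numeric, the new normalizer has to run before it, or values like "c1999." are discarded before they can be cleaned up. The Python sketch below only models that ordering; strip_non_digits, number_or_null and apply_chain are hypothetical stand-ins, not Evergreen's actual normalizer functions, which live in the database.

```python
import re
from typing import Callable, Optional

# A normalizer takes a value (or NULL, modeled as None) and returns one.
Normalizer = Callable[[Optional[str]], Optional[str]]


def strip_non_digits(value: Optional[str]) -> Optional[str]:
    # Hypothetical stand-in for the new normalizer that would run first.
    return None if value is None else re.sub(r"[^0-9]", "", value)


def number_or_null(value: Optional[str]) -> Optional[str]:
    # Assumed behavior: keep values made only of ASCII digits, otherwise NULL.
    if value is not None and re.fullmatch(r"[0-9]+", value):
        return value
    return None


def apply_chain(value: Optional[str], chain: list[Normalizer]) -> Optional[str]:
    """Run normalizers in position order, feeding each result to the next."""
    for normalize in chain:
        value = normalize(value)
    return value


raw = "c1999."
print(apply_chain(raw, [number_or_null]))  # None
print(apply_chain(raw, [strip_non_digits, number_or_null]))  # 1999
```

Reversing the two normalizers gives NULL again, which is why the position of the new normalizer in the chain matters as much as its logic.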