[OPEN-ILS-GENERAL] Reporting module appears stuck

Sun Jun 2 22:18:27 EDT 2019

Ok, thank you. Our old internal documentation was incomplete. I will be
adding what you’ve provided for next time a report gets stuck. I made a
typo when testing for stuck reports; your SQL queries worked like a charm.
Thank you. Truly. Have a fabulous day.
-Jon

On Sun, Jun 2, 2019 at 12:08 PM Jason Stephenson <jason at sigio.com> wrote:

> At this point, you are beyond my knowledge of the reporter.  However, it
> sounds like you stopped two reports while they were running.  That's
> generally not a good idea.
>
> On 6/2/19 1:21 PM, JonGeorg SageLibrary wrote:
> > Thank you so much for responding. That script is essentially what I
> > was looking for, as I knew there had to be a way to view stuck reports.
> >
> > The first time I ran *pgrep -af Clark* it returned 2 report names.
> > However, since restarting Clark that first time after killing those two
> > processes, I get nothing but the "Clark Kent, waiting for trouble"
> > output. Yes, I removed the reporter-lock folder when restarting Clark and
> > did everything as the opensrf user.
> >
> > When I run *select * from reporter.currently_running* I get no output.
> > Just to double-check, I ran it on both the production and replicated
> > databases with the same result. However, when I go to Reports in my
> > version of the staff client, a stuck report from yesterday still shows
> > in the queue.
> >
> > Ideas?
> > -Jon
> >
> > On Sun, Jun 2, 2019 at 5:34 AM Jason Stephenson <jason at sigio.com> wrote:
> >
> >     It sounds like you have dead reports that are preventing new reports
> >     from starting.  When a report dies or is killed, it isn't cleaned up,
> >     and Clark will think that it is still running.
> >
> >     First, check if Clark is running and running any reports:
> >
> >         pgrep -af Clark
> >
> >     If you run that on the server where the reporter runs, you should get
> >     output like this:
> >
> >         7180 Clark Kent, waiting for trouble
> >
> >     The number is the process ID, so it will be different.  If any reports
> >     are running, there will be additional lines similar to the above, but
> >     with some portion of the report's name:
> >
> >         7201 Clark Kent reporting: [Report Name]
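Jason's pgrep check can be wrapped in two small shell helpers. This is a minimal sketch that assumes the process titles look exactly like the examples above ("Clark Kent, waiting for trouble" for the daemon and "Clark Kent reporting: ..." for each running report); the helper names are hypothetical, not part of Evergreen.

```shell
#!/bin/sh
# Minimal sketch: interpret `pgrep -af Clark` output using the process
# titles shown above.  Helper names are hypothetical.  The "[C]lark"
# patterns still match "Clark", but because their own command lines do not
# contain the string "Clark", `pgrep -af Clark` will not list these grep
# processes when they run in the same pipeline.

# Print how many "Clark Kent reporting: ..." lines appear on stdin.
count_running_reports() {
    # grep -c prints 0 (and exits 1) when nothing matches; keep going.
    grep -c '[C]lark Kent reporting:' || true
}

# Exit 0 if stdin contains the idle Clark daemon line, non-zero otherwise.
clark_daemon_present() {
    grep -q '[C]lark Kent, waiting for trouble'
}
```

Usage, as the opensrf user on the reporter server:

    pgrep -af Clark | clark_daemon_present || echo "Clark is not running"
    pgrep -af Clark | count_running_reports

A count of 0 is the "no reports are currently running" case in which Jason says the following steps are safe.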
> >
> >     If no reports are currently running, then it is safe to do the
> >     following steps.
> >
> >     To check for dead reports, run the following query:
> >
> >         select * from reporter.currently_running
> >
> >     There can be up to "parallel" rows in that view, and when there are
> >     that many, Clark will not start new reports.  ("Parallel" is the
> >     reporter/setup/parallel setting from opensrf.xml.)
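The "parallel" limit can be checked with a small shell function. This is a sketch under stated assumptions: you pass the reporter/setup/parallel value from opensrf.xml yourself, psql reaches the right database through the usual libpq environment variables (PGHOST, PGDATABASE, PGUSER), and the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: compare the row count of reporter.currently_running with
# the reporter/setup/parallel value, passed as the first argument.
# Assumes psql connects via PGHOST/PGDATABASE/PGUSER; set PSQL to override
# the command (default: psql).  The function name is hypothetical.

reporter_slots_full() {
    parallel=$1
    # -A: unaligned output, -t: rows only, so psql prints a bare number.
    rows=$(${PSQL:-psql} -At -c 'SELECT count(*) FROM reporter.currently_running;') || return 2
    echo "currently_running rows: $rows (parallel: $parallel)"
    # Exit 0 when the view is full, i.e. Clark will not start new reports.
    [ "$rows" -ge "$parallel" ]
}
```

For example, with parallel set to 1: `reporter_slots_full 1 && echo "Clark will not start new reports"`.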
> >
> >     If you have any rows in that view, and no reports are currently running,
> >     it is advisable to clear them out.  You do that by setting the
> >     complete_time on the listed reports.  I have attached a SQL script that
> >     I use for this purpose.  It not only sets the complete_time, but also
> >     sets the error_code and error_text to something semi-useful for our
> >     environment.  You might want to change that to suit your situation.
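The SQL script Jason mentions was scrubbed from this archive. The shell function here is a minimal sketch of the same idea, not his script: it sets complete_time, error_code and error_text on every row listed in reporter.currently_running. It assumes those rows live in reporter.schedule and that the view exposes the schedule id as `id`; the error code and text are arbitrary placeholders, and the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the cleanup described above (not the original script).
# Assumptions, to be checked against your schema before use:
#   * reporter.currently_running exposes the schedule row id as "id";
#   * those rows live in reporter.schedule, which has the complete_time,
#     error_code and error_text columns;
#   * psql reaches the right database via PGHOST/PGDATABASE/PGUSER, or
#     PSQL is set to a full psql command line.
# Run it only when no "Clark Kent reporting:" processes exist.

clear_dead_reports() {
    ${PSQL:-psql} -v ON_ERROR_STOP=1 <<'SQL'
BEGIN;
-- Mark every report Clark believes is running as finished with an error,
-- so those rows no longer count toward the reporter/setup/parallel limit.
UPDATE reporter.schedule
   SET complete_time = now(),
       error_code    = 1,  -- arbitrary placeholder value
       error_text    = 'Cleared by hand: report process died or was killed'
 WHERE id IN (SELECT id FROM reporter.currently_running);
COMMIT;
SQL
}
```

Afterwards, `select * from reporter.currently_running` should return no rows.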
> >
> >     HtH,
> >     Jason
> >
> >     On 6/1/19 6:12 PM, JonGeorg SageLibrary wrote:
> >     > Greetings, I've run into an issue where the reporting module does
> >     > not appear to want to restart.
> >     >
> >     > Reports are run on the log server against the replicated database
> >     > server.
> >     > Normally what I do is:
> >     >
> >     >   * just restart it per
> >     >     http://docs.evergreen-ils.org/3.1/_starting_and_stopping_the_reporter_daemon.html
> >     >     as the opensrf user
> >     >
> >     > I've also done the following:
> >     >
> >     >   * Restarted all osrf services on the application and log servers,
> >     >     along with ejabberd/memcached where applicable.
> >     >   * Killed all processes on the database server older than 2 minutes.
> >     >   * Re-ran replication of the production server to the replicated
> >     >     database server, just to rule out an issue with the replicated
> >     >     copy, because we did have a fines issue that was related to the
> >     >     replication at one point.
> >     >   * I ran "SELECT now()-query_start,pid,state,application_name,waiting
> >     >     FROM pg_stat_activity;" but had to remove ",waiting" as it threw
> >     >     an error.
> >     >       o That returns a list of processes like open-ils.cstore,
> >     >         open-ils.pcrud, open-ils.reporter-store and the like. I
> >     >         attempted to kill the old reporter-store processes with the
> >     >         command "SELECT pg_cancel_backend(backend_pid);" and Clark
> >     >         stopped. While it returned true, suggesting the process was
> >     >         dead, the process still appeared to be present when I re-ran
> >     >         the query.
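The ",waiting" error is expected on PostgreSQL 9.6 and later, where pg_stat_activity replaced the `waiting` column with `wait_event_type` and `wait_event`. Also, pg_cancel_backend() only cancels a backend's current query, so a true result means the cancel signal was sent; the connection itself stays and keeps appearing in pg_stat_activity, which would explain what Jon saw, whereas pg_terminate_backend() ends the backend. A minimal shell sketch, assuming PostgreSQL 9.6+ and that the reporter's connections use application_name open-ils.reporter-store as in the output above; the function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch for PostgreSQL 9.6 or later.  Assumes reporter backends
# report application_name 'open-ils.reporter-store'; set PSQL to override
# the psql command line.  Function names are hypothetical.

# List reporter backends, oldest query first, with the 9.6+ wait columns.
list_reporter_backends() {
    ${PSQL:-psql} -v ON_ERROR_STOP=1 <<'SQL'
SELECT now() - query_start AS query_age,
       pid, state, application_name, wait_event_type, wait_event
  FROM pg_stat_activity
 WHERE application_name = 'open-ils.reporter-store'
 ORDER BY query_age DESC NULLS LAST;
SQL
}

# End a whole backend (not just its current query) by numeric PID.
# Jason's warning about stopping reports while they run applies here too.
terminate_backend() {
    case $1 in
        ''|*[!0-9]*) echo "usage: terminate_backend PID" >&2; return 2 ;;
    esac
    ${PSQL:-psql} -v ON_ERROR_STOP=1 -c "SELECT pg_terminate_backend($1);"
}
```

Usage: run `list_reporter_backends`, then `terminate_backend 12345` with a pid taken from its output.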
> >     >
> >     > I don't see anything else in
> >     > http://docs.evergreen-ils.org/reorg/3.1/command_line_admin/Evergreen_Documentation.pdf
> >     > or https://wiki.evergreen-ils.org/doku.php?id=scratchpad:random_magic_spells
> >     >
> >     > The only thing I haven't tried, but shouldn't need to, is to actually
> >     > restart that server, but I am waiting until someone is physically
> >     > present in case it does not restart properly on its own.
> >     >
> >     > -Jon
> >     >
> >     >
> >
>