[OPEN-ILS-GENERAL] Reporting module appears stuck
JonGeorg SageLibrary
jongeorg.sagelibrary at gmail.com
Sun Jun 2 22:18:27 EDT 2019
OK, thank you. Our old internal documentation was incomplete; I will be
adding what you’ve provided for the next time a report gets stuck. I had
made a typo when testing for stuck reports; your SQL queries worked like a
charm. Thank you, truly. Have a fabulous day.
-Jon
On Sun, Jun 2, 2019 at 12:08 PM Jason Stephenson <jason at sigio.com> wrote:
> At this point, you are beyond my knowledge of the reporter. However, it
> sounds like you stopped two reports while they were running. That's
> generally not a good idea.
>
> On 6/2/19 1:21 PM, JonGeorg SageLibrary wrote:
> > Thank you so much for responding; that script is essentially what I
> > was looking for, as I knew there had to be a way to view stuck reports.
> >
> > The first time I ran *pgrep -af Clark*, it returned two report names.
> > However, ever since I restarted Clark that first time, after killing
> > those two processes, I get nothing but the "waiting for trouble" output.
> > Yes, I removed the reporter-lock folder when restarting Clark, and I did
> > everything as the opensrf user.
> >
> > When I run *select * from reporter.currently_running* I get no output.
> > Just to double-check, I ran it on both the production and replicated
> > databases, with the same result. However, when I go to Reports in my
> > version of the staff client, a stuck report from yesterday is still
> > present in the queue.
> >
> > Ideas?
> > -Jon
> >
> > On Sun, Jun 2, 2019 at 5:34 AM Jason Stephenson <jason at sigio.com> wrote:
> >
> > It sounds like you have dead reports that are preventing new reports from
> > starting. When a report dies or is killed, it isn't cleaned up, and
> > Clark will think that it is still running.
> >
> > First, check if Clark is running and running any reports:
> >
> > pgrep -af Clark
> >
> > If you run that on the server where the reporter runs, you should get
> > output like this:
> >
> > 7180 Clark Kent, waiting for trouble
> >
> > The number is the process ID, so it will be different. If any reports are
> > running, there will be additional lines similar to the above, but they will
> > have some portion of the report's name:
> >
> > 7201 Clark Kent reporting: [Report Name]
> >
> > If no reports are currently running, then it is safe to do the following
> > steps.
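The pgrep check above can be scripted. Below is a minimal Python sketch, assuming the process titles look exactly like the two example lines above (the idle daemon ending in "waiting for trouble", and each running report showing "reporting: [Report Name]"). The helper names `run_pgrep`, `parse_clark_processes` and `describe` are hypothetical and not part of Evergreen.

```python
import subprocess


def run_pgrep():
    """Return the raw output of `pgrep -af Clark` (empty if nothing matches)."""
    # pgrep exits with status 1 when no process matches, which is not an
    # error here, so check=False.
    result = subprocess.run(["pgrep", "-af", "Clark"],
                            capture_output=True, text=True, check=False)
    return result.stdout


def parse_clark_processes(pgrep_output):
    """Split pgrep output into idle Clark daemons and running reports.

    Returns (daemon_pids, running_reports), where running_reports is a list
    of (pid, report_name) tuples. Assumes each line is "PID process-title".
    """
    daemon_pids = []
    running_reports = []
    for line in pgrep_output.splitlines():
        line = line.strip()
        if not line:
            continue
        pid_text, _, title = line.partition(" ")
        pid = int(pid_text)
        if "reporting:" in title:
            # Everything after "reporting:" is (part of) the report's name.
            name = title.split("reporting:", 1)[1].strip()
            running_reports.append((pid, name))
        elif "waiting for trouble" in title:
            daemon_pids.append(pid)
    return daemon_pids, running_reports


def describe(daemon_pids, running_reports):
    """Summarize the parsed pgrep output in one line."""
    if running_reports:
        return (f"{len(running_reports)} report(s) still running; "
                "do not clear reporter.currently_running yet.")
    if not daemon_pids:
        return "Clark does not appear to be running."
    return "Clark is idle; safe to proceed with the reporter.currently_running steps."
```

On the reporter server, `describe(*parse_clark_processes(run_pgrep()))` gives a one-line summary; only the "Clark is idle" result corresponds to the case where the following steps are safe.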
> >
> > To check for dead reports, run the following query:
> >
> > select * from reporter.currently_running;
> >
> > There can be up to "parallel" rows in that view, and when there are
> > that many, Clark will not start new reports. ("Parallel" is the
> > reporter/setup/parallel setting from opensrf.xml.)
> >
> > If you have any rows in that view, and no reports are currently running,
> > it is advisable to clear them out. You do that by setting the
> > complete_time on the listed reports. I have attached a SQL script that
> > I use for this purpose. It not only sets the complete_time, but also
> > sets the error_code and error_text to something semi-useful for our
> > environment. You might want to change that to suit your situation.
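The blocking rule and the "safe to clear" condition above can be expressed as a small check. This is a minimal Python sketch, not part of Evergreen: `read_parallel` and `assess` are hypothetical helpers, the XML lookup assumes the `reporter/setup/parallel` elements in opensrf.xml are not in an XML namespace, and the row count is assumed to come from `select count(*) from reporter.currently_running;`. It only decides whether a clean-up script like the one described above should be run; it does not modify anything.

```python
import xml.etree.ElementTree as ET


def read_parallel(opensrf_xml_text):
    """Return the reporter/setup/parallel value from opensrf.xml text.

    Searches every <reporter> element rather than a fixed path, and assumes
    the elements are not in an XML namespace.
    """
    root = ET.fromstring(opensrf_xml_text)
    for reporter in root.iter("reporter"):
        node = reporter.find("setup/parallel")
        if node is not None and node.text and node.text.strip():
            return int(node.text.strip())
    raise ValueError("no reporter/setup/parallel setting found")


def assess(view_rows, running_reports, parallel):
    """Return (blocked, safe_to_clear) for the reporter.

    view_rows:       row count of reporter.currently_running
    running_reports: number of "reporting:" lines from `pgrep -af Clark`
    parallel:        the reporter/setup/parallel setting
    """
    # Clark will not start new reports once the view holds `parallel` rows.
    blocked = view_rows >= parallel
    # Clearing is only advisable when there are rows but nothing is running.
    safe_to_clear = view_rows > 0 and running_reports == 0
    return blocked, safe_to_clear
```

For instance, with parallel set to 2, two leftover rows and no running reports, `assess(2, 0, 2)` returns `(True, True)`: Clark is blocked, and the rows can be cleared.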
> >
> > HtH,
> > Jason
> >
> > On 6/1/19 6:12 PM, JonGeorg SageLibrary wrote:
> > > Greetings, I've run into an issue where the reporting module does not
> > > appear to want to restart.
> > >
> > > Reports are run on the log server against the replicated database server.
> > > Normally what I do is:
> > >
> > > * just restart it as the opensrf user, per
> > >   http://docs.evergreen-ils.org/3.1/_starting_and_stopping_the_reporter_daemon.html
> > >
> > > I've also done the following:
> > >
> > > * Restarted all osrf services on the application and log servers
> > along
> > > with ejabberd/memcached where applicable.
> > > * Killed all processes on the database server older than 2
> minutes.
> > > * Re-ran replication of the production server to replicated
> database
> > > server. I did this just to rule out that there was not an
> > issue with
> > > the replicated copy because we did have a fines issue that was
> > > related to the replication at one point.
> > > * I ran "SELECT
> now()-query_start,pid,state,application_name,waiting
> > > FROM pg_stat_activity;" but had to remove ",waiting" as it
> > threw an
> > > error.
> > > o That returns a list of processes like open-ils.cstore,
> > > open-ils.pcrud, open-ils.reporter-store and the like. I
> > > attempted to kill the old reporter-store processes with the
> > > command "SELECT pg_cancel_backend(backend_pid);" and Clark
> > > stopped, and while it returned a value of true showing the
> > > process was dead, when I re-ran it, it appears to still be
> > present.
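Two PostgreSQL details may account for what is described above. The `waiting` column was removed from `pg_stat_activity` in PostgreSQL 9.6 and replaced by `wait_event_type` and `wait_event`, which would explain the error. Also, `pg_cancel_backend(pid)` only cancels the backend's current query. Its `true` result means the signal was sent, but the session itself, and therefore its row in `pg_stat_activity`, remains. `pg_terminate_backend(pid)` is the function that ends the session. The Python sketch below is illustrative only: `ACTIVITY_QUERY` and `termination_statements` are hypothetical names, the two-minute default mirrors the threshold mentioned above, and the generated statements are meant to be reviewed before anyone runs them, bearing in mind the advice earlier in the thread against stopping reports while they run.

```python
from datetime import timedelta

# PostgreSQL 9.6 removed the `waiting` column; wait_event_type and
# wait_event describe what, if anything, a backend is waiting on.
ACTIVITY_QUERY = (
    "SELECT now() - query_start AS age, pid, state, application_name, "
    "wait_event_type, wait_event FROM pg_stat_activity;"
)


def termination_statements(rows, application_name,
                           older_than=timedelta(minutes=2)):
    """Build pg_terminate_backend() calls for old backends of one application.

    rows: tuples in ACTIVITY_QUERY column order, with `age` as a timedelta
    (psycopg2, for example, returns interval values as timedelta) or None.
    The statements are returned for review, not executed.
    """
    statements = []
    for age, pid, _state, app, _wait_type, _wait_event in rows:
        # age is None for backends whose query_start is NULL.
        if app == application_name and age is not None and age > older_than:
            # pg_terminate_backend ends the session; pg_cancel_backend would
            # only cancel the current query and leave the backend connected.
            statements.append(f"SELECT pg_terminate_backend({int(pid)});")
    return statements
```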
> > >
> > > I don't see anything else in either of these:
> > >
> > > http://docs.evergreen-ils.org/reorg/3.1/command_line_admin/Evergreen_Documentation.pdf
> > > https://wiki.evergreen-ils.org/doku.php?id=scratchpad:random_magic_spells
> > >
> > > The only thing I haven't tried (though I shouldn't need to) is actually
> > > restarting that server, but I am waiting until someone is physically
> > > present in case it does not restart properly on its own.
> > >
> > > -Jon
> > >
> > >
> >
>
More information about the Open-ils-general mailing list