[OPEN-ILS-DEV] Integrating automated code analysis tools into regular practice?

Fri Nov 23 10:26:52 EST 2007

On 22/11/2007, Dan Scott <denials at gmail.com> wrote:
> On 22/11/2007, Scott McKellar <mck9 at swbell.net> wrote:
> >
> > --- Dan Scott <denials at gmail.com> wrote:
> >
> > > I was pondering a few possibilities for helping to tighten up our
> > > code
> > > in a more automated fashion today. I thought I would throw the ideas
> > > out there to see if there's interest or support for them...
> >
> > <snip: info about code-scanning tools>
> >
> > Mod parent up.
> >
> > I have stumbled across various problems (security vulnerabilities,
> > memory leaks, etc.) in the course of doing other things, but I
> > haven't been looking for them systematically.  I've probably
> > overlooked more than I've found, even if you don't count the code
> > that I haven't looked at yet.  No doubt the automated tools can find
> > more stuff faster.  It's the difference between working with a pick
> > and shovel and working with a backhoe.
> >
> > Don't forget good old lint.  I haven't used it yet on this project
> > because I've been finding enough to keep me busy with the pick-and-
> > shovel approach.  However my experience is that lint usually finds
> > at least a few forehead-slappers.
> >
> > My one reservation is about the idea of posting the results on the
> > web.  The point is not that I don't want to air our dirty linens
> > in public -- it's all open source, after all -- but I wouldn't want
> > to erect needless barriers to the code scans.  If it takes ten
> > minutes to run the scans, and three hours to update the website, then
> > we probably won't run the scans as often as we should.  I'd rather
> > have more scans than prettier web pages.
> >
> > Hence I suggest that any publication of the scan results involve
> > minimal work, because the reporting is rudimentary, automated,
> > or both.
> >
> > I note that Coverity's website already publishes defect counts on the
> > projects it covers.
>
> Hi Scott:
>
> I wouldn't propose something that would take a lot of work - I'm lazy
> that way :)
>
> For tools like rats and Perl::Critic, we could simply run the tools
> over the entire tree of OpenSRF and Evergreen, pipe the output into a
> file, and post the file to the Web site. If we had an automated build
> machine, we could make that a standard part of each build. This is
> also where we would store the results of automated regression test
> runs, if we, well, had regression tests :)
>
> A moderately sophisticated approach would diff the output between
> builds and flag new warnings, to focus attention on regressions or new
> code that had been introduced.
>
> A more sophisticated approach would shove each possible violation into
> a database row, then let human eyes determine whether the warning was
> valid or not; if not, then a brief rationale could be entered and the
> warning could automatically be ignored in future runs (with some
> degree of fuzziness for line number drift) and future developers would
> be able to find a record of why that particular "violation" is
> actually quite intentional and necessary.
>
> But I would be happy with just the least sophisticated approach. The
> hard part is where things like Perl::Critic flag coding practices that
> verge towards style, e.g. coming down hard on the use of unless() and
> postfix conditionals ($foo = $bar unless $bar = 0; and the like).  If
> there is violent disagreement with cleaning up these "moderately
> severe" violations, we can always just turn that particular rule off
> in our profile.
>
> Perl::Critic also provides nice summary statistics. Here's an example
> of the summary for Evergreen trunk, with only the most severe
> violations flagged:
>
> 190 files.
> 1,558 subroutines/methods.
> 43,568 statements.
> 55,421 lines of code.
>
> Average McCabe score of subroutines was 4.37.
>
> 537 violations.
> Violations per line of code was 0.010.
>
> 537 severity 5 violations.
>
> 21 violations of BuiltinFunctions::ProhibitStringyEval.
> 48 violations of InputOutput::ProhibitBarewordFileHandles.
> 47 violations of InputOutput::ProhibitTwoArgOpen.
> 12 violations of Modules::RequireBarewordIncludes.
> 6 violations of Modules::RequireFilenameMatchesPackage.
> 314 violations of Subroutines::ProhibitExplicitReturnUndef.
> 1 violations of Subroutines::ProhibitSubroutinePrototypes.
> 7 violations of TestingAndDebugging::ProhibitNoStrict.
> 47 violations of TestingAndDebugging::RequireUseStrict.
> 1 violations of ValuesAndExpressions::ProhibitLeadingZeros.
> 30 violations of Variables::ProhibitConditionalDeclarations.
> 3 violations of Variables::RequireLexicalLoopIterators.
>
> (I know you're not a Perl hacker, but for those Perl hackers reading
> this who haven't been exposed to Perl::Critic, more info on the
> particular violations and the rationale for why they are violations
> can be found for each violation at
> http://search.cpan.org/dist/Perl-Critic/ and in even greater depth in
> Damian Conway's "Perl Best Practices").
>
> I have attached the RATS output in HTML format for the Evergreen tree
> as well, for anyone interested in what it has to say.

I just ran PMD (a code copy & paste detector from
http://pmd.sourceforge.net/) against OpenSRF trunk and it found some
large chunks of duplicated code that we could probably refactor into a
single shared block of code - rationale being that duplicated code is
eventually likely to have a bug fixed in one place but missed in the
other. The output is attached.

I also ran splint against several files of C source and (yay) it
didn't have too much to complain about.

Maybe a short term method of making these checks available to EG
developers would be to add them to the top-level Makefile with a
target like "code-reports".

-- 
Dan Scott
Laurentian University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osrf_cpd.out
Type: application/octet-stream
Size: 32301 bytes
Desc: not available
Url : http://list.georgialibraries.org/pipermail/open-ils-dev/attachments/20071123/edc31ae1/osrf_cpd-0001.obj