Criteria for Evaluating Static Analysis Tools

Here are the criteria on my list, which I use with my clients (including some of those you've mentioned):

  • Coverage (according to what the org requires today, and expects to use in the future)
    • Language
    • Architecture (e.g. some tools are great for web apps, but not so much for rich clients or even Windows Services / daemons)
    • Framework (e.g. support for ASP.NET but not MVC, or Java but not Spring)
    • Standard libraries (e.g. recognizing Hibernate, log4j, or AntiXSS, to name a few problematic ones), as needed
  • Scan performance. In this I include both:
    • Speed
    • Scalability to large codebases (including LoC, subsystems/projects, and external references)
  • Completeness - i.e. the rules/scripts included in the scan are enough to provide confidence in the output; in other words, false negatives are minimized.
    • Note that this is both the list of provided rules (e.g. checks for "SQL Injection");
    • AND the quality of those checks, i.e. whether they actually catch SQL injections. (As an example, a certain leader in the field, which shall remain nameless, can be easily confused by many forms of codeflow, thus missing basic injection flaws - see the sketch just after this list.)
  • Accuracy of results. This applies to the results which ARE reported (as opposed to those missing, covered in "Completeness"). Poor accuracy can be seen in:
    • High numbers of false positives
    • Duplicates
    • Miscategorizations
    • Misprioritizations (e.g. labeling a very low impact bug as "High Risk")
    • Irrelevant results (i.e. a code flaw which is accurate, but completely irrelevant to the application or architecture; e.g. finding XSS in a non-HTML, non-HTTP, and non-interactive app, or SQL Injection on a client application).
  • Customizability - as you said, customizing sources, sinks, and filters; also reports; but also, no less important, customizing rules/scripts and adding custom logic (not just source->filter->sink). Note that many tools allow you to "customize" their rules, but this is usually limited to adding sources/sinks/filters.
  • Sustainability / repeatability - by this I refer to handling of repeat scans. How does it handle changes? Issues previously marked as false positives? Do I get comparisons?
  • Deployment model, e.g. usually combination of:
    • single auditor station
    • shared server, accessed remotely
    • web access
    • developer plugin
    • build server pluggability
    • (and of course ability to set a different policy for each)
  • Usability. This includes:
    • UI (including hotkeys etc)
    • auditor filtering capabilities
    • providing enough context via highlighting enough of the codeflow
    • static text, such as explanations, descriptions, remediation guidance, links to external sources, IDs in OWASP/CWE, etc.
    • additional user features - e.g. can I add in a manual result (not found by automatic scan)?
  • Reporting - both flexible per-project reports as needed (don't forget detailed ones for devs and high-level ones for managers) and aggregated cross-project reports. Also comparisons, etc.
  • Security (in the case of a server-model) - protecting the data, permissions management, etc. (much like any server app...)
  • Cost. This includes at least:
    • hardware
    • license - any or all of the following:
      • per seat - auditor
      • per seat - developer
      • per seat - reports user
      • per server
      • per project
      • per lines of code
      • site license
    • maintenance
    • services - e.g. deployment, tuning, custom integration, etc
    • training (both the trainer, and the auditor/developers' time)
  • Integration with:
    • source control
    • bug tracker
    • development environment (IDE)
    • build server / CI / CD
    • automation
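
To make the "quality of checks" point above concrete, here is a minimal Java sketch (the class and method names are made up for illustration; assume userName arrives from an untrusted request). Both methods contain the same SQL injection - tainted input flows into Statement.executeQuery() - but in the second it passes through a small helper first, and engines with weaker codeflow tracking tend to lose the taint at exactly that kind of indirection:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Illustrative only: assume userName arrives from an untrusted request.
    public class InjectionExamples {

        // Direct flow: tainted input concatenated straight into the query.
        // Virtually every tool flags this.
        public ResultSet findUserDirect(Connection conn, String userName) throws SQLException {
            Statement stmt = conn.createStatement();
            return stmt.executeQuery("SELECT * FROM users WHERE name = '" + userName + "'");
        }

        // Indirect flow: the same tainted value is routed through a helper
        // (and a StringBuilder) before reaching the sink. Tools that do not
        // track taint across method calls may miss this one entirely.
        public ResultSet findUserIndirect(Connection conn, String userName) throws SQLException {
            Statement stmt = conn.createStatement();
            return stmt.executeQuery(buildQuery(userName));
        }

        private String buildQuery(String name) {
            StringBuilder sb = new StringBuilder("SELECT * FROM users WHERE name = '");
            sb.append(name).append("'");
            return sb.toString();
        }
    }

If a tool misses the second method on a trivial example like this, it will miss far more in a real codebase - which is exactly what the "quality of checks" bullet is meant to catch.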

Looking over this, I think it's pretty much in order of priority - starting from basic requirements, to applicability, to quality, to ease of deployment, to efficiency, to nice-to-haves...


Here is the most important thing to know about how to evaluate a static analysis tool:

Try it on your own code.

I'll say that again: try it on your own code. You need to run a trial in which you use the tool to analyze some representative code of yours, and then analyze its output.

The reason is that static analysis tools vary significantly in effectiveness, and their effectiveness depends on what kind of code tends to get written in your company. Therefore, the tool that's best for your company may not be the best one for another company down the road.

You can't go by a feature list. Just because a tool says it supports Java doesn't mean it will be any good at analyzing Java code -- or any good at analyzing your Java code and finding problems that you care about.
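
To make that concrete, here is a hypothetical Java sketch (DbGateway and its run() method are invented for illustration - stand-ins for the kind of in-house data-access wrapper many shops have). A tool with generic Java support, but no knowledge of your wrapper, has no reason to treat db.run() as a SQL sink, and may report nothing here even though this is a textbook injection:

    import javax.servlet.http.HttpServletRequest;

    // Hypothetical in-house data-access wrapper (name and signature invented).
    interface DbGateway {
        Object run(String sql); // executes the given SQL somewhere downstream
    }

    public class AccountLookup {

        private final DbGateway db;

        public AccountLookup(DbGateway db) {
            this.db = db;
        }

        public Object lookup(HttpServletRequest request) {
            String id = request.getParameter("accountId"); // tainted source
            // A textbook SQL injection, but a tool that only knows the standard
            // JDBC sinks has no reason to flag data flowing into db.run().
            return db.run("SELECT * FROM accounts WHERE id = " + id);
        }
    }

Whether (and how easily) a tool can be taught about that kind of wrapper is exactly the sort of thing a trial on your own code will reveal.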

Most static analysis vendors will gladly help you set up a free trial so you can try their tool on your own code -- so take them up on their offer.

Gary McGraw and John Steven have written a good article on how to choose a security static analysis tool. In addition to hitting the point that you need to try the tools on your own code to see which is best, they also point out that you should take into account how well the tool can be customized for your environment and needs, and budget for this cost.


A long list of criteria is as likely to distract you as help you come up with a good solution.

Take, for example, the issue of "false positives". They are an inherent problem with such tools, and the long-term solution is learning how to live with them. That means your coders are going to have to learn to code around the static analysis tool: learn what causes it to trigger a false positive, and write the code a different way so that the false positive isn't triggered. It's a technique familiar to anyone who uses lint, or who tries to compile their code warning-free: you tweak the code until the false positive stops triggering.
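
As a small, hypothetical Java illustration of that workflow: assuming the map never stores null values, both methods below behave identically, yet some analyzers warn on the first because they cannot connect the containsKey() check to the later get(). The rewrite keeps the behaviour but makes the null check explicit and local, so the warning goes away.

    import java.util.Map;

    // Hypothetical example of "coding around" an analyzer's false positive.
    public class TitleLookup {

        // May trigger a null-dereference warning in some tools, even though
        // the containsKey() check makes the get() safe (given no null values).
        static int titleLengthNoisy(Map<String, String> cache, String key) {
            if (cache.containsKey(key)) {
                return cache.get(key).length(); // flagged: "get() may return null"
            }
            return 0;
        }

        // Equivalent rewrite: a single lookup plus an explicit, local null
        // check that typical analyzers can follow, so the warning disappears.
        static int titleLengthQuiet(Map<String, String> cache, String key) {
            String value = cache.get(key);
            if (value != null) {
                return value.length();
            }
            return 0;
        }
    }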

The biggest criterion is understanding the problem you are trying to solve. There is an enormous benefit to making your programmers go through the step of running a static analyzer once: it removes the biggest problems in your code and, frankly, teaches them what they should already know about programming so they stop making those mistakes. But the marginal value of continuously running static analyzers is much less, and the marginal cost is much higher.