Static code analysis refers to the analysis of software without executing it. The complete source code is automatically examined against a set of predefined rules, and the programmers are then notified about any rule violations found.
Most static code analysis tools can be integrated as plugins into the development environment, where they mark rule violations directly in the source code. This is a very powerful feature, since the developer receives immediate feedback about possible vulnerabilities or bad practices while programming. If these are only found later in the development cycle, fixing them is much more complex and thus more expensive.
Likewise, static code analysis tools can be integrated into automated build processes, where they generate reports and alerts and - depending on the configuration - cause the build to fail.
Static code analysis is therefore a very effective automated code review process. This does not mean that it can replace manual reviews. First, static code analysis cannot detect all errors (partly because it does not know the functional requirements of the software), and second, it is important that knowledge of the code is shared within the team.
For static code analysis, there are numerous very good open source tools available that address all the problems mentioned above. I will present them in detail in the third part of this series.
Once you have set up the appropriate tools, they can quickly check the source code and give the developer numerous recommendations for improvement - regarding coding style, potential bugs, bad practices, poor maintainability and potential security holes.
Advantages of Static Code Analysis
Manual code reviews are tedious. The great strength of static code analysis is that it quickly and automatically checks the entire code base without the need to execute the code. This significantly reduces the effort required to detect problems in the code.
Manual code reviews tie up developers. Automation relieves them and allows them to focus more on the further development of the software.
Static code analysis can be integrated into the continuous delivery process and thus be executed regularly and fully automatically. If a tool is extended to detect new problem types, these are immediately found throughout the entire code base.
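As an illustration of such an integration, a minimal continuous integration configuration might look like the following. This is a hypothetical sketch using GitHub Actions syntax and assumes a Maven project whose verify phase runs the configured analysis tools; workflow and job names are made up:

```yaml
# Hypothetical workflow: run the build (including any configured static
# analysis) on every push; a rule violation fails the build.
name: static-analysis
on: [push]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      # verify runs compile, tests and any analysis plugins bound to it
      - run: mvn --batch-mode verify
```

Because the analysis runs on every push, a newly added rule is immediately applied to the whole code base, as described above.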
Finding problems earlier
The earlier an error is found, the lower the cost of fixing it. By integrating static code analysis tools into the IDE, problems can be detected and fixed very early, during the programming phase. Problems are pointed out to developers directly in the code - along with explanations and suggestions for improvement.
Automation checks the complete code base - including code passages that developers rarely see. In addition, static code analysis tools can analyze all execution paths of an application - including those not covered by testing. Human error can occur when configuring the tools, but not when executing them.
Better quality at lower cost
Ultimately, all of the above benefits result in better code and product quality at lower cost for the entire development project. By continuously delivering secure, reliable and maintainable software, the reputation of the developers and the company they work for is enhanced.
Disadvantages of static code analysis
Static code analysis tools must first be evaluated, learned, installed and configured. However, these initial costs pay for themselves after only a few weeks. This series of articles should help to minimize that lead time and those costs.
Needs a rollout strategy
Applying static code analysis tools to existing code can reveal thousands of problems. This usually leads developers to simply ignore the messages. A rollout strategy is therefore necessary. The best approach is to prioritize the problem types and initially show only those with the highest priority. Only when these have been completely resolved is the next most important category displayed and processed.
Not all errors are detected automatically (false negatives). Style guide violations or certain error patterns can be detected reliably. Security problems, on the other hand - e.g. in the authentication process, newly discovered security holes in external libraries, new attack patterns or incorrect configuration outside the source code - are difficult to find. Errors in the implementation of concurrent code, which can lead to race conditions, are also difficult for static code analysis to detect.
Occasionally, correct code is marked as incorrect (false positives). This happens when a tool cannot decide with certainty, for example when the integrity of input data cannot be verified or when the application interacts with closed-source components.
How static code analysis can help
The tasks for static code analysis tools can be divided into the following categories:
- Verification of the code standard
- Calculation of software metrics
- Detection of errors in the code
- Detection of vulnerabilities
An additional cross-cutting aspect is progress monitoring, i.e. storing the metrics of different tools over time. This makes it possible to track whether the code quality improves or deteriorates with regard to the configured aspects.
Verification of the code standard
Tools in this category check whether the source code meets predefined and configured code formatting requirements. Aspects such as bracketing, indentation, line width, spaces and blank lines, class and method lengths, number of method parameters and much more can be checked.
Uniform code standard is important
A uniform code standard has the following advantages:
- Uniformly formatted code is easier to read and understand.
- If all developers write in a uniform style from the start, reviews are easier to perform because the code does not first have to be adapted to the common rules.
- Code that is difficult to read and therefore difficult to understand can lead to errors and security risks.
Modern IDEs can format code on their own. For this purpose, formatting rules integrated in the IDE can be used, adapted, exported and imported. However, the code styles built into Eclipse and IntelliJ by default are different. Eclipse has the styles "Java Conventions", "Eclipse" and "Eclipse 2.1"; IntelliJ offers the "default" style.
IntelliJ can import styles exported from Eclipse, but not the other way around. Because IntelliJ with an imported Eclipse format does not generate exactly the same code as Eclipse, there is also the Eclipse Code Formatter plugin, which formats code in IntelliJ in exactly the same way as Eclipse.
The Java Conventions in Eclipse correspond to the Java Code Conventions published by Sun in 1997 and last revised in 1999. Accordingly, this style does not know newer language elements such as generics or lambda expressions.
A modern and widespread IDE-independent code style is the Google Java Style Guide, which is used in many projects either directly or slightly modified. The Google style has the following advantages:
- Configuration files for Eclipse, IntelliJ and Checkstyle (more about this below) are offered.
Ensure uniform code standards
How can static code analysis ensure uniform code standards?
Modern IDEs can format the code, but there are two limitations:
- they cannot ensure all coding standards, e.g. not the following:
- maximum method and class length,
- maximum number of "Non Commenting Source Statements" in a method,
- maximum number of parameters of a method,
- maximum nesting depth of loops and control statements.
- they cannot ensure that all developers have enabled the formatters and configured them correctly.
This is where static code analysis tools of the code standard verification category come in, filling exactly this gap. Once the appropriate tools are configured and integrated into the build pipeline, it is ensured that the same code style is used throughout the project and by all team members.
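As a sketch of how such a gap can be closed, the following hypothetical Checkstyle configuration enables exactly the checks listed above that IDE formatters cannot enforce; the limit values are arbitrary examples, not recommendations:

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- maximum method length in lines -->
    <module name="MethodLength">
      <property name="max" value="60"/>
    </module>
    <!-- maximum number of Non Commenting Source Statements per method -->
    <module name="JavaNCSS">
      <property name="methodMaximum" value="40"/>
    </module>
    <!-- maximum number of method parameters -->
    <module name="ParameterNumber">
      <property name="max" value="5"/>
    </module>
    <!-- maximum nesting depth of if statements and loops -->
    <module name="NestedIfDepth">
      <property name="max" value="2"/>
    </module>
    <module name="NestedForDepth">
      <property name="max" value="2"/>
    </module>
  </module>
</module>
```

A violation of any of these limits is then reported by every build, independently of individual IDE settings.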
Tools for checking the code standard
The best-known open source representative is Checkstyle. Checkstyle can be configured quite flexibly and can be adapted to all the code styles mentioned above. For the Google Java Style, a Checkstyle configuration file is available for download. Checkstyle can also be integrated into the build process, so that it issues warnings or fails the build if the rules are violated. I will go into more detail about Checkstyle in the third part of this series.
Calculation of software metrics
Software metrics are functions that map certain quality characteristics of a piece of software (such as maintainability, extensibility or comprehensibility) to an objective and comparable numerical value (e.g. "maintainability index", "cyclomatic complexity"). Software metrics can help developers and teams achieve quality goals. To this end, tools are integrated into the build process that calculate software metrics and alert developers in case of deviations from previously defined target values. Some of the best-known metrics are the following:
- Coupling: the degree of dependencies between the modules of a system - a low coupling leads to better comprehensibility and maintainability;
- Cohesion: the degree of dependencies within a software module - high cohesion leads to better comprehensibility and maintainability;
- Average Component Dependency: the average number of dependencies of components in a software system;
- Circular dependencies: these are equivalent to a high degree of coupling of the modules involved - none can be reused on its own - and none can be understood without the others;
- Cyclomatic complexity: the number of different paths through a software module - the higher the cyclomatic complexity, the more difficult the software module is to understand;
- Maintainability Index: a value that results from the combination of certain other metrics.
- Line coverage: this value indicates how many lines of code are covered by automatic tests in relation to the total number of lines of code;
- Branch coverage: the ratio of the program flow paths covered by tests to the total possible flow paths
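To make cyclomatic complexity from the list above concrete, here is a small illustrative sketch; class and method names are made up:

```java
public class ComplexityExample {

    // Cyclomatic complexity 4: one linear path plus three decision
    // points (the for loop, the if, and the else-if).
    static int countNegatives(int[] values) {
        int negatives = 0;
        for (int v : values) {          // +1
            if (v < 0) {                // +1
                negatives++;
            } else if (v == 0) {        // +1
                // zeros are ignored
            }
        }
        return negatives;
    }

    public static void main(String[] args) {
        System.out.println(countNegatives(new int[] {-1, 0, 2, -3})); // prints 2
    }
}
```

A metric tool would report a complexity of 4 for this method; each additional branch or loop would raise the value by one and make the method harder to test exhaustively.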
Tools for calculating software metrics
The best-known open source tools in this category are JaCoCo and Cobertura for measuring test coverage, Sonargraph Explorer for calculating and displaying (cyclic) dependencies, and SonarQube for calculating metrics on complexity, maintainability, reliability, security and test coverage.
Detection of errors in the code
Tools in this category try to find potential errors in the code by detecting common error patterns. Examples are:
- certain or potential NullPointerExceptions,
- comparisons with equals() on objects of different classes,
- classes that define equals() but not hashCode() - or vice versa,
- string comparisons with == or != (such a comparison is either a bug or - if intended - confusing for the next developer and therefore error-prone),
- switch statements without a default branch,
- switch statements with "fall throughs" (these are confusing, as the reader often does not know whether they are intended),
- unused constructor or method parameters,
- unused local variables,
- unused private fields and methods,
- resources that are not closed on all execution paths,
- objects that expose references to mutable internal objects, such as lists (instead of copies or read-only views),
- missing assert statements in unit tests.
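As an illustration of the equals()/hashCode() pattern from the list above, a minimal sketch (names are made up): defining both methods together keeps hash-based collections consistent, whereas defining only equals() would let the HashSet below contain two "equal" points.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsHashCodeExample {

    // Correct: equals() and hashCode() are overridden together, so
    // hash-based collections treat equal points as the same element.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // duplicate, not added
        System.out.println(points.size()); // prints 1
    }
}
```

With the default hashCode() inherited from Object, the two equal points would land in different hash buckets and the set would report a size of 2 - exactly the kind of subtle bug this error pattern catches.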
Tools for error detection
Static code analysis tools for error detection are e.g. the Java compiler itself (with appropriate parameters), the open source tools PMD, FindBugs and its successor SpotBugs, and SonarLint.
Detection of vulnerabilities
Ultimately, vulnerabilities in the software are caused by errors in the code. In contrast to the previous category, however, these errors are much more difficult to detect, as they require extensive data flow analysis: it must be checked how input data flows through and is processed by the system - and, if necessary, by external libraries - via both regular and exceptional execution paths. Examples of security vulnerabilities are:
- Command and SQL injection: missing or inadequate validation of input values can cause user input in forms to be interpreted by the software as (SQL) commands (e.g. "'; DELETE FROM User; --" - if this is passed unchanged to the database as part of a query, the complete user table might be emptied);
- Errors in access control: unauthorized users can perform functions for which they have no authorization (e.g. reading other users' private data);
- Cross-site scripting: the injection of malicious executable code, e.g. via request parameters in a URL;
- Cross-site request forgery: a logged-in user of a system is sent a link that unknowingly triggers a malicious action (e.g. changing the user's password to one specified by the attacker, who could then log in to the user's account).
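To make the SQL injection example concrete, a minimal sketch (class and method names are made up; no real database is involved) shows how string concatenation lets input change the meaning of a query:

```java
public class InjectionExample {

    // VULNERABLE (illustration only): user input is concatenated
    // directly into the SQL string, so crafted input can smuggle in
    // additional statements.
    static String vulnerableQuery(String userName) {
        return "SELECT * FROM User WHERE name = '" + userName + "'";
    }

    public static void main(String[] args) {
        // The crafted input closes the string literal and appends a
        // second, destructive statement:
        System.out.println(vulnerableQuery("x'; DELETE FROM User; --"));

        // Safe alternative with JDBC (connection setup omitted):
        //   PreparedStatement ps = conn.prepareStatement(
        //       "SELECT * FROM User WHERE name = ?");
        //   ps.setString(1, userName);
    }
}
```

A data flow analysis tool flags exactly this pattern: untrusted input reaching a query string without passing through validation or a parameterized statement.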
Important terms in connection with software security are:
- CWE - Common Weakness Enumeration: a list of common vulnerabilities compiled by the developer community, in which each vulnerability is assigned a unique identifier, such as CWE-77 for the above-mentioned "Command Injection" or CWE-89 for "SQL Injection". This list is not prioritized. It serves as a common language, so to speak: almost all software security tools provide the corresponding CWE identifier for detected vulnerabilities.
- OWASP - Open Web Application Security Project: a non-profit organization with the goal of increasing software security. The best-known project is the "OWASP Top 10" list, which currently lists the ten most critical web application security risks (including their CWE classification). The list was last updated in 2017.
- CWE/SANS Top 25 Most Dangerous Software Errors: an alternative list of the 25 most critical vulnerabilities - but not updated since 2011.
Exercise 1:
- Download the latest checkstyle-all JAR (at the time of writing: checkstyle-8.36.1-all.jar).
- Download the latest Google configuration file google_checks.xml.
- Check the style of your code with this command:
  java -jar checkstyle-8.36.1-all.jar -c google_checks.xml MyClass.java