Pages

Saturday, May 7, 2011

FlexeLint: A Modern Static Analyzer for C and C++

Static analysis is a powerful technique for quickly finding programming defects in the earliest stages of development and is especially useful for statically typed languages like C and C++. In this post I will discuss FlexeLint, a mature static analysis tool for C and C++ from Gimpel Software. Plenty of examples are provided to demonstrate many of the features and types of issues the tool can detect. Although it is impossible to cover everything about a tool as featureful as FlexeLint in a single post, I will attempt to provide firm understanding of what the tool is capable of and how it can improve the quality of your code. If you are a serious C or C++ programmer, you will greatly benefit by using static analysis and FlexeLint is one of the best, and least expensive, tools out there today.


Overview

Lint was originally distributed with UNIX V7 from Bell Labs, derived from the PCC compiler that was distributed with the operating system.  The purpose of Lint was to catch potential programming errors that were not caught by the compiler: unused variables, uninitialized variables, unreachable code, ignored return values, strong type checking, and several specific common programming errors.

There are still versions of Lint available for most major UNIX platforms including HP-UX, BSD, and Solaris and closely resemble the original Lint. Due in part to their relatively rudimentary nature, clunky use, and reputation for reporting pages of often meaningless warnings for the simplest of programs, they are not widely used.

Splint, originally LCLint, is an Open Source Lint implementation that adds a number of additional checks and improvements but relies heavily on programmer-provided information in the form of source code annotations.  Although more advanced than Lint, the effort required to annotate the source code to the level at which the tool can provide the most useful information is often not realized.  Splint has not been updated in several years, is not actively maintained, does not support C99, and has a number of other shortcomings that limit its usefulness in practice.

PC-Lint is a commercial Lint-based offering provided by Gimpel Software that has been continually enhanced over the last 25 years.  It supports K&R C, C89, and C99; C++ support is also included in the same product.  PC-Lint is distributed as a binary program for MS Windows and as obfuscated ANSI C source code which can be compiled on any platform.  The obfuscated source code version, which is functionally identical to the compiled Windows version, is called FlexeLint.  I will refer to the product as FlexeLint going forward since the name doesn't imply the Windows-only nature that PC-lint does.

Like the original Lint, FlexeLint focuses on finding potential defects through static analysis but it is a lot more intelligent about the process, adds numerous features, addresses a number of limitations with the original Lint, and is extremely configurable.

Some of the features provided by FlexeLint include:
  • Value Tracking of objects, including across functions, to find later issues
  • A powerful and flexible message suppression mechanism
  • Complete control over the format of diagnostic messages
  • Support for strong type checking and dimensional analysis using typedefs
  • Built-in and user-defined semantics checking of function arguments and return values
  • Stack Usage Reporting
  • Support for Multi-Threaded code
  • Extremely customizable, support for a large number of compilers.
  • Support for MISRA C1, C2, and MISRA C++


Starting Example

Let's look at an example of FlexeLint in action.  FlexeLint is a command line tool, files to be "linted" are provided as arguments along with options and files containing configuration information (called Indirect files and having a .lnt extension).

The below example contains a simple, yet defective, C function whose intent is to create a copy of a string:
#include <stdlib.h>
#include <string.h>

char * dup_string(char *s) {
    char *copy = malloc(strlen(s+1));
    strcpy(copy, s);
    return copy;
}
Although this is a trivial example, it is indicative of real mistakes made by C programmers.  It also compiles without warnings with gcc and clang.  You may want to take a minute to try to find the issues yourself before proceeding and compare with FlexeLint's analysis provided below.

Invoking FlexeLint on this code is straight-forward:

lint -u dup_string.c

The -u option tells Lint that we are doing a "unit checkout" and that it should not warn about things like the fact that dup_string is not called since it is part of a larger project that is not being linted as a whole.

The output produced by the above invocation is:
FlexeLint for C/C++ (Linux) Vers. 9.00f, Copyright Gimpel Software 1985-2010

--- Module:   dup_string.c (C)
dup_string.c: 5: Warning 432: Suspicious argument to malloc
dup_string.c: 5: Warning 613: Possible use of null pointer 's' in left argument
    to operator 'ptr+int'
dup_string.c: 6: Warning 668: Possibly passing a null pointer to function
    'strcpy(char *, const char *)', arg. no. 1 [Reference: file dup_string.c:
    line 5]
dup_string.c: 5: Info 831: Reference cited in prior message
dup_string.c: 6: Warning 668: Possibly passing a null pointer to function
    'strcpy(char *, const char *)', arg. no. 2
dup_string.c: 8: Info 818: Pointer parameter 's' (line 4) could be declared as
    pointing to const
dup_string.c: 4: Info 830: Location cited in prior message
dup_string.c: 8: Note 953: Variable 'copy' (line 5) could be declared as const
dup_string.c: 5: Info 830: Location cited in prior message
Note: Gimpel provides an Interactive Demo on their website that you can use to enter in your own sample programs.  If you do not have FlexeLint, and are interested in following along, you can enter the examples here into the Demo.  To get the same output as that presented above, include the following line at the top of the program:
//lint -hSrb1 -"format=%f: %l: %t %n: %m" +fpn -u +e953
This is a "lint comment" which introduces lint options into source code by embedding them in a C comment, note the required lack of space between the beginning of the comment and "lint".  The first two options configure the message format to match what is used in this post.  The +fpn flag option makes FlexeLint assume that all pointer function parameters can be NULL.  The -u option is the unit checkout option explained above.  +e953 enables message 953 which is disabled by default.


Each line contains the file and line number of the issue, the type and number of the message, and the details of the issue found.  Using a unique message id for each message makes it easy to quickly lookup the details of a particular message in the documentation as well as integrate with other tools that provide online help.

On line 5, Warning 432 is given because the expression malloc(strlen(s+1)) is a common error when malloc(strlen(s)+1) is intended, as in this case.

On line 5, Warning 613 is given because s was not checked for being null before being passed to strlen. Passing a null pointer to strlen invokes undefined behavior.

On line 6, Warning 668 is given because the return value of malloc, stored in copy, is being passed to strcpy and could be a null pointer. A similar warning is given for the second argument, s , because it still has not been checked for being null. Info 831 is given as a reference message after the first Warning on line 6 indicating that the statement that caused the possibly null assignment to the copy variable that is being complained about occurred on line 5. With a larger file such references are very helpful.

At the end of the function, when FlexeLint can determine that the data that s points to is never changed, Info 818 if given to let us know that the parameter could be declared to indicate this fact using const.  Again, a reference is provided via Info 830 telling us the location of the parameter s.

Lastly, also at the end of the function, we are alerted to the fact that since the value of the copy variable was never changed, that it could have been declared const as well. We are also provided with the location where copy was declared.

Addressing all of these defects and recommendations yields the following which
produces no output from FlexeLint:
#include <stdlib.h>
#include <string.h>

char * dup_string(const char *s) {
    if (!s) {
        return NULL;
    }

    char * const copy = malloc(strlen(s)+1);
    if (!copy) {
        return NULL;
    }

    strcpy(copy, s);
    return copy;
}
The parameter and return value of malloc are now checked for NULL, the
strlen/malloc defect has been fixed, and const has been appropriately applied.

The output of FlexeLint is highly customizable and the output shown above was generated using a customized output format for the sake of terseness.  By default, FlexeLint displays a 3-line block for each message that shows the line in question, a visual indication of where in the line the issue occurred as well as a slightly different format for the message itself:
FlexeLint for C/C++ (Linux) Vers. 9.00f, Copyright Gimpel Software 1985-2010

--- Module:   dup_string.c (C)
                                    _
    char *copy = malloc(strlen(s+1));
dup_string.c  5  Warning 432: Suspicious argument to malloc
...

Value Tracking

One of the things that sets FlexeLint's static analysis engine apart from traditional Lint is its value tracking and multiple pass features.  Value Tracking allows FlexeLint to store information about variables within and across functions to provide more sophisticated analysis of things such as out-of-bound access, null pointer dereferencing, and division by zero.

Lets look at a simple example:
int main(void) {
    int ia[10];
    int i = 10;

    ia[i] = 0;
    return 0;
}
The error in this case is obvious although it doesn't solicit a warning from the compiler. Because FlexeLint is tracking information for i, it is able to tell that an out-of-bounds access has occurred:
FlexeLint for C/C++ (Linux) Vers. 9.00f, Copyright Gimpel Software 1985-2010

--- Module:   valtrack1.c (C)
valtrack1.c: 5: Warning 415: Likely access of out-of-bounds pointer (1 beyond
    end of data) by operator '[' [Reference: file valtrack1.c: lines 3, 5]
valtrack1.c: 7: Warning 550: Symbol 'ia' (line 2) not accessed
FlexeLint can also glean information from a conditional such as the one in the for loop below:
int main(void) {
    int ia[10];
    int i = 0;

    for (i = 0; i < 20; i++) {
        ia[i] = 0;
    }
    return 0;
}
--- Module:   valtrack2.c (C)
valtrack2.c: 6: Warning 662: Possible creation of out-of-bounds pointer (10
    beyond end of data) by operator '[' [Reference: file valtrack2.c: lines 5, 6]
valtrack2.c: 6: Warning 661: Possible access of out-of-bounds pointer (10 beyond
    end of data) by operator '[' [Reference: file valtrack2.c: lines 5, 6]
valtrack2.c: 9: Warning 550: Symbol 'ia' (line 2) not accessed
Values are also tracked across function calls:
int f(int i) {
    return i * 2;
}

int main(void) {
    int ia[10];
    int i = 0;

    for (i = 0; i < 10; i++) {
        ia[f(i)] = 0;
    }
    return 0;
}
Without multiple passes through the code, FlexeLint won't find an issue here because f has already been scanned by the time it is called in main. FlexeLint logs encountered function calls to be "walked" during later analysis but if the function has already been processed it cannot be walked on the same pass.
Using the -passes(2) option to specify 2 passes through the code yields:
--- Module:   valtrack3.c (C)
valtrack3.c: 13: Warning 550: Symbol 'ia' (line 6) not accessed

/// Start of Pass 2 ///

--- Module:   valtrack3.c (C)
valtrack3.c: 10: Warning 662: Possible creation of out-of-bounds pointer (9
    beyond end of data) by operator '[' [Reference: file valtrack3.c: lines 2,
    9, 10]
valtrack3.c: 10: Warning 661: Possible access of out-of-bounds pointer (9 beyond
    end of data) by operator '[' [Reference: file valtrack3.c: lines 2, 9, 10]
Note that FlexeLint would find this issue even if the definition of f was in another file (as long as both files were being linted together). Static variables are also tracked:
int ia[10];

void h(int len) {
    for(int i = 0; i < len; i++) {
        ia[i] = 0;
    }
}

int main(void) {
    h(20);

    return 0;
}
--- Module:   valtrack4.c (C)

/// Start of Pass 2 ///

--- Module:   valtrack4.c (C)

During Specific Walk:
  File valtrack4.c line 10: h(20) #1
valtrack4.c: 5: Warning 662: Possible creation of out-of-bounds pointer (10
    beyond end of data) by operator '[' [Reference: file valtrack4.c: lines 4,
    5, 10]

The last example is more complex:
#include <stdlib.h>

int *ia;

void g(int i) {
    ia = malloc(i * sizeof(*ia));
    if (!ia) {
        exit(EXIT_FAILURE);
    }
}

void h(int len) {
    for(int i = 0; i < len; i++) {
        ia[i] = 0;
    }
}

int main(void) {
    g(10);
    h(20);

    return 0;
}
This example requires 4 passes and a static depth of 2. By default, only static variables modified within the scope of the current function are tracked. The information gathered by tracking static variables from the call to g is thus not carried over in the call to h. The -static_depth option can be used to increase the depth of the visibility of the static tracking.
/// Start of Pass 4 ///

--- Module:   valtrack5.c (C)

During Specific Walk:
  File valtrack5.c line 20: h(20) ia=[10]? #3
valtrack5.c: 14: Warning 662: Possible creation of out-of-bounds pointer (10
    beyond end of data) by operator '[' [Reference: file valtrack5.c: lines 6,
    13, 14, 19, 20]
To get the most out of the value tracking mechanism, both -passes and -static_depth should be increased from their default value of 1 although doing so may result in longer analysis times.


Limitations of Value Tracking

The Value Tracking mechanism isn't perfect and certain constructs can lead to false positives:
int ia[10];

int f(int i) {
    if (i % 2)
        return i / 2;
    else
        return i + 1;
}

int main(void) {
    int i;
    for (i = 0; i < 10; i++)
        ia[f(i)] = 0;
    return 0;
}
This will cause FlexeLint to complain about a possible out-of-bound access even though the code doesn't contain one. If we use the -vw option to show specific walks we see:
--- Module:   valtrack6.c (C)

/// Start of Pass 2 ///

--- Module:   valtrack6.c (C)
    --- Starting Specific Walk: f(9? | 0?) #1
    --- f #1 returns: 10?
valtrack6.c: 13: Warning 661: Possible access of out-of-bounds pointer (1 beyond
    end of data) by operator '[' [Reference: file valtrack6.c: lines 7, 12, 13]
In this case FlexeLint isn't quite smart enough to figure out that when the parameter to f is 9 the return value cannot be 10 so it reports a possible out-of-bounds access. This is a fairly contrived case and such instances often reflect poor design so having them brought to light isn't necessarily a bad thing.

FlexeLint also won't recognize that i will never be greater than 0 in this example:
int main(void) {
    int ia[10];
    int i = 0;

    for (i = 0; i < 20; i+=50) {
        ia[i] = 0;
    }
    return 0;
}
This is because only the first two clauses of the for loop are examined to establish the assumptions about the values that i can take.  A test inside the loop to ensure i stayed within the bounds of the array would prevent the message.

The relationship between correlated variables isn't recognized and the below example will cause a possible out-of-bounds warning:
int ia[10];

void f(int index) {
    int update = 1;
    if (index > 9)
        update = 0;
    if (update)  /* Only update ia if index in range */
        ia[index] = 0;
}
Although the simpler version will not:
int ia[10];

void f(int index) {
    if (index <= 9)
        ia[index] = 0;
}
This is a somewhat contrived case but it is not uncommon to see code that uses a one variable to determine if another variable has been initialized. Code that checks the dependent variable before working on the other variable will not be sufficient to suppress a warning.

Because FlexeLint currently does not track the values of individual members of an aggregate (structures, unions, and arrays), no out-of-bounds issue will be detected for the following:
struct a {
    int b;
};

int main(void) {
    int ia[10];
    struct a a;
    a.b = 10;
    ia[a.b] = 0;
}

Lastly, since the Value Tracking mechanism tracks specific values seen in the source code, it cannot find a value tracking-related error that doesn't contain a specific value to track somewhere along the line to the problematic code. For example, the following potential buffer overrun in initialize:

#include <stdlib.h>

void initialize(char *p, size_t size) {
    size_t i;
    for (i = 0; i <= size; i++)
        p[i] = '\0';
} // lint !e429

void * my_alloc(size_t size) {
    char * p;
    p = malloc(size);
    if(!p)
        return NULL;
    initialize(p, size);
    return p;
}
would not be detected by the Value Tracking mechanism unless there was a call to my_alloc somewhere that contains a specific value (either directly or indirectly) such as:
void * p = my_alloc(10);


Suppressing messages

Another area where FlexeLint excels is in its rich set of message suppression options that provide the ability to suppress individual messages for specific symbols, within specific functions or function calls, or by parameterized information in a message.  Suppressions can also be applied to macros, expressions, blocks, or individual source lines.  The suppression options can be embedded within code comments or specified as options on the command line or in an Indirect file. For example, the following code:
int func1(int i) {
    return i+1;
}

int func2(int j) {
    return func1(j++ +j);
}
Results in:
test.c: 6: Warning 564: variable 'j' depends on order of evaluation
Any of the following options will suppress this message:
-esym(564,j)       /* Suppress message based on symbol 'j' from the message */
-ecall(564,func1)  /* Suppress message occurring inside call to func1 */
-efunc(564,func2)  /* Suppress message inside func2 */
These options can be specified from the command line, from an Indirect file, or inside a comment in the source code itself.  The option -e564 will suppress this message for the entire project.  These options each have an enabling version that can be used to selectively enable specific messages. For example, if message 564 was disabled, any of the following would work to enable the message in this instance:
+esym(564,j)       /* Enable message based on symbol from message */
+ecall(564,func1)  /* Enable message occurring inside call to func1 */
+efunc(564,func2)  /* Enable message inside func2 */
Additionally, the following code-embedded options will each serve to suppress the message:
int func1(int i) {
    return i+1;
}

int func2(int j) {
    return func1(j++ +j); /*lint !e564 suppress for this line only */
}

int func1(int i) {
    return i+1;
}

int func2(int j) {
    /*lint -e(564) suppress 564 for next expression*/
    return func1(j++ +j);
}

int func1(int i) {
    return i+1;
}

/*lint -e{564} suppress 564 for next statement/block*/
int func2(int j) {
    return func1(j++ +j);
}

int func1(int i) {
    return i+1;
}

int func2(int j) {
    /*lint --e{564} suppress 564 until end of block*/
    return func1(j++ +j);
}

int func1(int i) {
    return i+1;
}

int func2(int j) {
    /*lint -save -e564 suppress message 564 from here ...*/
    return func1(j++ +j);
    /*lint -restore until here...*/
}
Wildcard characters can be used to match error numbers and symbol names as well.


Other Examples

FlexeLint can detect a wide range of potential issues, many more than can be covered here.  Below are a few more short examples of the types of issues that can be caught.
#define MAX(x, y) (x > y ? x : y)
...
int max = MAX(a, b++);
Warning 666: Expression with side effects passed to repeated parameter 2 in macro 'MAX'
#define ADD(x, y) (x + y)
...
ADD(1 * 2, 3);
Warning 665: Unparenthesized parameter 1 in macro 'ADD' is passed an expression
#define ADD(x, y) (x) + (y)
Info 773: Expression-like macro 'ADD' not parenthesized
if (flags & FLAG == 1)
Warning 514: Unusual use of a Boolean expression
int *p;
p = malloc(sizeof *p);
...
p = malloc(sizeof *p);
Warning 423: Creation of memory leak in assignment to 'p'
long *p;
p = malloc(4);
Warning 433: Allocated area not large enough for pointer
char c;
    while ((c = getc(fp)) != EOF) {
        ...
    }
Info 734: Loss of precision (assignment) (31 bits to 7 bits)
Warning 583: Comparing type 'char' with EOF
int *p;
if (cond)
    if (cond2)
        a = 1;
else
    b = 1;
Warning 525: Negative indentation from line 2
if (cond)
    do_something();
    do_something_else();
Warning 539: Did not expect positive indentation from line 1

Overview of other various features

FlexeLint supports a number of other useful features that cannot be covered in detail in this post (perhaps some of them can be explored in later posts is there is enough interest) but below is a summary of some of the more notable ones.

  • Automatic Prototype generation for K&R style functions
  • FlexeLint can generate prototypes for K&R C functions which is helpful when porting code to C89/C99.

  • Finding Unused Headers
  • FlexeLint can report unused headers (and it pretty intelligent about the process). To check code just for unused headers, use:   lint -w1 +e766 file(s)

  • Compatibility with Legacy Lint
  • Legacy versions of Lint used several directives embedded in comments to suppress particular messages.  For example, /* NOSTRICT */ would disable strict type checking in the next expression and /* FALLTHROUGH */ to indicate that fall-through inside a switch case is intended.  FlexeLint supports these directives which can be useful when linting code that uses them.

  • Support for numerous compilers
  • FlexeLint contains a large number of "flag" options to customize various behaviors such as whether or not anonymous unions are supported, whether binary literals are supported, whether a for loop creates a new scope, and many more.  These options, along with explicit options for about 20 compiler families, can be used to tailor the behavior of FlexeLint to just about any compiler and environment.  Gimpel supplies compiler configuration files for about 100 different compilers including Borland, GCC, IAR, IBM, Intel, Keil, CodeWarrior, MS, Sun, Watcom, and Whitesmith.

  • Function Deprecation
  • Functions can be marked as 'deprecated' which will solicit a warning when used in code.  An optional string can be provided which will be appended to the message as well. For example: -deprecate(function, sprintf, Use 'snprintf' instead) will cause the following warning to be generated when the sprintf function is encountered:
    Warning 586: function 'sprintf' is deprecated. Use 'snprintf' instead
    Variables, Macros, and keywords can also be deprecated using this options: -deprecate(macro, and_eq, Yuck) -deprecate(keyword, auto)

  • Message Annotations
  • The -append option can be used to append custom messages to a specific diagnostics: -append(534(fclose), - Return value of fclose must be checked per Coding Standards)

  • Preprocessor
  • The -p option will cause FlexeLint to act as a preprocessor.

  • Output in HTML or XML
  • FlexeLint has the ability to output HTML formatted messages (this is what is used to produce the results on the Demo page) and XML formatted messages:
    <?xml version="1.0" ?>
    <doc>
    <message><file>dup_string.c</file> <line>5</line> <type>Warning</type> <code>432</code> <desc>Suspicious argument to malloc</desc></message>
    <message><file>dup_string.c</file> <line>5</line> <type>Warning</type> <code>613</code> <desc>Possible use of null pointer 's' in left argument to operator 'ptr+int'</desc></message>
    <message><file>dup_string.c</file> <line>6</line> <type>Warning</type> <code>668</code> <desc>Possibly passing a null pointer to function 'strcpy(char *, const char *)', arg. no. 1 [Reference: file dup_string.c: line 5]</desc></message>
    <message><file>dup_string.c</file> <line>5</line> <type>Info</type> <code>831</code> <desc>Reference cited in prior message</desc></message>
    <message><file>dup_string.c</file> <line>6</line> <type>Warning</type> <code>668</code> <desc>Possibly passing a null pointer to function 'strcpy(char *, const char *)', arg. no. 2</desc></message>
    <message><file>dup_string.c</file> <line>8</line> <type>Info</type> <code>818</code> <desc>Pointer parameter 's' (line 4) could be declared as pointing to const</desc></message>
    <message><file>dup_string.c</file> <line>4</line> <type>Info</type> <code>830</code> <desc>Location cited in prior message</desc></message>
    <message><file>dup_string.c</file> <line>8</line> <type>Note</type> <code>953</code> <desc>Variable 'copy' (line 5) could be declared as const</desc></message>
    <message><file>dup_string.c</file> <line>5</line> <type>Info</type> <code>830</code> <desc>Location cited in prior message</desc></message>
    </doc>
    

  • Echo Source
  • The +source outputs a copy of the source code with diagnostic messages inserted into the code.

  • Semantics
  • Many of the C Standard functions are associated with built-in semantics that help FlexeLint catch specific types of errors.  For example, abort and exit do not return so code following a call to those functions is unreachable, the 2nd argument to fgets represents the size of the buffer provided by the first and if the buffer is not large enough an error is reported, free causes a pointer to be considered uninitalized, the return value of strcpy is guaranteed not to be null but the arguments provided to the function must not be, etc.  These semantics can be copied to user-defined functions and new semantics can be created.

  • Strong Type Checking
  • Fully customizable strong type checking using typedefs. Strong checking can be enabled on all or only specified typedefs. Typedef hierarchies can be created and printed. Dimensional Analysis allows establishing relationships between typedefs that involve division or multiplication of types to create new types.


Related Resources
Lint, a C Program Checker
Splint Project Page
Gimpel Software's Website
FlexeLint Discussion Forum
FlexeLint Interactive Demo
FlexeLint Message Listing

4 comments:

  1. Good summary. I use Gimpel Lint too.

    ReplyDelete
  2. great article! Thanks

    ReplyDelete
  3. Hi,

    Are there any freeware tool that compares to flexelint?

    regards,
    Iniyan

    ReplyDelete
  4. what are limitations of online flexlint? can't I use it to test my project? looking forward to your reply

    ReplyDelete