Testing testers: Error detection tools for Win32
A review of the following error detection tools for Win32 software development: BoundsChecker 4.0 Professional, CodeGuard 32/16, HeapAgent 3.0, Purify 4.0/NT, PC-lint 7.0.
An abbreviated version of this article appeared in the February 1997 issue of Dr. Dobb's Journal.
- Product overview
- The tests
- Ease of use
- In conclusion
- Summary of features and test results
- Sidebar: Static and dynamic testing
- Sidebar: Instrumentation techniques
In an ideal world, software development would progress smoothly from requirements to completed product. In ours, errors creep in. To get them out, we can use an ever-growing assortment of automated checking tools. You probably know them: BoundsChecker, CodeGuard, HeapAgent, Purify, PC-lint. Basically, these tools watch your program in operation and report any errors they detect. (The exception is PC-lint, which operates on your source code.) The trouble is: which checking tool is best for you? Every vendor claims its product is the best... and they all have the numbers to prove it. So what’s a programmer to do? I worked with all the tools for a while in my own development and, in addition, used a number of tests -- some of my own, some from the vendors themselves -- to compare the products. The result, not surprisingly: all products have their strengths and weaknesses. None of them is ideal in all respects, but neither are there any bad tools among them -- it just depends on what you’re looking for.
Product overview
All tools reviewed here fall into one of two categories: static or dynamic testers. Static testers do their job without executing the program under test, whereas dynamic testers monitor its execution. (See the sidebar Static and dynamic testing for more information.) PC-lint is the only static tester in the roundup; all others are dynamic testers. Between them, dynamic testers use several different methods to monitor the program under test, but one way or another, they all insert some sort of probes into your program. These probes are known as instrumentation, and many differences between the tools can be traced back to their instrumentation method. (See the sidebar Instrumentation techniques for more information.) Here are the products:
BoundsChecker, from NuMega Technologies, is possibly the best known product in its class. The BoundsChecker product range started several years ago with products for DOS and 16-bit Windows; the most recent versions (the ones reviewed here) target Win32 platforms. In particular, I tested BoundsChecker 4.0 Professional for Windows NT and its companion for Windows 95.
BoundsChecker can be used as a stand-alone program loader and tester. If you don’t do anything else, this causes BoundsChecker to intercept calls to Windows API functions and heap-related C and C++ runtime library functions while your program is running, and record invalid parameters, invalid pointers, error return codes from API functions, and a general event trace. At program termination, memory and other resource leaks are reported. If a problem is detected, BoundsChecker by default pops up a dialog box that shows the type of error, its location (if source debug information is available), and several options, among which the ability to suppress further reporting of the same error and one to break into a debugger at the error location. All error reports are collected in BoundsChecker’s log window, which can be saved at the end of the session. As a final option, BoundsChecker can perform a Win32 compliance check, which signals the presence or use (your choice) of Win32 API functions that differ among Win32s, Windows 95, and Windows NT.
If Microsoft Visual C++ (4.0 and later) is your C++ compiler, you can take advantage of BoundsChecker’s Integrated Debugging mode. In this mode, you don’t need the BoundsChecker program loader but instead use the Visual C++ workbench’s debugger to run your program. The rest is the same: BoundsChecker sits in the background and pops up if it detects a problem. The resulting error log ends up in the Visual C++ Output window, under a separate BoundsChecker tab.
All this is equivalent to the Standard edition of BoundsChecker. If you have the Professional edition, you can improve error detection by adding compile time instrumentation (CTI). In essence, this is a preprocessing step on your C and C++ source code which adds numerous checks to pointer operations etc. that help detect several additional types of errors. Once the extra instrumentation is in place, operation is identical to the Standard edition. One thing to be aware of: CTI is currently supported only in conjunction with Microsoft Visual C++ (4.0 and later).
BoundsChecker’s operation can be extensively tweaked through options accessible from its program loader (or through the Visual C++ workbench), and by means of configuration, suppression, and library specification files. Furthermore, version 4.0 adds the ability to specify custom validation modules, which let you add virtually any kind of checking or logging that you see fit, using the same routines that BoundsChecker uses internally for the built-in checks.
CodeGuard, from Borland International, is a companion product to Borland’s line of C++ compilers. It made its debut as a 16-bit add-on tool for Borland C++ 4.5 and is now part of the Borland C++ 5.0 Development Suite, with 16- and 32-bit versions covering both Win16 and Win32 programs (DOS programs are not supported by CodeGuard). During the tests, I concentrated on the 32-bit version. The 16-bit version is similar, with the exception of a number of pointer checks that rely on the 32-bit CPU model and are therefore not available in 16-bit mode.
Once installed, CodeGuard becomes part of the IDE. In fact, the C++ compiler knows enough about CodeGuard to insert the CodeGuard instrumentation code into the object code of C and C++ programs during compilation if the right options are given. With the instrumentation code in place, the program under test is then linked to the CodeGuard library, which intercepts both C runtime and Windows API functions. At runtime, the CodeGuard DLL tracks pointer usage with the aid of the instrumented code, and API usage through the intercepted entry points. If an error is detected, CodeGuard by default pops up a message box (no specific information - just that an error was found) and writes a report to a log file. If the program is run from within the Borland IDE or Turbo Debugger, a breakpoint will occur and the debugger becomes active. At program termination, a leak search is performed and added to the log file. If so configured, CodeGuard will also add a function call profile to the log file, containing the number of times each intercepted function is called. The whole process from compilation onwards can also be performed from the command line, in which case CodeGuard does report and log errors, but no breakpoints will occur.
CodeGuard’s operation is configurable through a separate setup program (accessible through the IDE) or by directly editing a configuration file which records the options that apply to a given executable. Through this configuration file, you can determine the amount of memory and pointer validation, the details of API checks, and several other matters.
HeapAgent, from MicroQuill Software Publishing, is a dynamic testing tool that concentrates on detecting heap errors; version 3 also adds stack checking (for Visual C++ programs only). It is developed and marketed by the makers of the popular SmartHeap heap management libraries, but it can be used with standard heap managers as well.
For Microsoft C/C++ programs running under Windows NT, HeapAgent requires no special actions to use. For other compilers or platforms, you must at least relink your application with the HeapAgent libraries, and optionally recompile as well if you want error reports with the file name and line number of the offending source code (instead of just the address). You also need to recompile and link if you want to use the HeapAgent API functions from within your program, for example to perform heap checks at specific points in your program.
When your HeapAgent-enabled program starts, the HeapAgent DLL is loaded in your program’s address space and proceeds to patch the entry points to heap-related calls. This occurs for the EXE as well as for any DLL modules that were loaded at the time the HeapAgent DLL loads, but you can specify DLLs to exclude from this patching process (for instance, system DLLs). HeapAgent effectively replaces the existing heap allocation routines by its own, which add detailed information about the heap block’s origin and state, and add guard bytes before and after the area that is passed back to your program. A system of filler values (different for unallocated, freed, and allocated but uninitialized blocks) helps to detect over- and underwrites and the use of uninitialized memory. Stack checking is also based on a combination of guard bytes and filler values. Because of its nature, HeapAgent can detect most errors (such as overwrites) only some time after the fact; at that time, little information is present anymore to find the culprit.
At runtime, the HeapAgent user interface is available with browser windows for allocated blocks, error reports, and heap statistics, among others. One thing I found rather bizarre for a testing tool is that the browser windows close immediately after the program under test terminates. The error log itself can be stored in a file if so configured, but it still struck me as odd not to be able to browse through error reports, heap statistics and source code after program termination. On the other hand, a nifty feature of HeapAgent is its "agents", essentially watchdogs that come into action when some condition is met. With the aid of agents, you can trigger checks and breakpoints when a heap-related event occurs; this helps to track down the origin of problems.
Purify, from Pure Atria Software, has been known for several years as a product for Unix environments. With Purify 4.0 NT it is now available for PC platforms, initially only if they run Windows NT.
Purify operates as a program loader and tester. For instrumentation, it uses a technique called Object Code Insertion (OCI) which takes an executable module (either .EXE or .DLL), analyzes the code within, and inserts additional instructions that check pointers and memory accesses. This happens for each module that is used by a given program, including third party modules such as system DLLs. (Don’t worry - the original module is never modified; Purify works with instrumented copies.) As a result, every piece of code that is executed by your program will be instrumented, regardless of its origin. However, OCI alone does not catch all errors; to detect memory leaks, Purify uses an approach that resembles the "mark" phase in "mark and sweep" garbage collection schemes, by recursively following potential pointers to identify reachable heap blocks. Any blocks that cannot be reached at all are then considered leaks; any blocks that have pointers into them, but not to their start address are reported as potential leaks.
After the instrumentation phase, Purify runs the instrumented program and tracks all detected errors. The error reports are collected in a log view, which is updated while the program is running. Contrary to the other dynamic testers, Purify does not pop up a message box when it detects an error, but it can be configured to cause a debugger breakpoint in those cases.
Purify configuration is done from within the loader. Apart from settings that determine features such as the size of the deferred free queue and whether or not memory and handle leaks checks are performed at program termination, Purify’s output can be tailored to your needs through the use of filter sets. As its name implies, a filter determines which types of error reports are shown and which aren’t. This is purely a display matter: the unwanted error reports are hidden, but remain present in the overall log. A sophisticated filter manager makes it easy to create and combine different filters and share them across programs. Finally, the Purify runtime support can be accessed through a documented API, which allows your program to communicate with Purify while it is being tested, for example to test for new memory leaks created between two locations in your program. Incidentally, this is the only case for which you need to recompile and link your source code in order to work with Purify.
PC-lint, from Gimpel Software, is the only test tool in this review that performs a static analysis of your program to detect potential problems. It is inspired by the well-known Unix lint utility, but over the years vastly improved by Gimpel Software. Version 7.0 can produce well over 500 different diagnostics relating to C and C++ syntax, usage and programming style, and includes fairly sophisticated techniques such as strong typing (yes - really strong, not the C/C++ idea of strong) and inter-statement value tracking.
To operate PC-lint, you invoke it on your source file(s) as if it were a compiler. PC-lint parses the source code, processes include files etc., and complains (to stdout) about what it considers to be illegal, dangerous, or just bad style. Generally, it finds a lot to complain about. Its inspiration is drawn from the C and C++ (draft) standards, but also from expert advice as written down in the books of Cargill, Coplien, Meyers, Murray, Plum and Saks, and their likes. As a result, PC-lint acts very much as a C and C++ programming-in-the-small style oracle. There is overlap with the dynamic testers, though: by virtue of its initialization and value tracking, and also because it recognizes more general problems (for example: absence of copy constructor or assignment operator in classes with pointer data members, or absence of delete operations in destructors of the same), it will frequently spot potential memory leaks or array out-of-bound accesses without actually executing the program.
PC-lint is configurable to the extreme. You can specify options on the command line, in response files, or even embed them as comments in your source code. The options range from enabling and disabling of certain diagnostics (in general, per module, or per identifier), through specification of its operating environment (to allow PC-lint to mimic, say, a Microsoft C++ compiler complete with the right definition of _MSC_VER, integer and pointer sizes, and include directories), to tailoring its output format. The latter is particularly useful because it allows you to add PC-lint as a tool to your favorite IDE or editor, then use its diagnostic output processing to jump to the right source code location in response to PC-lint’s messages. To get you started, PC-lint comes with configuration files for a few dozen C and C++ compilers.
The tests
The tests included several known-to-be-buggy programs from the respective vendors (obviously designed to bring out the best in their, and the worst in their competitors’ products), some in-depth test programs of my own, and a number of real programs which I worked on during the test period. Table 1 contains a summary of the results.
Ease of use
Tools can only have effect if you actually use them. So, no matter how sophisticated their tests are, they must be easy to operate or else they become shelfware. In fact, they should become part of the development cycle, since we all know that "testing quality into a product" as a final step in the development process is doomed to fail utterly.
I could come up with carefully wrought analyses of which user interface is the best, which options make most sense, etc. and declare a winner on these grounds. I will not do so. Instead, I just kept a tally of how often I actually used each product, and to which ones I turned if I had a problem. This is not quite as scientific, but it is probably more honest and more indicative of which ones stood the test of practice. You’ll have to bear with my personal preferences (or aberrations) as far as my development environment goes: throughout the test period, I mostly used Borland C++ 5.0 and Microsoft Visual C++ 4.1, with some stints of Symantec C++ 7.21 and Watcom C++ 10.6 in between. Programs under test were Win32 console, OWL, and MFC GUI applications, ranging from small (a few hundred lines) to larger (up to 60000 lines). All programs were written in C++.
The result: BoundsChecker (with runtime instrumentation -- RTI -- rather than CTI) and Purify were the ones I used most often, with about equal frequency. The reason is quite simple: neither requires changes to the build process (Purify’s OCI is performed automatically when a program is loaded). Their usage frequencies are about the same because I use either one as a second opinion to the other. I feel more comfortable with BoundsChecker’s memory and resource leak detection and API validation, but Purify inspires more confidence with regard to in-depth pointer checks. On the other hand, Purify can only be used in conjunction with Microsoft-generated code, which made it unsuitable for the other compilers (which BoundsChecker could handle). The same goes for the target platform, but since I primarily use Windows NT, this was less of a problem.
BoundsChecker with CTI came in lower in the usage count, for two reasons: one is that the instrumentation process requires a separate build, and the other is that the resulting executable often ran too slowly for my admittedly limited patience. As a result, I used BoundsChecker Pro with CTI primarily when I needed a very thorough test; it has no equal in this respect. (By the way, this is also what NuMega recommends.) CodeGuard is also a special case. First of all, it depends on the Borland C++ compiler, and then it requires a separate build (possibly also of the libraries used by the program at hand). In the end, I mostly used it for one specific 16-bit OWL application; unfortunately, it sometimes crashed along with the program under test.
HeapAgent usage trailed the others, not because it isn’t easy or fast enough to use, but because it covers fewer errors. With the luxury of several other tools available, there was little that HeapAgent could add to the picture, although this might be different if you also use MicroQuill’s SmartHeap. HeapAgent knows about SmartHeap and adds some extra checks and a heap pool viewer if it finds that it runs with a SmartHeap’d program. One odd thing was that HeapAgent never found the heap memory leaks that were present in one of my test programs (MicroQuill informed me that this was due to the Beta version I used); for the errors that were found, it was often difficult to relate them to the source code that caused them.
What about PC-lint? This is one tool that requires an iron discipline to use. Despite my best intentions, I must confess that I used it far less than I planned to. Apart from my obvious lack of discipline, I attribute this to the fact that I tired of tweaking PC-lint’s options over and over again. PC-lint would benefit greatly if Gimpel Software (or some kind soul in the programming community) would make an interactive option tweaker available. More than anything else, configuring PC-lint such that it reports the important things without flooding you with minor quibbles is a chore that hampers day-to-day use of PC-lint. Especially if you routinely use different C++ compilers with different libraries and frameworks (as I do), you get bogged down in configuration file upon configuration file. Which is a pity, because PC-lint truly deserves to be a fully integrated part of your development environment.
In conclusion
BoundsChecker 4.0 Professional
BoundsChecker will be the testing tool of choice for many situations. It is broad in scope, includes extensive Windows API and OLE checks and, in combination with CTI, catches almost any error that relates to memory usage. However, you should be aware that to use CTI you need to add a separate build variant to your development process, that a CTI-instrumented executable can be significantly slower than a regular one, and that CTI is supported only with the Microsoft Visual C++ compilers.
CodeGuard 32/16
CodeGuard is a good compromise tool for Borland C/C++ users. As part of the Development Suite it is inexpensive and covers a large number of common errors. On the other hand, it requires a separate build variant and its error detection is sometimes flaky and may even crash the program.
HeapAgent 3.0
HeapAgent focuses on heap-related problems. This includes a large number of pointer and memory access errors, but by no means all. However, for the errors it detects, runtime performance is excellent, its various browsers are handy for an in-depth view of a program’s heap, and its configurable "agents" are very useful for all sorts of heap-related diagnosis.
Purify 4.0 NT
Purify combines excellent error detection capabilities with an easy to use instrumentation step, requiring no changes to your development process. Moreover, runtime performance with instrumentation is reasonable to good, so there is little reason not to use it. Regrettably, it lacks extensive API validation and is currently only available for Microsoft Visual C++ programs running under Windows NT.
PC-lint 7.0
PC-lint is in a different category altogether, so I judge it by different criteria. It can be extremely useful for both C and C++ developers, but effective use requires quite some configuration work. Nevertheless, I would strongly recommend it for use with C and C++ programming, since it will uncover many dubious or outright incorrect situations which aren’t caught by any of the other tools.
Summary of features and test results
In the table below, an empty cell indicates that a feature is not present; otherwise grades are 0 (some), + (good) and ++ (excellent).
| Feature | BoundsChecker 4.0 Professional | CodeGuard 32 | HeapAgent 3.0 | Purify NT 4.0 | PC-lint 7.0 |
|---|---|---|---|---|---|
| Instrumentation technique (*) | RTI, SCI | CTI | LTI, RTI | OCI | |
| Read pointer validation | ++ | 0 | 0 | + | 0 |
| Write pointer validation | 0 (RTI), ++ (SCI) | + | 0 | ++ | 0 |
| Other pointer validation | 0 (RTI), + (SCI) | + | 0 | ++ | 0 |
| C++ checks | ++ | ++ | + | | + (static) |
| C runtime library validation | ++ | ++ | ++ | 0 | |
| Windows API validation | ++ | ++ | + | 0 | 0 |
| Other API validation | + | + | | | |
| Memory leak checks | + | ++ | + | + | ++ |
| Handle in use checks | ++ | ++ | ++ | + | |
| Other resource leak checks | + | + | | + | |
| Other features | Event logging | Call profiling | Agents, heap and allocation browsers | | |
| Integration with IDE | ++ (MSVC) | ++ (BC++) | + (MSVC) | 0 (MSVC) | 0/+ |
| Borland C++ support | >= 4.5 | >= 4.5 (**) | >= 4.5 | | Yes |
| Microsoft C++ support | >= 2.1 | | >= 2.1 | >= 2.2 | Yes |
| Symantec C++ support | >= 7.0 | | possibly (***) | | Yes |
| Watcom C++ support | >= 10.5 | | possibly (***) | | Yes |
| Windows 95 support | Win95 version | Yes | Yes | | Yes |
| Windows NT support | NT version | Yes | Yes | Yes | Yes |
| Other platforms/compilers | Delphi 2.0, Win16 version | Win16 | Win16 | | Yes |
(*) SCI=source code, CTI=compile time, LTI=link time, OCI=object code, RTI=runtime.
(**) CodeGuard 32 only for Borland C++ 5.0 and later.
(***) HeapAgent comes with source code to create link libraries for compilers other than Borland and Microsoft. I have not tested these.
- BoundsChecker: NuMega Technologies, Inc.
- CodeGuard: Borland International, Inc.
- HeapAgent: MicroQuill Software Publishing, Inc.
- Purify/NT: Pure Atria Software, Inc.
- PC-lint: Gimpel Software
Sidebar: Static and dynamic testing
- Coverage: Ratio of executed versus total program flow. Several types of coverage are in general use. Statement coverage is the simplest, and only checks whether each statement is executed. Path coverage is stronger and requires that all possible execution paths are considered (of which there may be infinitely many). Other types include predicate coverage and value coverage, which relate to the possible outcomes of logical tests and the domain of the program’s data, respectively.
- Debugging: Diagnosing the cause of a failure (i.e., finding the fault), possibly by executing (parts of) the program. Debugging is therefore not the same as testing.
- Error coverage: I use this phrase to indicate the ratio of errors found to the actual number of errors present in a program.
- Failure: Deviation from the expected behavior of a program. A failure is caused by a fault, but not every fault causes a failure (much depends on the operating conditions). Moreover, not every failure is detected.
- Fault: Defect in a program, such as an uninitialized variable, an invalid array index, a comparison that compares the wrong way, etc.
- Regression tests: Repeats of previous tests, usually after changes to the program, to verify that these tests still fail (and hence that the changes did not break previously good code).
- Testing: Executing a program with the intent to cause failures. A test succeeds if the program fails; a test fails if the program succeeds.
Static tests are performed without actually executing the program under test; dynamic tests require execution. Program compilation is a static test; more advanced forms use theorem proving techniques to verify the correctness of a program. Dynamic testing is performed in a crude form by protected mode operating systems such as Unix, OS/2, and Windows NT, which terminate an application if it steps outside its allotted address space or instruction repertoire. More advanced forms use various ways of instrumentation to keep a closer watch on the program’s behavior.
Pros of static checking: Full coverage attainable in theory (but not yet realized in practice); detects both faults and other problems, such as portability, style, etc.; independent of quality of test cases; error reports immediately linked to actual fault.
Cons of static checking: Requires access to source code; in practice, limits to value tracking restrict coverage; detection of dynamic problems (e.g. interactions, synchronization) limited or absent.
Pros of dynamic checking: Detects problems that occur only at runtime (e.g. caused by specific interaction patterns); value tracking and API checking in principle unlimited; access to source code not (always) required.
Cons of dynamic checking: Coverage strongly determined by quality of actual test cases; detected failures may be difficult to relate to actual faults; instrumentation changes program image and may introduce problems of itself; runtime performance may be reduced and thereby cause problems (e.g. in real-time systems).
Regardless of the testing approach, a number of problems will not be caught. Errors of omission are notoriously hard to detect, as are errors of logic (e.g. branching the wrong way), misinterpretation of data values, and erroneous state transitions. Also, verification of a program’s function against the requirements, user interface design, and performance testing are well outside the realm of these tools. Therefore, you’d be well advised not to rely solely on automated testing tools. No matter how useful they are, a clean bill of health given by them should be regarded as a necessary, but by no means sufficient condition to guarantee a correct program.
Sidebar: Instrumentation techniques
Instrumentation, in the context of this article, is the process of adding extra code to monitor a program’s behavior. Sometimes object code in the executable image itself is changed; at other times, program flow is diverted by patching entry points to external functions.
Source code instrumentation adds extra instructions at the source code level. NuMega’s BoundsChecker (with technology licensed from ParaSoft) uses this approach and calls it CTI - Compile Time Instrumentation, a slight misnomer in my view.
Compile-time instrumentation modifies the actual translation process and adds extra object code that never had a source code representation. This is what Borland’s C++ compiler does for the benefit of CodeGuard.
Link-time instrumentation uses the properties of the (static) link process to intercept calls to selected library functions and replace them by calls to equivalent, but instrumented versions of them. Both CodeGuard and MicroQuill’s HeapAgent use this technique.
Object code instrumentation takes a ready-to-run executable module and inserts additional code into it, based on an object code-level analysis of the program flow. Pure Software’s Purify is the prime example of this instrumentation technique.
Runtime instrumentation, finally, defers instrumentation to the time when the executable program image is loaded into memory, and only then modifies entry points or uses notifications (including processor exceptions) to get control at critical points. Both BoundsChecker and HeapAgent use this mode of instrumentation.