The software uses externally-controlled format strings in printf-style functions, which can lead to buffer overflows or data representation problems.
Time of Introduction
Implementation
Applicable Platforms
Languages
C: (Often)
C++: (Often)
Perl: (Rarely)
Languages that support format strings
Modes of Introduction
The programmer rarely intends for a format string to be user-controlled at
all. This weakness is frequently introduced in code that constructs log
messages, where a constant format string is omitted.
In cases such as localization and internationalization, the
language-specific message repositories could be an avenue for exploitation,
but the format string issue would be resultant, since attacker control of
those repositories would also allow modification of message length, format,
and content.
Common Consequences
Scope: Confidentiality
Technical Impact: Read memory
Format string problems allow for information disclosure, which can
greatly simplify exploitation of the program.
Scope: Integrity, Confidentiality, Availability
Technical Impact: Execute unauthorized code or commands
Format string problems can result in the execution of arbitrary code.
Likelihood of Exploit
Very High
Detection Methods
Automated Static Analysis
This weakness can often be detected using automated static analysis
tools. Many modern tools use data flow analysis or constraint-based
techniques to minimize the number of false positives.
Black Box
Because format string calls often appear in code that handles rarely
occurring error conditions (e.g., error message logging), they can be
difficult to detect using black box methods. It is highly likely that many
latent issues exist in executables that do not have associated source code
(or equivalent source).
Effectiveness: Limited
Demonstrative Examples
Example 1
The following example is exploitable due to the printf() call in the
printWrapper() function. Note: The stack buffer was added to make
exploitation simpler.
(Bad Code)
Example Language: C
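#include <stdio.h>
#include <string.h>

/* VULNERABLE: the caller-supplied string is used as the format string. */
void printWrapper(char *string) {
    printf(string);
}

int main(int argc, char **argv) {
    char buf[5012];
    if (argc < 2)
        return 1;
    /* Copy the argument onto the stack (bounded and NUL-terminated). */
    strncpy(buf, argv[1], sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printWrapper(argv[1]);
    return (0);
}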
Example 2
The following code copies a command line argument into a buffer
using snprintf().
(Bad Code)
Example Language: C
#include <stdio.h>

int main(int argc, char **argv) {
    char buf[128];
    /* ... */
    /* VULNERABLE: argv[1] is used as the format string. */
    snprintf(buf, 128, argv[1]);
    return 0;
}
This code allows an attacker to view the contents of the stack and
write to the stack using a command line argument containing a sequence
of formatting directives. The attacker can read from the stack by
providing more formatting directives, such as %x, than the function
takes as arguments to be formatted. (In this example, the function takes
no arguments to be formatted.) By using the %n formatting directive, the
attacker can write to the stack, causing snprintf() to write the number
of bytes output thus far to the specified argument (rather than reading
a value from the argument, which is the intended behavior). A
sophisticated version of this attack will use four staggered writes to
completely control the value of a pointer on the stack.
Example 3
Certain implementations make more advanced attacks even easier by
providing format directives that control the location in memory to read from
or write to. An example of these directives is shown in the following code,
written for glibc:
(Bad Code)
Example Language: C
printf("%d %d %1$d %1$d\n", 5, 9);
This code produces the following output:
5 9 5 5
It is also possible to use half-writes (%hn), which store only 16 bits
each, to accurately control arbitrary DWORDs in memory. This greatly
reduces the complexity of an attack that would otherwise require four
staggered writes, such as the one described in Example 2.
Observed Examples
Chain: untrusted search path enabling resultant format string by loading
malicious internationalization messages
Potential Mitigations
Phase: Requirements
Choose a language that is not subject to this flaw.
Phase: Implementation
Ensure that all format string functions are passed a static string that cannot be controlled by the user, and that the correct number of arguments is always passed to that function. If at all possible, use functions that do not support the %n operator in format strings. [R.134.1] [R.134.2]
Phase: Build and Compilation
Heed the warnings of compilers and linkers, since they may alert you
to improper usage.
Other Notes
While format string vulnerabilities typically fall under the buffer
overflow category, technically they are not overflowed buffers. The format
string vulnerability emerged circa 1999 and stems from the fact that there
is no realistic way for a function that takes a variable number of
arguments to determine how many arguments were actually passed in. The most
common functions that take a variable number of arguments, including
C runtime functions, are the printf() family of calls. The format string
problem appears in a number of ways. A *printf() call without a format
specifier is dangerous and can be exploited. For example, printf(input); is
exploitable, while printf(y, input); with a constant format y is not
exploitable in that context. In the first call, the input string is used as
the format specifier, which allows an attacker to peek at stack memory: the
attacker can stuff the input string with format specifiers and begin
reading stack values, since the remaining parameters will be pulled from
the stack. In the worst case, this improper use may give away enough
control to allow an arbitrary value (or values, in the case of an exploit
program) to be written into the memory of the running program.
Frequently targeted entities are file names, process names, and
identifiers.
Format string problems are a classic C/C++ issue that is now rare due to
the ease of discovery. One main reason format string vulnerabilities can be
exploited is the %n operator, which writes the number of characters printed
by the format string thus far to the memory pointed to by its argument.
Through skilled creation of a format string, a malicious user may use values
on the stack to create a write-what-where condition. Once this is achieved,
the attacker can execute arbitrary code. Other operators can be used as
well; for example, a %9999s operator could trigger a buffer overflow when
the output goes into a fixed-size buffer, or, when used in file-formatting
functions like fprintf(), it can generate much larger output than intended.
Weakness Ordinalities
Primary (where the weakness exists independent of other weaknesses)
Research Gaps
Format string issues are under-studied for languages other than C. Memory
or disk consumption, control flow or variable alteration, and data
corruption may result from format string exploitation in applications
written in other languages such as Perl, PHP, Python, etc.
References
[R.134.4] [REF-11] M. Howard and D. LeBlanc. "Writing Secure Code". Chapter 5, "Format String Bugs", Page 147. 2nd Edition. Microsoft. 2002.
[R.134.5] [REF-17] Michael Howard, David LeBlanc and John Viega. "24 Deadly Sins of Software Security". "Sin 6: Format String Problems", Page 109. McGraw-Hill. 2010.
[R.134.5] [REF-7] Mark Dowd, John McDonald and Justin Schuh. "The Art of Software Security Assessment". Chapter 8, "C Format Strings", Page 422. 1st Edition. Addison Wesley. 2006.