CWE

Common Weakness Enumeration

A Community-Developed Dictionary of Software Weakness Types

Common Weakness Scoring System
Common Weakness Risk Analysis Framework
Home > CWE List > CWE- Individual Dictionary Definition (2.8)  

Presentation Filter:

CWE-134: Uncontrolled Format String

 
Uncontrolled Format String
Weakness ID: 134 (Weakness Base)Status: Draft
+ Description

Description Summary

The software uses externally-controlled format strings in printf-style functions, which can lead to buffer overflows or data representation problems.
+ Time of Introduction
  • Implementation
+ Applicable Platforms

Languages

C: (Often)

C++: (Often)

Perl: (Rarely)

Languages that support format strings

+ Modes of Introduction

The programmer rarely intends for a format string to be user-controlled at all. This weakness is frequently introduced in code that constructs log messages, where a constant format string is omitted.

In cases such as localization and internationalization, the language-specific message repositories could be an avenue for exploitation, but the format string issue would be resultant, since attacker control of those repositories would also allow modification of message length, format, and content.

+ Common Consequences
ScopeEffect
Confidentiality

Technical Impact: Read memory

Format string problems allow for information disclosure which can severely simplify exploitation of the program.

Integrity
Confidentiality
Availability

Technical Impact: Execute unauthorized code or commands

Format string problems can result in the execution of arbitrary code.

+ Likelihood of Exploit

Very High

+ Detection Methods

Automated Static Analysis

This weakness can often be detected using automated static analysis tools. Many modern tools use data flow analysis or constraint-based techniques to minimize the number of false positives.

Black Box

Since format strings often occur in rarely-occurring erroneous conditions (e.g. for error message logging), they can be difficult to detect using black box methods. It is highly likely that many latent issues exist in executables that do not have associated source code (or equivalent source.

Effectiveness: Limited

Automated Static Analysis - Binary / Bytecode

According to SOAR, the following detection techniques may be useful:

Highly cost effective:

  • Bytecode Weakness Analysis - including disassembler + source code weakness analysis

  • Binary Weakness Analysis - including disassembler + source code weakness analysis

Cost effective for partial coverage:

  • Binary / Bytecode simple extractor – strings, ELF readers, etc.

Effectiveness: SOAR High

Manual Static Analysis - Binary / Bytecode

According to SOAR, the following detection techniques may be useful:

Cost effective for partial coverage:

  • Binary / Bytecode disassembler - then use manual analysis for vulnerabilities & anomalies

Effectiveness: SOAR Partial

Dynamic Analysis with automated results interpretation

According to SOAR, the following detection techniques may be useful:

Cost effective for partial coverage:

  • Web Application Scanner

  • Web Services Scanner

  • Database Scanners

Effectiveness: SOAR Partial

Dynamic Analysis with manual results interpretation

According to SOAR, the following detection techniques may be useful:

Cost effective for partial coverage:

  • Fuzz Tester

  • Framework-based Fuzzer

Effectiveness: SOAR Partial

Manual Static Analysis - Source Code

According to SOAR, the following detection techniques may be useful:

Highly cost effective:

  • Manual Source Code Review (not inspections)

Cost effective for partial coverage:

  • Focused Manual Spotcheck - Focused manual analysis of source

Effectiveness: SOAR High

Automated Static Analysis - Source Code

According to SOAR, the following detection techniques may be useful:

Highly cost effective:

  • Source code Weakness Analyzer

  • Context-configured Source Code Weakness Analyzer

Cost effective for partial coverage:

  • Warning Flags

Effectiveness: SOAR High

Architecture / Design Review

According to SOAR, the following detection techniques may be useful:

Highly cost effective:

  • Formal Methods / Correct-By-Construction

Cost effective for partial coverage:

  • Inspection (IEEE 1028 standard) (can apply to requirements, design, source code, etc.)

Effectiveness: SOAR High

+ Demonstrative Examples

Example 1

The following program prints a string provided as an argument.

(Bad Code)
Example Language:
#include <stdio.h>

void printWrapper(char *string) {

printf(string);
}

int main(int argc, char **argv) {

char buf[5012];
memcpy(buf, argv[1], 5012);
printWrapper(argv[1]);
return (0);
}

The example is exploitable, because of the call to printf() in the printWrapper() function. Note: The stack buffer was added to make exploitation more simple.

Example 2

The following code copies a command line argument into a buffer using snprintf().

(Bad Code)
Example Language:
int main(int argc, char **argv){
char buf[128];
...
snprintf(buf,128,argv[1]);
}

This code allows an attacker to view the contents of the stack and write to the stack using a command line argument containing a sequence of formatting directives. The attacker can read from the stack by providing more formatting directives, such as %x, than the function takes as arguments to be formatted. (In this example, the function takes no arguments to be formatted.) By using the %n formatting directive, the attacker can write to the stack, causing snprintf() to write the number of bytes output thus far to the specified argument (rather than reading a value from the argument, which is the intended behavior). A sophisticated version of this attack will use four staggered writes to completely control the value of a pointer on the stack.

Example 3

Certain implementations make more advanced attacks even easier by providing format directives that control the location in memory to read from or write to. An example of these directives is shown in the following code, written for glibc:

(Bad Code)
Example Language:
printf("%d %d %1$d %1$d\n", 5, 9);

This code produces the following output: 5 9 5 5 It is also possible to use half-writes (%hn) to accurately control arbitrary DWORDS in memory, which greatly reduces the complexity needed to execute an attack that would otherwise require four staggered writes, such as the one mentioned in the first example.

+ Observed Examples
ReferenceDescription
format string in Perl program
format string in bad call to syslog function
format string in bad call to syslog function
format strings in NNTP server responses
Format string vulnerability exploited by triggering errors or warnings, as demonstrated via format string specifiers in a .bmp filename.
Chain: untrusted search path enabling resultant format string by loading malicious internationalization messages
+ Potential Mitigations

Phase: Requirements

Choose a language that is not subject to this flaw.

Phase: Implementation

Ensure that all format string functions are passed a static string which cannot be controlled by the user and that the proper number of arguments are always sent to that function as well. If at all possible, use functions that do not support the %n operator in format strings. [R.134.1] [R.134.2]

Phase: Build and Compilation

Heed the warnings of compilers and linkers, since they may alert you to improper usage.

+ Other Notes

While Format String vulnerabilities typically fall under the Buffer Overflow category, technically they are not overflowed buffers. The Format String vulnerability is fairly new (circa 1999) and stems from the fact that there is no realistic way for a function that takes a variable number of arguments to determine just how many arguments were passed in. The most common functions that take a variable number of arguments, including C-runtime functions, are the printf() family of calls. The Format String problem appears in a number of ways. A *printf() call without a format specifier is dangerous and can be exploited. For example, printf(input); is exploitable, while printf(y, input); is not exploitable in that context. The result of the first call, used incorrectly, allows for an attacker to be able to peek at stack memory since the input string will be used as the format specifier. The attacker can stuff the input string with format specifiers and begin reading stack values, since the remaining parameters will be pulled from the stack. Worst case, this improper use may give away enough control to allow an arbitrary value (or values in the case of an exploit program) to be written into the memory of the running program.

Frequently targeted entities are file names, process names, identifiers.

Format string problems are a classic C/C++ issue that are now rare due to the ease of discovery. One main reason format string vulnerabilities can be exploited is due to the %n operator. The %n operator will write the number of characters, which have been printed by the format string therefore far, to the memory pointed to by its argument. Through skilled creation of a format string, a malicious user may use values on the stack to create a write-what-where condition. Once this is achieved, he can execute arbitrary code. Other operators can be used as well; for example, a %9999s operator could also trigger a buffer overflow, or when used in file-formatting functions like fprintf, it can generate a much larger output than intended.

+ Weakness Ordinalities
OrdinalityDescription
Primary
(where the weakness exists independent of other weaknesses)
+ Relationships
NatureTypeIDNameView(s) this relationship pertains toView(s)
ChildOfWeakness ClassWeakness Class20Improper Input Validation
Seven Pernicious Kingdoms (primary)700
ChildOfWeakness ClassWeakness Class74Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')
Development Concepts (primary)699
Research Concepts (primary)1000
ChildOfCategoryCategory133String Errors
Development Concepts699
ChildOfCategoryCategory633Weaknesses that Affect Memory
Resource-specific Weaknesses (primary)631
ChildOfCategoryCategory726OWASP Top Ten 2004 Category A5 - Buffer Overflows
Weaknesses in OWASP Top Ten (2004) (primary)711
ChildOfCategoryCategory743CERT C Secure Coding Section 09 - Input Output (FIO)
Weaknesses Addressed by the CERT C Secure Coding Standard (primary)734
ChildOfCategoryCategory8082010 Top 25 - Weaknesses On the Cusp
Weaknesses in the 2010 CWE/SANS Top 25 Most Dangerous Programming Errors (primary)800
ChildOfCategoryCategory845CERT Java Secure Coding Section 00 - Input Validation and Data Sanitization (IDS)
Weaknesses Addressed by the CERT Java Secure Coding Standard (primary)844
ChildOfCategoryCategory8652011 Top 25 - Risky Resource Management
Weaknesses in the 2011 CWE/SANS Top 25 Most Dangerous Software Errors (primary)900
ChildOfCategoryCategory877CERT C++ Secure Coding Section 09 - Input Output (FIO)
Weaknesses Addressed by the CERT C++ Secure Coding Standard (primary)868
ChildOfCategoryCategory990SFP Secondary Cluster: Tainted Input to Command
Software Fault Pattern (SFP) Clusters (primary)888
PeerOfWeakness BaseWeakness Base123Write-what-where Condition
Research Concepts1000
MemberOfViewView630Weaknesses Examined by SAMATE
Weaknesses Examined by SAMATE (primary)630
MemberOfViewView635Weaknesses Used by NVD
Weaknesses Used by NVD (primary)635
MemberOfViewView884CWE Cross-section
CWE Cross-section (primary)884
+ Research Gaps

Format string issues are under-studied for languages other than C. Memory or disk consumption, control flow or variable alteration, and data corruption may result from format string exploitation in applications written in other languages such as Perl, PHP, Python, etc.

+ Affected Resources
  • Memory
+ Functional Areas
  • logging
  • errors
  • general output
+ Causal Nature

Implicit

+ Taxonomy Mappings
Mapped Taxonomy NameNode IDFitMapped Node Name
PLOVERFormat string vulnerability
7 Pernicious KingdomsFormat String
CLASPFormat string problem
CERT C Secure CodingFIO30-CExactExclude user input from format strings
OWASP Top Ten 2004A1CWE More SpecificUnvalidated Input
CERT C Secure CodingFIO30-CExclude user input from format strings
WASC6Format String
CERT Java Secure CodingIDS06-JExclude user input from format strings
CERT C++ Secure CodingFIO30-CPPExclude user input from format strings
Software Fault PatternsSFP24Tainted input to command
+ White Box Definitions

A weakness where the code path has:

1. start statement that accepts input

2. end statement that passes a format string to format string function where

a. the input data is part of the format string and

b. the format string is undesirable

Where "undesirable" is defined through the following scenarios:

1. not validated

2. incorrectly validated

+ References
[R.134.1] Steve Christey. "Format String Vulnerabilities in Perl Programs". <http://www.securityfocus.com/archive/1/418460/30/0/threaded>.
[R.134.2] Hal Burch and Robert C. Seacord. "Programming Language Format String Vulnerabilities". <http://www.ddj.com/dept/security/197002914>.
[R.134.3] Tim Newsham. "Format String Attacks". Guardent. September 2000. <http://www.thenewsh.com/~newsham/format-string-attacks.pdf>.
[R.134.4] [REF-11] M. Howard and D. LeBlanc. "Writing Secure Code". Chapter 5, "Format String Bugs" Page 147. 2nd Edition. Microsoft. 2002.
[R.134.5] [REF-17] Michael Howard, David LeBlanc and John Viega. "24 Deadly Sins of Software Security". "Sin 6: Format String Problems." Page 109. McGraw-Hill. 2010.
[R.134.5] [REF-7] Mark Dowd, John McDonald and Justin Schuh. "The Art of Software Security Assessment". Chapter 8, "C Format Strings", Page 422.. 1st Edition. Addison Wesley. 2006.
+ Content History
Submissions
Submission DateSubmitterOrganizationSource
PLOVERExternally Mined
Modifications
Modification DateModifierOrganizationSource
2008-08-01KDM AnalyticsExternal
added/updated white box definitions
2008-09-08CWE Content TeamMITREInternal
updated Applicable_Platforms, Common_Consequences, Detection_Factors, Modes_of_Introduction, Relationships, Other_Notes, Research_Gaps, Taxonomy_Mappings, Weakness_Ordinalities
2008-11-24CWE Content TeamMITREInternal
updated Relationships, Taxonomy_Mappings
2009-03-10CWE Content TeamMITREInternal
updated Relationships
2009-05-27CWE Content TeamMITREInternal
updated Demonstrative_Examples
2009-07-17KDM AnalyticsExternal
Improved the White_Box_Definition
2009-07-27CWE Content TeamMITREInternal
updated White_Box_Definitions
2010-02-16CWE Content TeamMITREInternal
updated Detection_Factors, References, Relationships, Taxonomy_Mappings
2011-06-01CWE Content TeamMITREInternal
updated Common_Consequences, Relationships, Taxonomy_Mappings
2011-06-27CWE Content TeamMITREInternal
updated Modes_of_Introduction, Relationships
2011-09-13CWE Content TeamMITREInternal
updated Potential_Mitigations, References, Relationships, Taxonomy_Mappings
2012-05-11CWE Content TeamMITREInternal
updated Observed_Examples, References, Related_Attack_Patterns, Relationships, Taxonomy_Mappings
2014-07-30CWE Content TeamMITREInternal
updated Demonstrative_Examples, Detection_Factors, Relationships, Taxonomy_Mappings
Page Last Updated: July 30, 2014