The product does not validate or incorrectly validates input
that can affect the control flow or data flow of a
program.
Extended Description
When software fails to validate input properly, an attacker is able to
craft the input in a form that is not expected by the rest of the
application. This will lead to parts of the system receiving unintended
input, which may result in altered control flow, arbitrary control of a
resource, or arbitrary code execution.
Terminology Notes
The "input validation" term is extremely common, but it is used in many
different ways. In some cases its usage can obscure the real underlying
weakness or otherwise hide chaining and composite relationships.
Some people use "input validation" as a general term that covers many
different techniques for ensuring that input is appropriate, such as
cleansing/filtering, canonicalization, and escaping. Others use the term in
a more narrow context to simply mean "checking if an input conforms to
expectations without changing it."
Time of Introduction
Architecture and Design
Implementation
Applicable Platforms
Languages
All
Platform Notes
Input validation can be a problem in any system that handles data from an
external source.
Modes of Introduction
If a programmer believes that an attacker cannot modify certain inputs,
then the programmer might not perform any input validation at all. For
example, in web applications, many programmers believe that cookies and
hidden form fields cannot be modified from a web browser (CWE-472),
although they can be altered using a proxy or a custom program. In a
client-server architecture, the programmer might assume that client-side
security checks cannot be bypassed, even when a custom client could be
written that skips those checks (CWE-602).
Common Consequences
Scope | Effect
Availability | An attacker could provide unexpected values and cause a program crash or excessive consumption of resources, such as memory and CPU.
Confidentiality | An attacker could read confidential data if they are able to control resource references.
Integrity | An attacker could use malicious input to modify data or possibly alter control flow in unexpected ways, including arbitrary command execution.
Likelihood of Exploit
High
Demonstrative Examples
Example 1
This example demonstrates a shopping interaction in which the user
is free to specify the quantity of items to be purchased and a total is
calculated.
(Bad Code)
Java
...
public static final double price = 20.00;
int quantity = currentUser.getAttribute("quantity");
double total = price * quantity;
chargeUser(total);
...
The user has no control over the price variable; however, the code does
not prevent a negative value from being specified for quantity. If an
attacker were to provide a negative value, then the user would have
their account credited instead of debited.
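One way to close this gap is to reject out-of-range quantities before the
total is computed. The following is a minimal sketch, not the original
application's code: the class name, the MAX_QUANTITY limit, and the use of
BigDecimal for currency are assumptions made for illustration.

```java
import java.math.BigDecimal;

final class OrderTotal {
    // Unit price from the example; BigDecimal avoids floating-point rounding on currency.
    static final BigDecimal PRICE = new BigDecimal("20.00");
    // Hypothetical business limit; the original example does not define one.
    static final int MAX_QUANTITY = 1000;

    /** Returns the amount to charge, or throws if the quantity is not acceptable. */
    static BigDecimal totalFor(int quantity) {
        if (quantity < 1 || quantity > MAX_QUANTITY) {
            throw new IllegalArgumentException("quantity out of range: " + quantity);
        }
        return PRICE.multiply(BigDecimal.valueOf(quantity));
    }
}
```

With this check in place, a negative quantity is rejected before any
charge is calculated, so it can no longer turn a debit into a credit.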
Example 2
This example asks the user for a height and width of an m X n game
board with a maximum dimension of 100 squares.
(Bad Code)
C
...
#define MAX_DIM 100
...
int m, n, error; /* board dimensions */
board_square_t *board;
printf("Please specify the board height: \n");
error = scanf("%d", &m);
if ( EOF == error ){
    die("No integer passed: Die evil hacker!\n");
}
printf("Please specify the board width: \n");
error = scanf("%d", &n);
if ( EOF == error ){
    die("No integer passed: Die evil hacker!\n");
}
if ( m > MAX_DIM || n > MAX_DIM ) {
    die("Value too large: Die evil hacker!\n");
}
board = (board_square_t*) malloc( m * n * sizeof(board_square_t));
...
While this code checks to make sure the user cannot specify large,
positive integers and consume too much memory, it fails to check for
negative values supplied by the user. As a result, an attacker can
perform a resource consumption (CWE-400) attack against this program by
specifying two large negative values whose product does not overflow,
resulting in a very large memory allocation (CWE-789) and possibly a
system crash. Alternatively, an attacker can provide very large negative
values that cause an integer overflow (CWE-190), with unexpected
behavior following depending on how the values are treated in the
remainder of the program.
Example 3
The following example shows a PHP application in which the
programmer attempts to display a user's birthday and homepage.
The programmer intended for $birthday to be in a date format and
$homepage to be a valid URL. However, since the values are derived from
an HTTP request, if an attacker can trick a victim into clicking a
crafted URL with <script> tags providing the values for
birthday and/or homepage, then the script will run on the client's
browser when the webserver echoes the content. Notice that even if the
programmer were to defend the $birthday variable by restricting input to
integers and dashes, it would still be possible for an attacker to
provide a string of the form:
(Attack)
2009-01-09--
If this data were used in a SQL statement, the database would treat the
remainder of the statement as a comment. The comment could disable other
security-related logic in the statement. In this case, encoding combined
with input validation would be a more useful protection
mechanism.
Furthermore, XSS (CWE-79) and SQL injection (CWE-89) are just two of the
potential consequences of a failed protection mechanism of this nature.
Depending on the context of the code, CRLF Injection (CWE-93), Argument
Injection (CWE-88), or Command Injection (CWE-77) may also be possible.
Example 4
This function attempts to extract a pair of numbers from a
user-supplied string.
(Bad Code)
C
void parse_data(char *untrusted_input){
    int m, n, error;
    error = sscanf(untrusted_input, "%d:%d", &m, &n);
    if ( EOF == error ){
        die("Did not specify integer value. Die evil hacker!\n");
    }
    /* proceed assuming n and m are initialized correctly */
}
This code attempts to extract two integer values out of a formatted,
user-supplied input. However, if an attacker were to provide an input of
the form:
(Attack)
123:
then only the m variable will be initialized. Subsequent use of n may
result in the use of an uninitialized variable (CWE-457).
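The underlying check is that every expected field was actually parsed,
not merely that parsing did not hit end-of-input. In C, that means
comparing the sscanf return value against the number of conversions
requested (2 here). The sketch below shows the same idea in Java, chosen
only for consistency with the other added sketches in this entry; the
class name, method name, and nine-digit limit are illustrative
assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PairParser {
    // The whole input must be exactly "<int>:<int>"; partial matches are rejected.
    private static final Pattern PAIR = Pattern.compile("(-?\\d{1,9}):(-?\\d{1,9})");

    /** Returns {m, n}, or throws if either number is missing or malformed. */
    static int[] parsePair(String untrustedInput) {
        if (untrustedInput == null) {
            throw new IllegalArgumentException("input is required");
        }
        Matcher matcher = PAIR.matcher(untrustedInput);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("expected <int>:<int>");
        }
        // At most nine digits per field, so Integer.parseInt cannot overflow here.
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }
}
```

An input such as "123:" fails the full-pattern match, so the caller never
sees a half-initialized pair.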
Example 5
The following example takes a user-supplied value to allocate an
array of objects and then operates on the array.
(Bad Code)
Java
private void buildList ( int untrustedListSize ){
    if ( 0 > untrustedListSize ){
        die("Negative value supplied for list size, die evil hacker!");
    }
    Widget[] list = new Widget [ untrustedListSize ];
    list[0] = new Widget();
}
This example attempts to build a list from a user-specified value, and
even checks to ensure a non-negative value is supplied. If, however, a 0
value is provided, the code will build an array of size 0 and then try
to store a new Widget in the first location, causing an exception to be
thrown.
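A corrected version would reject zero as well as negative sizes, and
would normally also impose an upper bound so that the allocation itself
cannot be abused. The sketch below is one possible shape for that check;
the MAX_LIST_SIZE value and the stub Widget class are assumptions added so
the example is self-contained.

```java
final class WidgetList {
    /** Stand-in for the example's Widget type. */
    static final class Widget { }

    // Hypothetical upper bound; choose a limit that fits the application.
    static final int MAX_LIST_SIZE = 10_000;

    /** Builds the list only when the requested size is within 1..MAX_LIST_SIZE. */
    static Widget[] buildList(int untrustedListSize) {
        if (untrustedListSize < 1 || untrustedListSize > MAX_LIST_SIZE) {
            throw new IllegalArgumentException("list size out of range: " + untrustedListSize);
        }
        Widget[] list = new Widget[untrustedListSize];
        list[0] = new Widget(); // safe: the array has at least one slot
        return list;
    }
}
```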
Observed Examples
crash via multiple "." characters in file extension
Potential Mitigations
Phase
Description
Architecture and Design
Use an input validation framework such as Struts or the OWASP ESAPI
Validation API. If you use Struts, be mindful of weaknesses covered by
the CWE-101 category.
Architecture and Design
Understand all the potential areas where untrusted inputs can enter
your software: parameters or arguments, cookies, anything read from the
network, environment variables, request headers as well as content, URL
components, e-mail, files, databases, and any external systems that
provide data to the application. Perform input validation at
well-defined interfaces.
Architecture and Design
Assume all input is malicious. Use an "accept known good" input
validation strategy (i.e., use a whitelist). Reject any input that does
not strictly conform to specifications, or transform it into something
that does. A blacklist can supplement this by helping to detect
potential attacks, but it should not be the only check (CWE-184).
Use a standard input validation mechanism to validate all input for
length, type, syntax, and business rules before accepting the input for
further processing. As an example of business rule logic, "boat" may be
syntactically valid because it only contains alphanumeric characters,
but it is not valid if you are expecting colors such as "red" or "blue."
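The "boat" case can be sketched as three layered checks: length, syntax,
and business rule. This is an illustrative sketch only; the class name,
the length limit, and the set of accepted colors are assumptions.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class ColorValidator {
    private static final int MAX_LENGTH = 16;                        // length rule
    private static final Pattern LETTERS = Pattern.compile("[a-z]+"); // syntax rule
    private static final Set<String> ALLOWED = Set.of("red", "green", "blue"); // business rule

    /** Accepts the input only if it passes every rule; nothing is modified. */
    static boolean isValidColor(String input) {
        return input != null
            && input.length() <= MAX_LENGTH
            && LETTERS.matcher(input).matches()
            && ALLOWED.contains(input);
    }
}
```

Here "boat" passes the length and syntax rules but fails the business
rule. For a fixed set like this, the membership test alone would be
enough; the earlier rules are shown to make each category visible.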
Architecture and Design
For any security checks that are performed on the client side, ensure
that these checks are duplicated on the server side, in order to avoid
CWE-602. Attackers can bypass the client-side checks by modifying values
after the checks have been performed, or by changing the client to
remove the client-side checks entirely. Then, these modified values
would be submitted to the server.
Even though client-side checks provide minimal benefits with respect
to server-side security, they are still useful. First, they can support
intrusion detection. If the server receives input that should have been
rejected by the client, then it may be an indication of an attack.
Second, client-side error-checking can provide helpful feedback to the
user about the expectations for valid input. Third, there may be a
reduction in server-side processing time for accidental input errors,
although this is typically a small savings.
Architecture and Design
Do not rely exclusively on blacklist validation to detect malicious
input or to encode output (CWE-184). There are too many ways to encode
the same character, so you're likely to miss some variants.
Implementation
When your application combines data from multiple sources, perform the
validation after the sources have been combined. The individual data
elements may pass the validation step but violate the intended
restrictions after they have been combined.
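As a small illustration of why the order matters, consider two strings
that are concatenated into a path. The sketch assumes a deliberately
simple rule (no ".." sequence) and hypothetical method names; a real
application would apply its full validation logic to the combined value.

```java
final class CombinedInputCheck {
    /** Deliberately simple rule for the example: the value must not contain "..". */
    static boolean hasTraversal(String value) {
        return value.contains("..");
    }

    /** Flawed approach: each piece is checked on its own. */
    static boolean piecesLookSafe(String first, String second) {
        return !hasTraversal(first) && !hasTraversal(second);
    }

    /** Recommended approach: check the value that will actually be used. */
    static boolean combinedIsSafe(String first, String second) {
        return !hasTraversal(first + second);
    }
}
```

With first = "reports/." and second = "./x", neither piece contains "..",
yet the concatenation "reports/../x" does. Only the check on the combined
value catches it.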
Implementation
Be especially careful to validate your input when you invoke code that
crosses language boundaries, such as from an interpreted language to
native code. This could create an unexpected interaction between the
language boundaries. Ensure that you are not violating any of the
expectations of the language with which you are interfacing. For
example, even though Java may not be susceptible to buffer overflows,
providing a large argument in a call to native code might trigger an
overflow.
Implementation
Directly convert your input type into the expected data type, such as
using a conversion function that translates a string into a number.
After converting to the expected data type, ensure that the input's
values fall within the expected range of allowable values and that
multi-field consistencies are maintained.
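A minimal sketch of this sequence (convert, range-check, then check
consistency across fields) might look as follows. The page-range scenario,
the limits, and all names are hypothetical.

```java
import java.util.Optional;

final class RangeInput {
    static final int MIN_PAGE = 1;
    static final int MAX_PAGE = 500; // hypothetical limit

    /** Converts one field to an int and range-checks it; empty if unacceptable. */
    static Optional<Integer> parsePage(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return (value >= MIN_PAGE && value <= MAX_PAGE)
                ? Optional.of(value)
                : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty(); // not a number, or too large for an int
        }
    }

    /** Multi-field consistency: both fields must be valid and first <= last. */
    static boolean isValidPageRange(String rawFirst, String rawLast) {
        Optional<Integer> first = parsePage(rawFirst);
        Optional<Integer> last = parsePage(rawLast);
        return first.isPresent() && last.isPresent() && first.get() <= last.get();
    }
}
```

After conversion, the rest of the program works with typed values that
are known to be in range, rather than with the raw strings.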
Implementation
Inputs should be decoded and canonicalized to the application's
current internal representation before being validated (CWE-180,
CWE-181). Make sure that your application does not inadvertently decode
the same input twice (CWE-174). Such errors could be used to bypass
whitelist schemes by introducing dangerous inputs after they have been
checked. Use libraries such as the OWASP ESAPI Canonicalization
control.
Consider performing repeated canonicalization until your input does
not change any more. This will avoid double-decoding and similar
scenarios, but it might inadvertently modify inputs that are allowed to
contain properly-encoded dangerous content.
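Repeated canonicalization can be sketched with the standard
java.net.URLDecoder (the Charset overload used here requires Java 10 or
later). The round limit, the class and method names, and the simple ".."
rule are assumptions for illustration; note also that URLDecoder follows
form-encoding rules and turns "+" into a space, which may not suit every
input.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class Canonicalizer {
    // Hypothetical cap so that deeply nested encodings are rejected, not looped on.
    private static final int MAX_ROUNDS = 5;

    /** Percent-decodes until the value stops changing, then returns it. */
    static String canonicalize(String input) {
        String current = input;
        for (int round = 0; round < MAX_ROUNDS; round++) {
            // Throws IllegalArgumentException on malformed escapes such as "%zz".
            String decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            if (decoded.equals(current)) {
                return current;
            }
            current = decoded;
        }
        throw new IllegalArgumentException("too many encoding layers");
    }

    /** Validates only after canonicalization; the rule itself is a placeholder. */
    static boolean isSafeRelativePath(String input) {
        return !canonicalize(input).contains("..");
    }
}
```

A doubly encoded "%252e%252e%252f" decodes first to "%2e%2e%2f" and then
to "../", which the check then rejects. Checking before decoding would
have let it through.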
Implementation
When exchanging data between components, ensure that both components
are using the same character encoding. Ensure that the proper encoding
is applied at each interface. Explicitly set the encoding you are using
whenever the protocol allows you to do so.
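In Java, for example, the encoding can be named explicitly on both sides
of an interface, and malformed byte sequences can be rejected instead of
being silently replaced. The sketch below assumes UTF-8 as the agreed
encoding; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class StrictUtf8 {
    /** Decodes bytes as UTF-8; returns empty if the bytes are not valid UTF-8. */
    static Optional<String> decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    /** Encodes with an explicit charset instead of the platform default. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```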
Testing
Use automated static analysis tools that target this type of weakness.
Many modern techniques use data flow analysis to minimize the number of
false positives. This is not a perfect solution, since 100% accuracy and
coverage are not feasible.
Testing
Use dynamic tools and techniques that interact with the software using
large test suites with many diverse inputs, such as fuzz testing
(fuzzing), robustness testing, and fault injection. The software's
operation may slow down, but it should not become unstable, crash, or
generate incorrect results.
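As a very small illustration of the idea, the harness below feeds random
strings to an input-handling function and counts any crash or
out-of-contract result. It is a sketch only: parsePort is a hypothetical
function under test, and real fuzzers generate far more diverse inputs
and are usually guided by coverage feedback.

```java
import java.util.Random;

final class PortFuzz {
    /** Hypothetical function under test: a port in 1..65535, or -1 if invalid. */
    static int parsePort(String raw) {
        if (raw == null || raw.isEmpty() || raw.length() > 5) {
            return -1;
        }
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        int value = Integer.parseInt(raw); // at most 5 ASCII digits, cannot overflow
        return (value >= 1 && value <= 65535) ? value : -1;
    }

    /** Runs the given number of random inputs; returns how many violated the contract. */
    static int fuzz(long seed, int iterations) {
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            String input = randomString(random);
            try {
                int result = parsePort(input);
                if (result != -1 && (result < 1 || result > 65535)) {
                    failures++;
                }
            } catch (RuntimeException e) {
                failures++; // any exception on untrusted input counts as a defect
            }
        }
        return failures;
    }

    /** Builds a string of 0 to 7 random ASCII characters. */
    private static String randomString(Random random) {
        int length = random.nextInt(8);
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append((char) random.nextInt(128));
        }
        return builder.toString();
    }
}
```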
Relationship Notes
CWE-116 and CWE-20 have a close association because, depending on the
nature of the structured message, proper input validation can indirectly
prevent special characters from changing the meaning of a structured
message. For example, by validating that a numeric ID field should only
contain the 0-9 characters, the programmer effectively prevents injection
attacks.
However, input validation is not always sufficient, especially when less
stringent data types must be supported, such as free-form text. Consider a
SQL injection scenario in which a last name is inserted into a query. The
name "O'Reilly" would likely pass the validation step since it is a common
last name in the English language. However, it cannot be directly inserted
into the database because it contains the "'" apostrophe character, which
would need to be escaped or otherwise handled. In this case, stripping the
apostrophe might reduce the risk of SQL injection, but it would produce
incorrect behavior because the wrong name would be recorded.
Research Gaps
There is not much research into the classification of input validation
techniques and their application. Many publicly-disclosed vulnerabilities
simply characterize a problem as "input validation" without providing more
specific details that might contribute to a deeper understanding of
validation techniques and the weaknesses they can prevent or reduce.
Validation is over-emphasized in contrast to other sanitization techniques
such as cleansing and enforcement by conversion. See the vulnerability
theory paper.
Taxonomy Mappings
Mapped Taxonomy Name | Node ID | Fit | Mapped Node Name
7 Pernicious Kingdoms | | | Input validation and representation
OWASP Top Ten 2004 | A1 | CWE More Specific | Unvalidated Input
CERT C Secure Coding | ERR07-C | | Prefer functions that support error checking over equivalent functions that don't
CERT C Secure Coding | INT06-C | | Use strtol() or a related function to convert a string token to an integer
CERT C Secure Coding | MEM10-C | | Define and use a pointer validation function
CERT C Secure Coding | MSC08-C | | Library functions should validate their parameters
Maintenance Notes
Input validation - whether missing or incorrect - is such an essential and
widespread part of secure development that it is implicit in many different
weaknesses. Traditionally, problems such as buffer overflows and XSS have
been classified as input validation problems by many security professionals.
However, input validation is not necessarily the only protection mechanism
available for avoiding such problems, and in some cases it is not even
sufficient. The CWE team has begun capturing these subtleties in chains
within the Research Concepts view (CWE-1000), but more work is
needed.
Content History
Submissions
Submission Date
Submitter
Organization
Source
7 Pernicious Kingdoms
Externally Mined
Modifications
Modification Date | Modifier | Organization | Source
2008-07-01 | Eric Dalci | Cigital | External
updated Potential Mitigations, Time of Introduction
2008-08-15 | | Veracode | External
Suggested OWASP Top Ten 2004 mapping
2008-09-08 | CWE Content Team | MITRE | Internal
updated Relationships, Other Notes, Taxonomy Mappings
2008-11-24 | CWE Content Team | MITRE | Internal
updated Relationships, Taxonomy Mappings
2009-01-12 | CWE Content Team | MITRE | Internal
updated Applicable Platforms, Common Consequences, Demonstrative Examples, Description, Likelihood of Exploit, Name, Observed Examples, Other Notes, Potential Mitigations, References, Relationship Notes, Relationships
2009-03-10 | CWE Content Team | MITRE | Internal
updated Description, Potential Mitigations
2009-05-27 | CWE Content Team | MITRE | Internal
updated Related Attack Patterns
2009-07-27 | CWE Content Team | MITRE | Internal
updated Relationships
2009-10-29 | CWE Content Team | MITRE | Internal
updated Common Consequences, Demonstrative Examples, Maintenance Notes, Modes of Introduction, Observed Examples, Relationships, Research Gaps, Terminology Notes