CWE-138: Improper Sanitization of Special Elements
Weakness ID: 138 (Weakness Class)
Status: Draft
Description
Description Summary
The software receives input from an upstream component, but it
does not sanitize or incorrectly sanitizes special elements that could be
interpreted as control elements or syntactic markers when they are sent to a
downstream component.
Extended Description
Most languages and protocols have their own special elements such as
characters and reserved words. These special elements can carry control
implications. If software fails to prevent external control or influence
over the inclusion of such special elements, the control flow of the program
may be altered from what was intended. For example, command shells on both
Unix and Windows interpret the symbol < ("less than") as meaning "read input
from a file".
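The shell example above can be sketched in Python. This is a minimal illustration, assuming a POSIX shell as the downstream component; the command `wc -l` and the function names are hypothetical and chosen only for the demonstration. It builds command strings without running them, so the difference between the two forms is visible.

```python
import shlex


def naive_command(user_value: str) -> str:
    # Vulnerable: a "<" inside user_value reaches the shell as a
    # redirection operator instead of as data.
    return "wc -l " + user_value


def quoted_command(user_value: str) -> str:
    # shlex.quote wraps the value so that a POSIX shell treats it as a
    # single literal word. It is not suitable for cmd.exe.
    return "wc -l " + shlex.quote(user_value)


hostile = "< /etc/passwd"
print(naive_command(hostile))   # the shell would redirect stdin from the file
print(quoted_command(hostile))  # the shell would look for a file named "< /etc/passwd"
```

Where possible, passing an argument list to `subprocess.run` without `shell=True` avoids the shell, and therefore its special elements, altogether.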
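This can also be a multi-channel issue. For example, terminal escape
sequences that are not filtered from log files may be interpreted when the
log is later viewed in a terminal.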
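One way to handle the log file case is to make control characters visible before they are written. The sketch below is an assumption about how such a filter might look, not a prescribed implementation; the function name `neutralize_for_log` is hypothetical.

```python
import re

# C0 control characters (including ESC, CR, LF), DEL, and C1 controls.
_CONTROL = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def neutralize_for_log(value: str) -> str:
    # Escape backslashes first so that the escaped form stays unambiguous,
    # then replace each control character with a visible \xNN form.
    value = value.replace("\\", "\\\\")
    return _CONTROL.sub(lambda m: "\\x{:02x}".format(ord(m.group())), value)


# ESC [ 2 J would clear the screen of a terminal displaying the raw log.
print(neutralize_for_log("login failed for bob\x1b[2J"))
```

Escaping rather than deleting keeps the evidence of the attempted injection in the log.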
Potential Mitigations
Phase
Description
Implementation
Developers should anticipate that special elements (e.g. delimiters,
symbols) will be injected into input vectors of their software system.
One defense is to create a whitelist (e.g. a regular expression) that
defines valid input according to the requirements specifications.
Reject or strictly filter any input that does not match the whitelist.
Properly encode your output, and quote any elements that have special
meaning to the component with which you are communicating.
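The two halves of this mitigation, whitelist validation on input and encoding on output, can be sketched as follows. The username rule (ASCII letters, digits, underscore, at most 32 characters) is a hypothetical requirement invented for the example, and HTML is assumed as the downstream component.

```python
import html
import re

# Hypothetical requirement: 1 to 32 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_username(value: str) -> str:
    # fullmatch requires the whole string to match, so a trailing newline
    # or any other stray character causes rejection.
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("username does not match the whitelist")
    return value


def render_comment(text: str) -> str:
    # Encode for the downstream component (HTML here) at output time.
    return "<p>" + html.escape(text, quote=True) + "</p>"


print(validate_username("alice_01"))
print(render_comment("<script>alert(1)</script>"))
```

Validation and encoding are complementary: free-text fields such as comments often cannot be restricted to a narrow whitelist, so output encoding carries the defense there.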
Architecture and Design
Implementation
Assume all input is malicious. Use a standard input validation
mechanism to validate all input for length, type, syntax, and business
rules before accepting the data to be displayed or stored. Use an
"accept known good" validation strategy.
Implementation
Use and specify an appropriate output encoding to ensure that the
special elements are well-defined. A normal byte sequence in one
encoding could be a special element in another.
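UTF-7 is a well-known instance of this problem. The sketch below shows a byte sequence that contains no angle brackets when read as ASCII, yet decodes to markup when a consumer treats it as UTF-7. The payload is illustrative only.

```python
# Bytes that look harmless to a filter searching for "<" or ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

print(b"<" in payload)          # no literal angle bracket in the bytes
print(payload.decode("ascii"))  # read as ASCII: plain punctuation and letters
print(payload.decode("utf-7"))  # read as UTF-7: the special elements appear
```

Declaring the output encoding explicitly (for example a charset parameter on a Content-Type header) removes the downstream component's freedom to guess a different one.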
Implementation
Do not rely exclusively on blacklist validation to detect malicious
input or to encode output. There are too many ways to encode a
character, so a blacklist is likely to miss some variants.
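A small comparison illustrates why blacklists miss variants. The blacklist entry, the whitelist pattern, and the sample inputs below are all hypothetical and were picked only to show the effect.

```python
import re

BLACKLIST = ["<script"]
# Hypothetical whitelist: letters, digits, space, and a little punctuation.
WHITELIST = re.compile(r"[A-Za-z0-9 .,_-]{1,64}")


def blacklist_allows(value: str) -> bool:
    return not any(bad in value for bad in BLACKLIST)


def whitelist_allows(value: str) -> bool:
    return WHITELIST.fullmatch(value) is not None


# Different spellings of the same special element.
variants = [
    "<SCRIPT>",              # case change
    "%3Cscript%3E",          # percent-encoding
    "&#60;script&#62;",      # decimal character reference
    "&#x3c;script&#x3e;",    # hexadecimal character reference
]

for v in variants:
    print(v, blacklist_allows(v), whitelist_allows(v))
```

Every variant passes the blacklist and fails the whitelist, because the whitelist never has to enumerate the encodings an attacker might choose.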
Implementation
Inputs should be decoded and canonicalized to the application's
current internal representation before being validated. Make sure that
your application does not decode the same input twice. Such errors could
be used to bypass whitelist schemes by introducing dangerous inputs
after they have been checked.
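The double-decoding error can be sketched with percent-encoding. This is a simplified illustration under stated assumptions: the whitelist pattern is hypothetical and deliberately permits "%" (as a field that accepts values like "50%" might), and the function names are invented for the example.

```python
import re
from urllib.parse import unquote

# Hypothetical whitelist that permits a literal percent sign.
ALLOWED = re.compile(r"[A-Za-z0-9 %_.-]{1,64}")


def accept(raw: str) -> str:
    # Decode exactly once, validate the decoded form, and hand that same
    # value to the rest of the application.
    value = unquote(raw)
    if not ALLOWED.fullmatch(value):
        raise ValueError("input rejected")
    return value


def flawed_accept(raw: str) -> str:
    value = unquote(raw)
    if not ALLOWED.fullmatch(value):
        raise ValueError("input rejected")
    # Bug: a second decode after validation reintroduces special elements.
    return unquote(value)


attack = "%253Cscript%253E"
print(accept(attack))         # stays as inert text
print(flawed_accept(attack))  # angle brackets appear after the check
```

The check itself is identical in both functions; only the extra decode after validation turns the checked text back into special elements.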
Weakness Ordinalities
Ordinality
Description
Primary (where the weakness exists independent of other weaknesses)
This weakness can be related to interpretation conflicts or interaction
errors in intermediaries (such as proxies or application firewalls) when the
intermediary's model of an endpoint does not account for protocol-specific
special elements.
See this entry's children for different types of special elements that
have been observed at one point or another. However, it can be difficult to
find suitable CVE examples. In an attempt to be complete, CWE includes some
types that do not have any associated observed example.
Research Gaps
This weakness is probably under-studied for proprietary or custom formats.
It is likely that these issues are fairly common in applications that use
their own custom format for configuration files, logs, meta-data, messaging,
etc. They would only be found by accident or with a focused effort based on
an understanding of the format.
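As an illustration of how a custom format can carry this weakness, consider a hypothetical line-oriented record format with "|" as the field delimiter and newline as the record delimiter. The format, the escaping scheme, and all names below are assumptions made for the sketch.

```python
DELIM = "|"


def write_record_naive(fields):
    # Vulnerable: delimiters inside a field become structure.
    return DELIM.join(fields)


def escape_field(field: str) -> str:
    # Escape the escape character first, then the two delimiters.
    return (field.replace("\\", "\\\\")
                 .replace(DELIM, "\\" + DELIM)
                 .replace("\n", "\\n"))


def write_record(fields):
    return DELIM.join(escape_field(f) for f in fields)


def parse_record(line: str):
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            cur.append("\n" if nxt == "n" else nxt)  # undo one escape
            i += 2
        elif ch == DELIM:
            fields.append("".join(cur))              # field boundary
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields


record = ["alice", "guest|admin\nroot|admin"]
print(write_record_naive(record).split("\n"))  # one record became two
print(parse_record(write_record(record)))      # structure preserved
```

With the naive writer, the second field injects both an extra field and an extra record; with escaping, the same data round-trips as two fields in one record.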