CWE

Common Weakness Enumeration

A community-developed list of SW & HW weaknesses that can become vulnerabilities

New to CWE? click here!
CWE Most Important Hardware Weaknesses
CWE Top 25 Most Dangerous Weaknesses
Home > CWE List > CWE- Individual Dictionary Definition (4.14)  
ID

CWE-185: Incorrect Regular Expression

Weakness ID: 185
Vulnerability Mapping: ALLOWEDThis CWE ID could be used to map to real-world vulnerabilities in limited situations requiring careful review (with careful review of mapping notes)
Abstraction: ClassClass - a weakness that is described in a very abstract fashion, typically independent of any specific language or technology. More specific than a Pillar Weakness, but more general than a Base Weakness. Class level weaknesses typically describe issues in terms of 1 or 2 of the following dimensions: behavior, property, and resource.
View customized information:
For users who are interested in more notional aspects of a weakness. Example: educators, technical writers, and project/program managers. For users who are concerned with the practical application and details about the nature of a weakness and how to prevent it from happening. Example: tool developers, security researchers, pen-testers, incident response analysts. For users who are mapping an issue to CWE/CAPEC IDs, i.e., finding the most appropriate CWE for a specific issue (e.g., a CVE record). Example: tool developers, security researchers. For users who wish to see all available information for the CWE/CAPEC entry. For users who want to customize what details are displayed.
×

Edit Custom Filter


+ Description
The product specifies a regular expression in a way that causes data to be improperly matched or compared.
+ Extended Description
When the regular expression is used in protection mechanisms such as filtering or validation, this may allow an attacker to bypass the intended restrictions on the incoming data.
+ Relationships
Section HelpThis table shows the weaknesses and high level categories that are related to this weakness. These relationships are defined as ChildOf, ParentOf, MemberOf and give insight to similar items that may exist at higher and lower levels of abstraction. In addition, relationships such as PeerOf and CanAlsoBe are defined to show similar weaknesses that the user may want to explore.
+ Relevant to the view "Research Concepts" (CWE-1000)
NatureTypeIDName
ChildOfPillarPillar - a weakness that is the most abstract type of weakness and represents a theme for all class/base/variant weaknesses related to it. A Pillar is different from a Category as a Pillar is still technically a type of weakness that describes a mistake, while a Category represents a common characteristic used to group related things.697Incorrect Comparison
ParentOfBaseBase - a weakness that is still mostly independent of a resource or technology, but with sufficient details to provide specific methods for detection and prevention. Base level weaknesses typically describe issues in terms of 2 or 3 of the following dimensions: behavior, property, technology, language, and resource.186Overly Restrictive Regular Expression
ParentOfBaseBase - a weakness that is still mostly independent of a resource or technology, but with sufficient details to provide specific methods for detection and prevention. Base level weaknesses typically describe issues in terms of 2 or 3 of the following dimensions: behavior, property, technology, language, and resource.625Permissive Regular Expression
CanPrecedeBaseBase - a weakness that is still mostly independent of a resource or technology, but with sufficient details to provide specific methods for detection and prevention. Base level weaknesses typically describe issues in terms of 2 or 3 of the following dimensions: behavior, property, technology, language, and resource.182Collapse of Data into Unsafe Value
CanPrecedeVariantVariant - a weakness that is linked to a certain type of product, typically involving a specific language or technology. More specific than a Base weakness. Variant level weaknesses typically describe issues in terms of 3 to 5 of the following dimensions: behavior, property, technology, language, and resource.187Partial String Comparison
+ Modes Of Introduction
Section HelpThe different Modes of Introduction provide information about how and when this weakness may be introduced. The Phase identifies a point in the life cycle at which introduction may occur, while the Note provides a typical scenario related to introduction during the given phase.
PhaseNote
Implementation
+ Applicable Platforms
Section HelpThis listing shows possible areas for which the given weakness could appear. These may be for specific named Languages, Operating Systems, Architectures, Paradigms, Technologies, or a class of such platforms. The platform is listed along with how frequently the given weakness appears for that instance.

Languages

Class: Not Language-Specific (Undetermined Prevalence)

+ Common Consequences
Section HelpThis table specifies different individual consequences associated with the weakness. The Scope identifies the application security area that is violated, while the Impact describes the negative technical impact that arises if an adversary succeeds in exploiting this weakness. The Likelihood provides information about how likely the specific consequence is expected to be seen relative to the other consequences in the list. For example, there may be high likelihood that a weakness will be exploited to achieve a certain impact, but a low likelihood that it will be exploited to achieve a different impact.
ScopeImpactLikelihood
Other

Technical Impact: Unexpected State; Varies by Context

When the regular expression is not correctly specified, data might have a different format or type than the rest of the program expects, producing resultant weaknesses or errors.
Access Control

Technical Impact: Bypass Protection Mechanism

In PHP, regular expression checks can sometimes be bypassed with a null byte, leading to any number of weaknesses.
+ Demonstrative Examples

Example 1

The following code takes phone numbers as input, and uses a regular expression to reject invalid phone numbers.

(bad code)
Example Language: Perl 
$phone = GetPhoneNumber();
if ($phone =~ /\d+-\d+/) {
# looks like it only has hyphens and digits
system("lookup-phone $phone");
}
else {
error("malformed number!");
}

An attacker could provide an argument such as: "; ls -l ; echo 123-456" This would pass the check, since "123-456" is sufficient to match the "\d+-\d+" portion of the regular expression.

Example 2

This code uses a regular expression to validate an IP string prior to using it in a call to the "ping" command.

(bad code)
Example Language: Python 
import subprocess
import re

def validate_ip_regex(ip: str):
ip_validator = re.compile(r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}")
if ip_validator.match(ip):
return ip
else:
raise ValueError("IP address does not match valid pattern.")

def run_ping_regex(ip: str):
validated = validate_ip_regex(ip)
# The ping command treats zero-prepended IP addresses as octal
result = subprocess.call(["ping", validated])
print(result)

Since the regular expression does not have anchors (CWE-777), i.e. is unbounded without ^ or $ characters, then prepending a 0 or 0x to the beginning of the IP address will still result in a matched regex pattern. Since the ping command supports octal and hex prepended IP addresses, it will use the unexpectedly valid IP address (CWE-1389). For example, "0x63.63.63.63" would be considered equivalent to "99.63.63.63". As a result, the attacker could potentially ping systems that the attacker cannot reach directly.

+ Observed Examples
ReferenceDescription
Regexp isn't "anchored" to the beginning or end, which allows spoofed values that have trusted values as substrings.
Regexp for IP address isn't anchored at the end, allowing appending of shell metacharacters.
Bypass access restrictions via multiple leading slash, which causes a regular expression to fail.
Local user DoS via invalid regular expressions.
chain: Malformed input generates a regular expression error that leads to information exposure.
Certain strings are later used in a regexp, leading to a resultant crash.
MFV. Regular expression intended to protect against directory traversal reduces ".../...//" to "../".
Malformed regexp syntax leads to information exposure in error message.
Code injection due to improper quoting of regular expression.
Null byte bypasses PHP regexp check.
Null byte bypasses PHP regexp check.
+ Potential Mitigations

Phase: Architecture and Design

Strategy: Refactoring

Regular expressions can become error prone when defining a complex language even for those experienced in writing grammars. Determine if several smaller regular expressions simplify one large regular expression. Also, subject the regular expression to thorough testing techniques such as equivalence partitioning, boundary value analysis, and robustness. After testing and a reasonable confidence level is achieved, a regular expression may not be foolproof. If an exploit is allowed to slip through, then record the exploit and refactor the regular expression.
+ Detection Methods

Automated Static Analysis

Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without having to execute it. Typically, this is done by building a model of data flow and control flow, then searching for potentially-vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.)

Effectiveness: High

+ Memberships
Section HelpThis MemberOf Relationships table shows additional CWE Categories and Views that reference this weakness as a member. This information is often useful in understanding where a weakness fits within the context of external information sources.
NatureTypeIDName
MemberOfViewView - a subset of CWE entries that provides a way of examining CWE content. The two main view structures are Slices (flat lists) and Graphs (containing relationships between entries).884CWE Cross-section
MemberOfCategoryCategory - a CWE entry that contains a set of other entries that share a common characteristic.990SFP Secondary Cluster: Tainted Input to Command
MemberOfCategoryCategory - a CWE entry that contains a set of other entries that share a common characteristic.1397Comprehensive Categorization: Comparison
+ Vulnerability Mapping Notes

Usage: ALLOWED-WITH-REVIEW

(this CWE ID could be used to map to real-world vulnerabilities in limited situations requiring careful review)

Reason: Abstraction

Rationale:

This CWE entry is a Class and might have Base-level children that would be more appropriate

Comments:

Examine children of this entry to see if there is a better fit
+ Notes

Relationship

While there is some overlap with allowlist/denylist problems, this entry is intended to deal with incorrectly written regular expressions, regardless of their intended use. Not every regular expression is intended for use as an allowlist or denylist. In addition, allowlists and denylists can be implemented using other mechanisms besides regular expressions.

Research Gap

Regexp errors are likely a primary factor in many MFVs, especially those that require multiple manipulations to exploit. However, they are rarely diagnosed at this level of detail.
+ Taxonomy Mappings
Mapped Taxonomy NameNode IDFitMapped Node Name
PLOVERRegular Expression Error
+ References
[REF-7] Michael Howard and David LeBlanc. "Writing Secure Code". Chapter 10, "Using Regular Expressions for Checking Input" Page 350. 2nd Edition. Microsoft Press. 2002-12-04. <https://www.microsoftpressstore.com/store/writing-secure-code-9780735617223>.
+ Content History
+ Submissions
Submission DateSubmitterOrganization
2006-07-19
(CWE Draft 3, 2006-07-19)
PLOVER
+ Modifications
Modification DateModifierOrganization
2008-07-01Eric DalciCigital
updated Time_of_Introduction
2008-09-08CWE Content TeamMITRE
updated Description, Name, Relationships, Observed_Example, Other_Notes, Taxonomy_Mappings
2009-12-28CWE Content TeamMITRE
updated Common_Consequences, Other_Notes
2010-02-16CWE Content TeamMITRE
updated References
2010-04-05CWE Content TeamMITRE
updated Description
2011-03-29CWE Content TeamMITRE
updated Observed_Examples
2011-06-01CWE Content TeamMITRE
updated Common_Consequences
2012-05-11CWE Content TeamMITRE
updated Demonstrative_Examples, Related_Attack_Patterns, Relationships
2012-10-30CWE Content TeamMITRE
updated Potential_Mitigations
2014-06-23CWE Content TeamMITRE
updated Applicable_Platforms, Common_Consequences, Other_Notes, Relationship_Notes
2014-07-30CWE Content TeamMITRE
updated Demonstrative_Examples, Relationships
2015-12-07CWE Content TeamMITRE
updated Relationships
2017-11-08CWE Content TeamMITRE
updated References
2018-03-27CWE Content TeamMITRE
updated References
2019-06-20CWE Content TeamMITRE
updated Related_Attack_Patterns, Relationships, Type
2020-02-24CWE Content TeamMITRE
updated Relationships, Type
2020-06-25CWE Content TeamMITRE
updated Relationship_Notes
2021-03-15CWE Content TeamMITRE
updated Relationships
2022-10-13CWE Content TeamMITRE
updated Demonstrative_Examples, Relationships
2023-01-31CWE Content TeamMITRE
updated Description
2023-04-27CWE Content TeamMITRE
updated Detection_Factors, Relationships
2023-06-29CWE Content TeamMITRE
updated Mapping_Notes
+ Previous Entry Names
Change DatePrevious Entry Name
2008-09-09Regular Expression Error
Page Last Updated: February 29, 2024