Introduction to Vulnerability Theory

Document version: 0.5    Date: July 9, 2007

This is a draft document. It is intended to support maintenance of CWE, and to educate and solicit feedback from a specific technical audience. This document does not reflect any official position of the MITRE Corporation or its sponsors. Copyright © 2007, The MITRE Corporation. All rights reserved. Permission is granted to redistribute this document if this paragraph is not removed. This document is subject to change without notice.

Authors: Steve Christey, Conor Harris, Bill Heinbockel
URL: http://cwe.mitre.org/documents/vulnerability_theory/intro.html

Introduction

Despite the rapid growth of applied vulnerability research and secure software development, these communities have not made much progress in formalizing their techniques, and the "researcher's instinct" can be difficult to describe or teach to non-practitioners. The discipline is much more black magic than science. For example, terminology is woefully garbled and inadequate, and innovations can be missed because they are misdiagnosed as a common issue.

MITRE has been developing Vulnerability Theory, which is a vocabulary and framework for discussing and analyzing vulnerabilities at an abstract level, but with more substance and precision than heavily-abused, vague concepts such as "input validation" and "denial of service." Our goal is to improve the research, modeling, and classification of software flaws, and to help bring the discipline out of the dark ages. Our hope is that this presentation will generate significant discussion with the most forward-thinking researchers, educate non-expert researchers, and make people think about vulnerabilities differently.

Status

Earlier versions of this document were privately distributed or reviewed with positive feedback, but the topic was not accepted for presentation at some of the major vulnerability research conferences. MITRE is publishing this as a working draft, since we are using these concepts in our CWE work, as well as CVE.

This work has been germinating for about 2 years; early versions are documented in PLOVER (http://cve.mitre.org/docs/plover), which still contains some relevant thoughts on research. The work is evolving mostly in the Common Weakness Enumeration (CWE, http://cwe.mitre.org), a classification of 600+ types of weaknesses that lead to vulnerabilities such as buffer overflows, XSS, insufficient randomness, and bad permissions. The theory is being used to explain classification problems and train non-expert analysts about the vulnerability researcher mind-set. Some terminology has made its way into CVE, and it has helped us to understand new vulnerability variants - especially by giving a vocabulary for efficiently explaining *why* the variants are new.

This draft is the most extensive text we have on the topic. We also have a large number of PowerPoint slides that provide more detail; these are likely to be released on this site in the upcoming months.

One of our biggest challenges is how to effectively summarize the ideas and implications, especially to people who don't do vulnerability research on a daily basis.

We also want to apply this against a broader set of examples. While the theory seems to accommodate design and implementation, concepts such as manipulations have (so far) mostly been explored as they relate to injection-type attacks. In CWE, we plan to produce a cross-section of 50-70 weakness types, which is more scalable than the nearly-600 types that currently exist.

Intended Audience

This document is for technically proficient vulnerability researchers and secure software development practitioners with a broad background in the area. The audience is assumed to be knowledgeable about a wide range of vulnerability types, their associated "manipulations," and the common countermeasures that fail to protect against vulnerabilities.

This document uses some terms that are not precisely defined. At the moment, we believe that it's more important to highlight the main ideas. Suggestions are welcome.

Basic Concepts

Some general terms are not defined. Most concepts will be given as examples.

Base Definitions

PRODUCT - A software package, protocol design, architecture, etc.

FEATURES - The main capabilities offered by the product.

RESOURCES - The entities that are used, modified, or provided by the product, such as memory, CPU, files, network connections, etc.

PROPERTIES - Attributes of code, data, or resources that must be transformed or preserved throughout operation of the product.

BEHAVIORS - Actions that the product takes to implement features, or that its users perform.

CONSEQUENCES - Improper behaviors with security impact.

PROTECTION SCHEMES - Behaviors intended to protect against attacks by enforcing or preserving properties of resources.

MANIPULATIONS - Behaviors that affect properties.

ACTORS - Entities or components that interact within the product, or in some cases, outside of it.

ROLES - The roles that actors can take within the system.

CHANNELS - Resources for sending data or directives.

DIRECTIVES - Commands, signals, or other interactions from one actor to another, which typically cause the product to perform behaviors.

TOPOLOGY - The interrelationship between actors and roles in a single behavior chain.

BOUNDARIES/INTERFACES - The boundaries between multiple actors, components, etc.

DESIGN LIMITATIONS - Properties of code/algorithm that can be used correctly, but can also be used incorrectly.

ARTIFACT LABELS - Identifiers for the key locations in code/algorithms that are relevant for a potential vulnerability or attack.

PRINCIPLES - The "rules" that govern what we call security.

POLICIES - Specifications by the product, designer, or user that dictate which properties, behaviors, and resources indicate proper operation of the product.

ASSUMPTIONS - Assumptions that the programmer makes about behaviors.

EXPECTATIONS - Assumptions that the program must make about the correctness of behaviors.


Behaviors, Properties, and Manipulations

A PRODUCT implements FEATURES by performing certain BEHAVIORS that operate on RESOURCES. The behaviors can be "atomic" or "functional" depending on the level of abstraction being used at the time, such as "read byte from file" or "convert document to a different format." Resources can be "Universal" (such as file, socket, memory, CPU cycles), or "Technology-Specific" (such as cookies, headers, etc.)

The product's behavior performs MANIPULATIONS on RESOURCES by preserving or modifying PROPERTIES. For example, a base64-encoding manipulation is applied to a binary file so that the file can be handled as ASCII; the resulting encoding has the property of being "equivalent" to the original file. See "Property/Manipulation Examples" if the meaning of this paragraph isn't immediately obvious.
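The base64 example can be made concrete with a short Python sketch. It assumes only the standard library `base64` module; the function names and the sample bytes are illustrative and do not come from the original text. The manipulation changes the representation (binary to ASCII), while the equivalence property holds because decoding recovers the original bytes.

```python
import base64


def encode_for_ascii_channel(data: bytes) -> str:
    """Manipulation: re-represent arbitrary bytes as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def is_equivalent(original: bytes, encoded: str) -> bool:
    """Property check: the encoding still denotes the same resource."""
    return base64.b64decode(encoded, validate=True) == original


# Illustrative binary content (not valid ASCII on its own).
binary_file = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])
encoded = encode_for_ascii_channel(binary_file)
```

Here `is_equivalent(binary_file, encoded)` holds, while the same check against any other byte string fails; the property is about the relationship between the two representations, not about either one alone.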

The developer of the product has an INTENDED POLICY that dictates appropriate behaviors, resources, and properties. The product itself has an IMPLEMENTED POLICY, which is the code's implementation of the intended policy. Ideally, the product's intended and implemented policies are the same; if not, then vulnerabilities or other bugs exist in the product.


Actors, Roles, and Channels

Within the product, ACTORS (products, people, or processes) take on certain ROLES and perform DIRECTIVES that trigger behaviors. These actors communicate across CHANNELS. Example: A "User" process being run by an "Attacker" connects to a Telnet "Service" being run by a "Victim," where the User gives the directive "Log me in." The Victim/Service sends a directive to a DNS server, which is acting as a special "Consultant" in this context, in order to perform a reverse DNS lookup. The DNS server is also controlled by the "Attacker" to return a hostname that violates the property "LENGTH < 100".
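The Telnet example might be modeled roughly as follows in Python. This is only a sketch: the `Actor` class, the labels, and the sample hostnames are assumptions made for illustration, and only the actor types, roles, and the "LENGTH < 100" property come from the text.

```python
from dataclasses import dataclass

MAX_HOSTNAME_LENGTH = 100  # the property "LENGTH < 100" from the example


@dataclass(frozen=True)
class Actor:
    name: str        # illustrative label
    actor_type: str  # e.g. "User", "Service", "Consultant"
    role: str        # e.g. "Attacker", "Victim"


def hostname_preserves_length_property(hostname: str) -> bool:
    """Check the property that the Service expects of the Consultant's reply."""
    return len(hostname) < MAX_HOSTNAME_LENGTH


user = Actor("telnet client", "User", "Attacker")
service = Actor("telnet daemon", "Service", "Victim")
consultant = Actor("DNS server", "Consultant", "Attacker")

# The Consultant answers the reverse lookup over a second channel.
benign_reply = "host.example.com"
hostile_reply = "a" * 300 + ".example.com"
```

The point of the model is that the same role (Attacker) is held by two actors of different types, and the violating data arrives from the Consultant rather than from the User who issued the original directive.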

Types of Actors include: User, Service, Outsider, Consultant (DNS or RADIUS), Monitor/Observer (intrusion detection, log monitor, debugger), Intermediary (firewall, anti-virus, proxy). Some entities can encompass multiple types of actors, e.g. an intrusion protection system that monitors for attacks and terminates connections if something suspicious occurs.

Roles include: Victim (aka Target), Attacker, Bystander, Accomplice, Conduit.

There are many types of channels, including: Socket, Serial port, Signal (which is an implicit channel), Environment variable (implicit channel), Pipe, etc. "Alternate channels" are not the primary channels, but alternate ways of moving data or directives between actors. For example, the "Shatter" attack uses an alternate channel (the internal Windows messaging system) instead of the GUI.


Vulnerabilities, Faults, and Protection Schemes

A VULNERABILITY in a product allows, or produces, behaviors that (1) fail to preserve intended properties, (2) modify resources so that they have unintended properties, and probably others. Such insecure behaviors are called CONSEQUENCES.

PROTECTION SCHEMES (also called "countermeasures" or "protection mechanisms") are behaviors that are intended to eliminate one or more potential vulnerabilities by protecting against certain attacks, i.e., by preventing certain CONSEQUENCES. Examples include: whitelists, blacklists, stack overflow detection, etc. Protection schemes are different from SECURITY FEATURES such as authentication, access control, cryptography, and privilege management. A scheme might be explicit or external, depending on perspective. For example, stack overflow protection is external to the source code; source code that filters an output string against XSS has an explicit scheme.

An ATTACK consists of manipulations of data or behaviors from an actor, where those manipulations have certain properties, in order to perform incorrect operations on a resource, or access other resources.

A DESIGN LIMITATION is a feature or behavior that can theoretically be used correctly, but could lead to a vulnerability if not. For example, the functionality of strcpy() is a design limitation. It can be used securely, but it can introduce vulnerabilities if used incorrectly. This incorrect usage would be an implementation flaw (and perhaps, usage of strcpy at all might be regarded as a design flaw in some circles). Conjecture: all implementation bugs are associated with at least one design limitation.

A PRIMARY FAULT is the first error in the code, after which things start to go wrong. Example: an off-by-one error prevents a null byte from being added to a string.

A RESULTANT FAULT includes behaviors that either add to the problem, or fail to correct it. Example: a PHP program has a primary fault that uses extract($_GET) to overwrite global variables; the resultant fault can then be XSS, SQL injection, or file inclusion, depending on which variable is overwritten and how it is used.
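The shape of this example can be sketched in Python. Python has no direct counterpart to PHP's extract(), so the sketch imitates its effect with `dict.update`; that substitution, the `handle_request` function, and the setting names are all assumptions made for illustration.

```python
def handle_request(query_params: dict) -> dict:
    """Return the settings the rest of the (hypothetical) program will use."""
    settings = {
        "template": "home.html",  # later used in a file include
        "is_admin": False,        # later used in an access decision
    }
    # Primary fault: request parameters overwrite program variables wholesale,
    # much as extract($_GET) would in PHP.
    settings.update(query_params)
    return settings


normal = handle_request({"q": "shoes"})
attacked = handle_request({"template": "../../etc/passwd", "is_admin": True})
```

Nothing goes visibly wrong at the `update` call itself. Which resultant fault appears depends on where the overwritten value is used later, which is why two analysts looking at the same bug can describe it differently.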

Note: some vulnerabilities probably involve multiple primary faults, but examples have not yet been identified. Many classification and terminology problems occur because one person is looking at a primary fault and another is looking at the resultant. Insufficient diagnosis can often involve resultant faults that obscure more important primary faults.


Types of Properties

Properties can apply to data or behaviors, including code.

VALIDITY - The degree of conformance to data/behavior specifications.
Examples include:

  • GET index.html (no version)
  • non-existent username
  • inconsistent length/payload
  • incorrect sequence of steps

EQUIVALENCE - Whether there is a one-to-many (or many-to-one) mapping between identifiers/streams and their associated resources. Examples include:

  • "../.." == "/" == "%2e%2e/%2e%2e"
  • filename.txt == FileName.txt
  • $_GET['x'] == $_REQUEST['x']
  • step equivalence: (A->B->C) == (A->C)
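A minimal Python sketch of the first two equivalence examples follows, using the standard library `posixpath` and `urllib.parse.unquote`. The `canonical_path` function is hypothetical, and it assumes a case-insensitive file system and identifiers that are resolved relative to the root directory; neither assumption is stated in the original list.

```python
import posixpath
from urllib.parse import unquote


def canonical_path(identifier: str) -> str:
    """Decode percent-escapes once, resolve against the root, and case-fold."""
    decoded = unquote(identifier)
    resolved = posixpath.normpath(posixpath.join("/", decoded))
    return resolved.casefold()  # assumes a case-insensitive file system


# Distinct identifiers from the list above that denote one resource.
root_forms = ["../..", "/", "%2e%2e/%2e%2e"]
name_forms = ["filename.txt", "FileName.txt"]
```

All three entries in `root_forms` map to the same canonical value, as do both entries in `name_forms`. A check that compares the raw identifiers sees different strings; the resource layer sees one object.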

MUTABILITY - Whether the resource is expected to be modifiable, by whom, and when.
Examples include:

  • register_globals = 1
  • cookie: auth=1
  • session fixation attack

ACCESSIBILITY - Whether the resource can be accessed, by whom, and when.
Examples include:

  • mutex
  • permissions
  • "Shatter" attack

TRUSTABILITY - Whether the item can be trusted to have certain properties.
Examples include:

  • input validation
  • integrity checking

Properties can be broken down along multiple dimensions: Lexical, Syntactic, Semantic, and Environmental. A manipulation might be syntactically valid but semantically invalid, such as a size specification of "1000" (a well-formed integer) when the product's intended policy only allows the maximum size to be 500.
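The size example can be sketched in Python as two separate checks. The function names are illustrative, and treating "well-formed" as "a non-empty string of ASCII decimal digits" is an assumption; only the value "1000" and the limit of 500 come from the text.

```python
MAX_SIZE = 500  # limit taken from the example's intended policy


def is_syntactically_valid(size_field: str) -> bool:
    """Well-formed: a non-empty string of ASCII decimal digits."""
    return size_field.isascii() and size_field.isdigit()


def is_semantically_valid(size_field: str) -> bool:
    """Meaningful under the intended policy: well-formed and within the limit."""
    return is_syntactically_valid(size_field) and int(size_field) <= MAX_SIZE
```

With these definitions, "1000" passes the syntactic check and fails the semantic one, while "abc" fails both. A protection scheme that performs only the first check leaves the semantic dimension unenforced.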


Types of Manipulations

There are three main manipulation types:

REACHABILITY - required to reach the relevant behavior. Example - when a buffer overflow can only occur in the password field, the reachability manipulation involves providing a login name first.

TRIGGER - modifies the behavior. Examples: long argument, flood of requests.

FACILITATOR - improves control of behaviors or overcomes limitations imposed by product behaviors. Examples: using alphanumeric shellcode to satisfy filtering requirements, "%00" in Perl/PHP filenames to expand the scope of directory traversal to arbitrary file extensions, or the ">" in the beginning of XSS manipulation that closes off the opening tag that the product has already produced in the output.

Manipulations can be characterized in terms of properties. They can be composed or chained.


Examples of Manipulations and Properties
  • Using "../../etc/passwd" might be syntactically equivalent to "/etc/passwd".
  • An "/a/b/c" that is a symlink to /etc/passwd is semantically equivalent.
  • A binary file and its base64-encoded version are semantically equivalent, assuming the encoding is valid.
  • A PHP vulnerability involving register_globals might violate a property that a variable's value should not be mutable by the product's users.
  • A "GET /" without the version is syntactically invalid.
  • A stack-based buffer overflow involving a long input string might be syntactically and semantically valid for the protocol's specification, but it might be semantically invalid (too large) for the product's intended policy. When the stack-based overflow occurs, this modifies adjacent variables and violates the intended non-mutability of the stack. When the shellcode is actually executed after the overflow has occurred, that same input is semantically invalid at the product's level, but semantically valid at the OS level, since it's well-formed.
  • Session fixation attacks introduce non-mutability when mutability is required.
  • Access control can sometimes be bypassed by changing lowercase to uppercase, preserving equivalence.
  • In FTP, doing a "LIST" before a "USER/PASS" is semantically invalid.
  • Making 100 connections to a server might be semantically valid, although the underlying array that tracks the connections might become "syntactically" invalid.

Some more examples of directive manipulations:

  • Skip first step
  • Skip required step
  • Perform steps out of order
  • Perform repeated steps
  • Do not finish step
  • Interrupt step
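Directive manipulations such as skipped or out-of-order steps can be illustrated with a small state machine, loosely modeled on the FTP "LIST before USER/PASS" example above. The `Session` class is hypothetical and handles only three directives; it is a sketch of the idea, not of any real FTP server.

```python
from typing import List


class Session:
    """Tracks which directives a (hypothetical) FTP-like service will accept."""

    def __init__(self) -> None:
        self.state = "start"

    def handle(self, directive: str) -> bool:
        """Return True if the directive is valid in the current state."""
        if directive == "USER" and self.state == "start":
            self.state = "user_given"
            return True
        if directive == "PASS" and self.state == "user_given":
            self.state = "authenticated"
            return True
        if directive == "LIST" and self.state == "authenticated":
            return True
        return False  # skipped, repeated, or out-of-order step


def run(directives: List[str]) -> List[bool]:
    """Feed a sequence of directives to a fresh session."""
    session = Session()
    return [session.handle(d) for d in directives]
```

For instance, `run(["USER", "PASS", "LIST"])` accepts every step, while `run(["LIST"])` and `run(["USER", "LIST"])` reject the LIST. Each directive is syntactically fine on its own; it is the sequence that is semantically invalid.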

Examples of manipulations used in "bypass" attacks:

  • Use equivalence to access a desired object that is being protected by its name
  • Use invalid step sequences to directly access a resource instead of going through expected steps
  • Access alternate channel which is assumed to be trusted
Artifact Labels

Artifact Labels are used to mark important locations in code, design, or algorithm that are relevant to a vulnerability. Vulnerability researchers frequently highlight these locations when presenting vulnerable code, but they use inconsistent terminology, if any. These labels turn out to be useful in describing certain vulnerability "topology" in the abstract sense, as well.

NOTE: recommendations for alternate terms are welcome.

INTERACTION POINT - a location in the product where "input" (of either data or directives) enters the system. "Injection" might be a more natural term, but it's already overloaded, and it seems to be data-centric. This is equivalent to what others call "entry points," but that term has different uses in binary reverse engineering.

CROSSOVER POINT - the location in which an expected property is violated. Any subsequent behaviors that depend on the expected property could be subject to a vulnerable condition. A crossover point could occur *between* lines of code, e.g. if a product-generated filename is never checked for directory traversal sequences.

TRIGGER POINT - the location in the product where a "fault" occurs, and the product can no longer stop itself from performing incorrect behaviors in the future.

ACTIVATION POINT - the location where the attacker's "payload" becomes activated; presumably the payload involves the incorrect behaviors.

ATTACK VECTOR - a tuple of (PRIMARY FAULT, RESULTANT FAULTS, INTERACTION POINT, CROSSOVER POINT, TRIGGER POINT, ACTIVATION POINT). Different attacks could have different trigger and activation points; for example, a buffer overflow intended for DoS would have a different activation point than one intended for code execution.
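The tuple can be written down directly, for example as a Python `NamedTuple`. The field values below are borrowed from the overflow labels in this document's Code Example section; the wording of the primary and resultant faults is an assumption, since the document does not name them for that example.

```python
from typing import NamedTuple, Tuple


class AttackVector(NamedTuple):
    primary_fault: str
    resultant_faults: Tuple[str, ...]
    interaction_point: str
    crossover_point: str
    trigger_point: str
    activation_point: str


dos = AttackVector(
    primary_fault="unchecked length of the query parameter",  # illustrative wording
    resultant_faults=("buffer overflow",),                    # illustrative wording
    interaction_point="line 3",
    crossover_point="between lines 4 and 5",
    trigger_point="line 5",
    activation_point="line 5",
)

# Same fault and trigger, different goal: only the activation point moves.
code_execution = dos._replace(activation_point="outside the code")
```

Comparing the two tuples field by field shows exactly where the DoS and code-execution variants diverge, which is the kind of distinction the labels are meant to make explicit.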


Discussion

Terminology note: there might be additional points of interest, especially at a low level. Earlier versions of vuln theory used "control transfer" instead of "trigger," but this term was frequently interpreted incorrectly.

Note: crossover, trigger, and activation points can be very close together - or very far apart. The introduction of crossover points only occurred in mid-February 2007, partially to accommodate subtleties in buffer overflow issues. So, this is still being clarified. However, some manually generated code examples illustrate that there can be some distance between all three points. For example, a crossover point might involve setting a variable that is not immediately used.

The crossover, trigger, and activation points can appear in different locations - different functions, components, or processes.


Examples

Typical Buffer Overflow - the interaction point might be from a read(). The trigger point might be in a bad strcpy. The activation point would be when the affected function actually exits (or if the buffer's freed).

Second-Order SQL Injection - the trigger point occurs when the bad SQL is actually injected into the database; the activation point is when, for example, the administrator accesses the program that winds up executing the SQL.

OS Command Injection (metacharacters) - trigger and activation point might occur in the same line of code, e.g. when variables are interpolated into an argument string to system(), but they are distinct behaviors at a lower level (the interpolation that creates the string, then the system call).


Code Example
 1  print HTTPresponseHeader;
 2  print "<title>Hello World</title>";
 3  ftype = HTTP_Query_Param("type");
 4  str = "/tmp";
 5  strcat(str, ftype); strcat(str, ".dat");
 6  handle = fileOpen(str, ReadMode);
 7  while((line=readFile(handle)))
 8  {
 9      line=stripTags(line, "script");
10      print line;
11      print "<br>\n";
12  }
13  close(handle);

XSS:

Interaction: 3, 6
Intermediate Fault: 9
Crossover: between 9 and 10
Trigger: 10
Activation: outside of program (when victim views page)

Traversal:

Interaction: 3
Crossover: between 5 and 6
Trigger: 6
Activation: 7 (or 10, depending on attack)

Overflow:

Interaction: 3
Crossover: between 4 and 5
Trigger: 5
Activation: 5 (if DoS intended), outside code (if code execution)

Other Notes:

Line 6: a separate channel must have been needed to inject the XSS into the targeted file.

Line 9 is a weak protection scheme for XSS (incomplete blacklist), with an associated design limitation (blacklists aren't very effective).
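The weakness of line 9 can be demonstrated with a Python stand-in for stripTags. The pseudocode does not say how stripTags works, so the regular-expression implementation below is an assumption: it removes opening and closing tags with the given name and nothing else.

```python
import re


def strip_tags(line: str, tag: str) -> str:
    """Remove opening and closing tags with the given name (a blacklist of one)."""
    pattern = re.compile(r"</?\s*" + re.escape(tag) + r"\b[^>]*>", re.IGNORECASE)
    return pattern.sub("", line)


# The blacklisted tag is removed...
blocked = strip_tags("<script>alert(1)</script>", "script")

# ...but script execution through an event-handler attribute is untouched.
missed = strip_tags('<img src="x" onerror="alert(1)">', "script")
```

The second input passes through unchanged, so the property that line 10 depends on ("output contains no active content") is already violated between lines 9 and 10, which is where the XSS crossover point is placed above.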


Other Examples

Static Code Injection (CWE-96)

By the nature of static code injection vulns, the trigger point is usually in a different executable than the activation point, and the exploit is multi-step.

Example: CVE-2007-0115

injection: login.php

trigger: login.php, writing an error message into security.log.php

activation: viewlog.php is accessed, which includes security.log.php and activates the payload

Additional Concepts Under Exploration

The following concepts are probably important, but they haven't been explored as fully as the others.

ASSUMPTIONS/EXPECTATIONS - The assumptions that a program, design, or API makes regarding the properties of behaviors and resources. These are possibly atomic or functional. The human developer makes certain assumptions; when code executes, it has certain expectations (or enforces them) about its environment, data, etc.

POLICY - an explicit security policy could potentially be specified using the vulnerability theory concepts, e.g. "Actor X can only manipulate properties P1 and P2 for data D." The "vulnerability" concept itself has, at least, an implicit security policy. Any product has an implemented policy; the developer has an intended policy for the product.
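One possible encoding of such a policy, sketched in Python, is a mapping from (actor, data) pairs to the properties that actor may manipulate. The names X, D, and P1 to P3 are the placeholders from the text (P3 is added here for illustration), and both functions are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# (actor, data) -> properties that actor may manipulate; names are placeholders.
Policy = Dict[Tuple[str, str], FrozenSet[str]]

intended_policy: Policy = {("X", "D"): frozenset({"P1", "P2"})}

# What the code actually allows (illustrative): one property too many.
implemented_policy: Policy = {("X", "D"): frozenset({"P1", "P2", "P3"})}


def is_permitted(policy: Policy, actor: str, data: str, prop: str) -> bool:
    """Default deny: anything not explicitly granted is a policy violation."""
    return prop in policy.get((actor, data), frozenset())


def policy_gaps(intended: Policy, implemented: Policy) -> Policy:
    """Manipulations the implementation allows that the intent does not."""
    gaps = {}
    for key, allowed in implemented.items():
        extra = allowed - intended.get(key, frozenset())
        if extra:
            gaps[key] = extra
    return gaps
```

Under this encoding, a non-empty result from `policy_gaps` is one way to express a mismatch between the intended and implemented policies.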

DEGREES OF CONTROL - how much control an actor has over a resource; this is frequently described in terms of attacker control over data, directives, and consequences. An actor might have Full, Partial, or No control over a resource - and this could change over time.

PRINCIPLES - everyone's definition of "vulnerability" differs, but it could be defined in terms of certain principles, such as "users should not have access to any resource that is not explicitly granted or implied." Defining these principles could become the basis of a Universal Policy.

BOUNDARIES/INTERFACES - Many (but probably not all) issues occur at boundaries or interfaces between two different entities. Boundaries *might* include representation, data, process/module, actor, etc. Representation is probably essential for adequately explaining major vulnerability phyla such as "injection".

OUTPUTS - Wing et al. explicitly model "exit points" as places where data exits a system; web application security people talk a lot about "output validation." This notion is useful when examining a system/actor in isolation, but a framework that can cover all aspects of attacks/vulnerabilities might only need to model "input." Information leaks can be thought of as output, but it's only a leak if it's an input to an attacker.

CONTAINERS/SANDBOXES - these seem to apply mostly to files and code, but thinking of vulnerabilities as they relate to containers has sometimes been useful. For example, directory traversal and some Java sandbox escaping works by using syntactically valid manipulations that produce references to resources outside the container, which are semantically invalid relative to the intended policy. PHP file inclusion can be thought of as a violation of an intended container that uses semantically invalid manipulations.
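The container view of directory traversal can be sketched in Python with `posixpath`. The functions and paths are illustrative, and the check is purely lexical: it does not resolve symbolic links, so a symlink inside the container that points outside it would still pass.

```python
import posixpath


def resolve_in_container(container: str, reference: str) -> str:
    """Resolve a reference the way a naive file-serving routine might."""
    return posixpath.normpath(posixpath.join(container, reference))


def stays_in_container(container: str, reference: str) -> bool:
    """Semantic check: does the resolved reference remain inside the container?"""
    root = posixpath.normpath(container)
    resolved = resolve_in_container(root, reference)
    prefix = root if root.endswith("/") else root + "/"
    return resolved == root or resolved.startswith(prefix)
```

A reference such as "../../etc/passwd" is a syntactically valid relative path, yet `stays_in_container("/var/www", "../../etc/passwd")` is false: the manipulation is well-formed but semantically invalid relative to the intended policy.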

PRIVILEGES/PERMISSIONS - surprisingly, these haven't been a factor yet.

"EXPLICIT" vs. "EMERGENT" - behaviors, properties, and resources can either be explicit or emergent. For example, a covert channel can be an emergent resource that wasn't originally intended.

"ATOMIC" vs. "FUNCTIONAL" - these levels apply to most vulnerability theory concepts, but their wholesale introduction might make it more complex than necessary.

The relationship between design limitations and implementation errors needs more study. Protection schemes seem to have unique characteristics in comparison to typical vulnerabilities, and it might be useful to distinguish between "missing" versus "incorrect" protection schemes.

Examples of the Terminology in Action

The following items don't contain any revelations that would be surprising for expert researchers. The point is to demonstrate the efficiency of the vocabulary.

1)

Any protection scheme that relies on names or identifiers should defend against equivalence manipulations. A product, language, or environment should attempt to minimize the number and types of equivalence manipulations.

2)

Fuzzers are very good at finding resultant faults and consequences, but unless the fuzzing is structured, there is no indication of the primary fault. Diagnosis can be made more difficult if the trigger and activation points are not close by. Unstructured fuzzing will often fail due to the lack of reachability manipulations, e.g. if a parser requires certain semantic consistency between data elements before a vulnerable behavior can even be invoked.

3)

Black box testing is likely to fail if the techniques do not consider whether an activation point might be in another process or channel.

4)

Monitors and Intermediaries are especially subject to equivalence and validity manipulations, since the receiving/target hosts might have alternate interpretations. Example: a web app firewall might allow invalid HTML through, even though the victim's browser converts that HTML into "valid" HTML. An intermediary might choose to act on header X, when its semantically equivalent header Y is what's actually processed by the client.

5)

When protection schemes are involved, a manipulation might originally be syntactically invalid before the scheme, but then syntactically/semantically valid after the scheme. Example: "....//" in directory traversal is syntactically invalid, until a bad filtering scheme collapses the string into "../". Double-decoding issues are similar.
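Both effects are easy to reproduce. The Python sketch below uses a single-pass `str.replace` as the "bad filtering scheme" and `urllib.parse.unquote` for the double-decoding case; the two functions are hypothetical stand-ins written for illustration, not code from any particular product.

```python
from urllib.parse import unquote


def naive_filter(path: str) -> str:
    """A flawed protection scheme: delete each "../" in a single pass."""
    return path.replace("../", "")


def decode_check_decode(value: str) -> str:
    """A flawed ordering: decode, check, then decode again before use."""
    once = unquote(value)
    if "../" in once:
        raise ValueError("traversal sequence rejected")
    return unquote(once)
```

`naive_filter("....//")` returns "../", so the scheme itself manufactures the sequence it was meant to remove. Likewise, "%252e%252e%252f" contains no "../" after the first decoding and passes the check, but becomes "../" after the second.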

6)

XSS and buffer overflows can share certain characteristics, such as the mixture of data and directives. In the overflow case, though, this mixture occurs at a level below the programming language; for XSS, it's at a level above the programming language.

Example Applications of the Theory

There are several potential applications, only two of which are covered here.

1)

Gap analysis and finding new vulnerability classes

By moving up a level of abstraction from classes like buffer overflows, XSS, and privilege management errors, we might be able to use the framework to describe new issues in vuln theory terms, then look at other known instances that share similar characteristics. This would help identify gaps in understanding (or current researcher focus), and possibly lead to discovery of new vulnerability classes, or at least variants. Example: "Product class X has behaviors B1 and B2, with manipulations M1...M5 on resource R. These manipulations preserve property P and modify property Q. What types of vulnerabilities or attacks involve P and Q, and therefore might be able to affect X?"

2)

Evaluating vulnerability "difficulty"

Since we expect products to always have vulnerabilities, we hope that they only have the most difficult-to-find, difficult-to-exploit vulnerabilities. Concepts such as artifact labels could be used to calculate the "distance" from input to exploit; novelty and complexity of manipulations could be evaluated more cleanly; actors and channels could be used to describe a "topology" ("vulnerability surface"); and protection schemes could be assessed in terms of the properties they preserve.

For example, if a vulnerability has the interaction, trigger, and activation points all in the same function, that's probably a more obvious vulnerability than something that involves multiple actors, channels, and manipulations.

Related Work

Dowd et al.'s "Art of Software Security Assessment" touches on some of these concepts in an introductory chapter, but it does not propose them as a formal framework. Our work is more detailed in this respect.

The work from Jeannette Wing et al. on measuring attack surface introduces some concepts that overlap vulnerability theory, but it is largely for data-driven attack vectors and is focused on quantitative measurements of design quality.

The Trike threat modeling framework has similar concepts.

Informal conversations with Matt Bishop of UC Davis suggest some overlap with their current work.

One early reviewer suggested that dataflow diagramming has some utility, and using that terminology where appropriate might be useful in educating non-security practitioners.

As of July 2007, the most novel elements of vulnerability theory include properties and behaviors.

Changelog
0.5 - July 9, 2007
    - extended definitions
    - more examples of specific concepts
    - prepared for public release

0.4 - April 7, 2007
    - reasons lost

0.3 - Feb 26, 2007
    - added crossover points, related work

0.2 - Feb 14, 2007
    - added minor points based on feedback

0.1 - Jan 31, 2007
    - first version
Page Last Updated: September 11, 2007