Common Weakness Scoring System (CWSS™)

The MITRE Corporation
Copyright © 2011
http://cwe.mitre.org/cwss/

CWSS version: 0.8

Document version: 0.8

Revision Date: June 27, 2011

Project Coordinator:

Bob Martin (MITRE)

Document Editor:

Steve Christey (MITRE)

Introduction

When a security analysis of a software application is performed, such as with an automated code auditing tool, developers often face hundreds or thousands of individual bug reports for weaknesses that are discovered in their code. In certain circumstances, a software weakness can lead to an exploitable vulnerability. For example, a buffer overflow vulnerability might arise from a weakness in which the programmer does not properly validate the length of an input buffer. This weakness only contributes to a vulnerability if the input can be influenced by a malicious party, and if that malicious input can be copied to an output buffer that is smaller than the input.

Due to the high volume of reported weaknesses, developers are forced into a situation in which they must prioritize which issues they should investigate and fix first. Similarly, when assessing design and architecture choices and their weaknesses, there needs to be a method for prioritizing them relative to each other and with the other issues of the application. Finally, software consumers want to know what they should worry about the most, and what to ask for to get a more secure product from their vendors and suppliers.

Further complicating the problem, the importance of a weakness may vary depending on business or mission needs, the kinds of technologies in use, and the threat environment.

In short, people need to be able to reason and communicate about the relative importance of different weaknesses. While various scoring methods are used today, they are either ad hoc or inappropriate for application to the still-imprecise evaluation of software security.

The Common Weakness Scoring System (CWSS) provides a mechanism for scoring weaknesses in a consistent, flexible, open manner while accommodating context for the various business domains. It is a collaborative, community-based effort that is addressing the needs of its stakeholders across government, academia, and industry. CWSS is a part of the Common Weakness Enumeration (CWE) project, co-sponsored by the Software Assurance program in the office of Cybersecurity and Communications of the U.S. Department of Homeland Security (DHS).

CWSS:

  • provides a common framework for prioritizing security errors ("weaknesses") that are discovered in software applications
  • provides a quantitative measurement of the unfixed weaknesses that are present within a software application
  • can be used by developers to prioritize unfixed weaknesses within their own software
  • in conjunction with the Common Weakness Risk Analysis Framework (CWRAF), can be used by consumers to identify the most important weaknesses for their business domains, in order to inform their acquisition and protection activities as one part of the larger process of achieving software assurance.


Stakeholders

To be most effective, CWSS supports multiple usage scenarios by different stakeholders who all have an interest in a consistent scoring system for prioritizing software weaknesses that could introduce risks to products, systems, networks and services. Some of the primary stakeholders are listed below.

Stakeholder | Description
Software developers

often operate within limited time frames, due to release cycles and limited resources. As a result, they are unable to investigate and fix every reported weakness. They may choose to concentrate on the worst problems, or on those that are easiest to fix. In the case of automated weakness findings, they might choose to focus on the findings that are least likely to be false positives.

Software development managers

create strategies for prioritizing and removing entire classes of weaknesses from the entire code base, or at least the portion that is deemed to be most at risk, by defining custom "Top-N" lists. They must understand the security implications of integrating third-party software, which may contain its own weaknesses. They may need to support distinct security requirements and prioritization for each product line.

Software acquirers

want to obtain third-party software with a reasonable level of assurance that the software provider has performed due diligence in removing or avoiding weaknesses that are most critical to the acquirer's business and mission. Related stakeholders include CIOs, CSOs, system administrators, and end users of the software.

Code analysis vendors and consultants

want to provide a consistent, community-vetted scoring mechanism for different customers.

Evaluators of code analysis capabilities

evaluate the capabilities of code analysis techniques (e.g., NIST SAMATE). They could use a consistent weakness scoring mechanism to support sampling of reported findings, as well as understanding the severity of these findings without depending on ad hoc scoring methods that may vary widely by tool/technique.

Other stakeholders may include vulnerability researchers, advocates of secure development, and compliance-based analysts (e.g., PCI DSS).

CWSS Design Considerations

For CWSS to be most effective to its stakeholders, several aspects of the problem area must be considered when designing the framework and metrics. Some of these considerations might not be resolved until several revisions of CWSS have been released and tested.

  • CWSS scoring will need to account for incomplete information throughout much of the lifecycle of a reported weakness. For example, scoring may be necessary before the weakness is even known to contribute to a vulnerability, e.g. in the initial output from an automated code scanning tool. Second, the entity (human or machine) assigning the initial CWSS score might not have complete information available, e.g. the expected operating environment. Finally, some factors in the CWSS score might rely on trend information (such as frequency of occurrence) that is only estimated due to lack of sufficient statistical data. For example, the 2010 CWE Top 25 relied on survey results, because very few sources had prevalence data at the same level of detail as the weaknesses being considered for the list. Incomplete information is a challenge for CVSS scoring, and it is expected to be even more important for CWSS.
  • It is assumed that portions of CWSS scores can be automatically generated. For example, some factors may be dependent on the type of weakness being scored; potentially, the resulting subscores could be derived from CWE data. As another example, a web script might only be accessible by an administrator, so all weaknesses may be interpreted in light of this required privilege.
  • CWSS should be scalable. Some usage scenarios may require the scoring of thousands of weaknesses, such as defect reports from an automated code scanning tool. When such a high volume is encountered, there are too many issues to analyze manually. As a result, automated scoring must be supported.
  • The potential CWSS stakeholders, their needs, and associated use cases should be analyzed to understand their individual requirements. This might require support for multiple scoring techniques or methods.
  • Associated metrics must balance usability with completeness, i.e., they cannot be too complex.
  • Environmental conditions and business/mission priorities should impact how scores are generated and interpreted.
  • CWSS should be automatable and flexible wherever possible, but support human input as well.

Scoring Methods within CWSS

The stakeholder community is collaborating with MITRE to investigate several different scoring methods that might need to be supported within the CWSS framework.

Method | Notes
Targeted

Score individual weaknesses that are discovered in the design or implementation of a specific ("targeted") software package, e.g. a buffer overflow in the username of an authentication routine in line 1234 of vuln.c in an FTP server package. Automated tools and software security consultants use targeted methods when evaluating the security of a software package in terms of the weaknesses that are contained within the package.

Generalized

Score classes of weaknesses independent of any particular software package, in order to prioritize them relative to each other (e.g. "buffer overflows are higher priority than memory leaks"). This approach is used by the CWE/SANS Top 25, OWASP Top Ten, and similar efforts, but also by some automated code scanners. The generalized scores could vary significantly from the targeted scores that would result from a full analysis of the individual occurrences of the weakness class within a specific software package. For example, while the class of buffer overflows remains very important to many developers, individual buffer overflow bugs might be considered less important if they cannot be directly triggered by an attacker and their impact is reduced due to OS-level protection mechanisms such as ASLR.

Context-adjusted

Modify scores in accordance with the needs of a specific analytical context that may integrate business/mission priorities, threat environments, risk tolerance, etc. These needs are captured using vignettes that link inherent characteristics of weaknesses with higher-level business considerations. This method could be applied to both targeted and generalized scoring.

Aggregated

Combine the results of multiple, lower-level weakness scores to produce a single, overall score (or "grade"). While aggregation might be most applicable to the targeted method, it could also be used in generalized scoring, as occurred in the 2010 CWE/SANS Top 25.

The current focus for CWSS is on the Targeted scoring method and a framework for context-adjusted scoring. Methods for aggregated scoring will follow. Generalized scoring is being developed separately, primarily as part of the 2011 Top 25 and CWRAF.

CWSS 0.6 Scoring for Targeted Software

Scoring

In CWSS 0.6, the score for a weakness, or for a weakness bug report ("finding"), is calculated using 18 different factors across three metric groups:

  • the Base Finding group, which captures the inherent risk of the weakness, confidence in the accuracy of the finding, and strength of controls.
  • the Attack Surface group, which captures the barriers that an attacker must cross in order to exploit the weakness.
  • the Environmental group, which includes factors that may be specific to a particular operational context, such as business impact, likelihood of exploit, and existence of external controls.

[Figure: CWSS Metric Groups]

CWSS can be used in cases where there is little information at first, but the quality of information can improve over time. It is anticipated that in many use-cases, the CWSS score for an individual weakness finding may change frequently, as more information is discovered. Different entities may evaluate separate factors at different points in time.

As such, every CWSS factor effectively has "environmental" or "temporal" characteristics, so it is not particularly useful to adopt the same types of metric groups as are used in CVSS.

Metric Group | Factors
Base Finding Group

* Technical Impact (TI)

* Acquired Privilege (AP)

* Acquired Privilege Layer (AL)

* Internal Control Effectiveness (IC)

* Finding Confidence (FC)

Attack Surface Group

* Required Privilege (RP)

* Required Privilege Layer (RL)

* Access Vector (AV)

* Authentication Strength (AS)

* Authentication Instances (AI)

* Level of Interaction (IN)

* Deployment Scope (SC)

Environmental Group

* Business Impact (BI)

* Likelihood of Discovery (DI)

* Likelihood of Exploit (EX)

* External Control Effectiveness (EC)

* Remediation Effort (RE)

* Prevalence (P)
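The table above can be transcribed into a small data structure for tooling that needs to iterate over the factors. This sketch is merely a convenient representation of the listed groups and abbreviations; the dictionary layout itself is not defined by CWSS.

```python
# The 18 CWSS 0.6 factors, keyed by metric group, using the
# abbreviations from the table above.
CWSS_FACTORS = {
    "Base Finding": {
        "TI": "Technical Impact",
        "AP": "Acquired Privilege",
        "AL": "Acquired Privilege Layer",
        "IC": "Internal Control Effectiveness",
        "FC": "Finding Confidence",
    },
    "Attack Surface": {
        "RP": "Required Privilege",
        "RL": "Required Privilege Layer",
        "AV": "Access Vector",
        "AS": "Authentication Strength",
        "AI": "Authentication Instances",
        "IN": "Level of Interaction",
        "SC": "Deployment Scope",
    },
    "Environmental": {
        "BI": "Business Impact",
        "DI": "Likelihood of Discovery",
        "EX": "Likelihood of Exploit",
        "EC": "External Control Effectiveness",
        "RE": "Remediation Effort",
        "P": "Prevalence",
    },
}
```

The three groups together contain the 18 factors used in the CWSS 0.6 calculation.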

Supporting Uncertainty and Flexibility Within Factors

Most factors have the following values in common:

Value | Usage
Unknown

The entity calculating the score does not have enough information to provide a value for the factor. This can be a signal for further investigation. For example, an automated code scanner might be able to find certain weaknesses, but be unable to detect whether any authentication mechanisms are in place. The use of "Unknown" emphasizes that the score is incomplete or estimated, and further analysis may be necessary. This makes it easier to model incomplete information, and for the Business Value Context to influence final scores that were generated using incomplete information. The weight for this value is 0.5 for all factors, which generally produces a lower score; the addition of new information (i.e., changing some factors from "Unknown" to another value) will then adjust the score upward or downward based on the new information.

Not Applicable

The factor is being explicitly ignored in the score calculation. This effectively allows the Business Value Context to dictate whether a factor is relevant to the final score. For example, a customer-focused CWSS scoring method might ignore the remediation effort, and a high-assurance environment might require investigation of all reported findings, even if there is low confidence in their accuracy. For a set of weakness findings for an individual software package, it is expected that all findings would have the same "Not Applicable" value for the factor that is being ignored.

Quantified

The factor can be weighted using a quantified, continuous range of 0.0 through 1.0, instead of the factor's defined set of discrete values. Not all factors are quantifiable in this way, but it allows for additional customization of the metric.

Default

The factor's weight can be set to a default value. Labeling the factor as a default allows for investigation and possible modification at a later time.
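The shared value-handling conventions above can be sketched as a single lookup routine. The function name and signature here are illustrative, not part of CWSS; only the behaviors (Unknown weighs 0.5 for all factors, Not Applicable weighs 1.0, Quantified allows a continuous 0.0-1.0 override) come from the table.

```python
def resolve_weight(value, weights, quantified=None):
    """Resolve a factor's value to its weight, following the common
    conventions described above. `weights` maps the factor's regular
    discrete values (and its Default) to weights. Name and signature
    are illustrative, not defined by CWSS itself."""
    if value == "Quantified":
        # Continuous override in [0.0, 1.0], for factors that support it.
        if quantified is None or not 0.0 <= quantified <= 1.0:
            raise ValueError("Quantified requires a weight in [0.0, 1.0]")
        return quantified
    if value == "Unknown":
        return 0.5  # same weight for every factor; flags incomplete information
    if value == "Not Applicable":
        return 1.0  # the factor is explicitly ignored in the calculation
    return weights[value]  # a regular discrete value, or "Default"
```

Changing a factor from Unknown (0.5) to a concrete value then adjusts the score upward or downward as new information arrives, matching the behavior described in the table.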

Base Finding Metric Group

The Base Finding metric group consists of the following factors:

  • Technical Impact (TI)
  • Acquired Privilege (AP)
  • Acquired Privilege Layer (AL)
  • Internal Control Effectiveness (IC)
  • Finding Confidence (FC)

The combination of values from Technical Impact, Acquired Privilege, and Acquired Privilege Layer gives the user some expressive power. For example, the user can characterize "High" Technical Impact with "Administrator" privilege at the "Application" layer.

Technical Impact (TI)

Technical Impact is the potential result that can be produced by the weakness, assuming that the weakness can be successfully reached and exploited. This is expressed in terms that are more fine-grained than confidentiality, integrity, and availability.

Value | Code | Weight | Description
Critical C 1.0 complete control over the software, the data it processes, and the environment in which it runs (e.g. the host system), to the point where operations cannot take place.
High H 0.9
Medium M 0.6
Low L 0.3
None N 0.0
Default D 0.6 The Default weight is the median of the weights for Critical, High, Medium, Low, and None.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements; the user might want to investigate every weakness finding of interest, regardless of confidence.
Quantified Q This factor could be quantified with custom weights.

If this set of values is not precise enough, CWSS users can use their own Quantified methods to derive a subscore. One such method involves using the Common Weakness Risk Analysis Framework (CWRAF) to define a vignette and a Technical Impact Scorecard. The Impact weight is calculated using vignette-specific Importance ratings for different technical impacts that could arise from exploitation of the weakness, such as modification of sensitive data, gaining privileges, resource consumption, etc.
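The discrete TI weights above can be transcribed directly, and the table's statement that the Default weight is the median of the regular values can be checked in a couple of lines. The dictionary is just a convenient representation for illustration, not something the spec defines.

```python
import statistics

# Technical Impact (TI) weights, transcribed from the table above.
TECHNICAL_IMPACT = {
    "Critical": 1.0, "High": 0.9, "Medium": 0.6, "Low": 0.3, "None": 0.0,
    "Default": 0.6, "Unknown": 0.5, "Not Applicable": 1.0,
}

# The Default weight is the median of the five regular values:
regular = [TECHNICAL_IMPACT[v]
           for v in ("Critical", "High", "Medium", "Low", "None")]
assert statistics.median(regular) == TECHNICAL_IMPACT["Default"]  # 0.6
```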

Acquired Privilege (AP)

The Acquired Privilege identifies the type of privileges that are obtained by an entity that successfully exploits the weakness. In some cases, the acquired privileges may be the same as the required privileges, which implies either (1) "horizontal" privilege escalation (e.g. from one unprivileged user to another), or (2) privilege escalation within a sandbox, such as an FTP-only user who can escape to the shell.

Notice that the values are the same as those for Required Privilege, but the weights are different.

The acronym "RUGNAP" can serve as a mnemonic for remembering the key values ("Regular User", "Guest", "None", "Admin", "Partially-Privileged").

Value | Code | Weight | Description
Administrator A 1.0 The entity has administrator, root, SYSTEM, or equivalent privileges that imply full control over the software or the underlying OS.
Partially-Privileged User P 0.9 The entity is a valid user with some special privileges, but not enough privileges that are equivalent to an administrator. For example, a user might have privileges to make backups, but not to modify the software's configuration or install updates.
Regular User RU 0.7 The entity is a regular user who has no special privileges.
Guest G 0.6 The entity acquires limited or "guest" privileges that can significantly restrict allowable activities. This could happen in an environment that uses strong privilege separation.
None N 0.1 No extra privileges are acquired.
Default D 0.7 Median of the weights for None, Guest, Regular User, Partially-Privileged User, and Administrator.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements that wants strict enforcement of privilege separation, even between already-privileged users.

Note that this factor cannot be quantified.

Acquired Privilege Layer (AL)

The Acquired Privilege Layer identifies the operational layer to which the entity gains access if the weakness can be successfully exploited.

A mnemonic for this factor is "SANE" (System, Application, Network, Enterprise).

Value | Code | Weight | Description
Application A 1.0 The entity gains access to an affected application.
System S 0.9 The entity gains access to, or control of, a system or physical host.
Network N 0.7 The entity gains access to/from the network.
Enterprise E 1.0 The entity gains access to a critical piece of enterprise infrastructure, such as a router, switch, DNS, domain controller, firewall, identity server, etc.
Default D 0.9 Median of the weights for SANE values.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements that wants strict enforcement of privilege separation, even between already-privileged users.

Note that this factor cannot be quantified.

Internal Control Effectiveness (IC)

An Internal Control is a control, protection mechanism, or mitigation that has been explicitly built into the software (whether through architecture, design, or implementation). Internal Control Effectiveness measures the ability of the control to render the weakness unable to be exploited by an attacker. For example, an input validation routine that restricts input length to 15 characters might be moderately effective against XSS attacks by reducing the size of the XSS exploit that can be attempted.

Value | Code | Weight | Description
None N 1.0 No controls exist.
Limited L 0.9 There are simplistic methods or accidental restrictions that might prevent a casual attacker from exploiting the issue.
Moderate M 0.7 The protection mechanism is commonly used but has known limitations that might be bypassed with some effort by a knowledgeable attacker. For example, the use of HTML entity encoding to prevent XSS attacks may be bypassed when the output is placed into another context such as a Cascading Style Sheet or HTML tag attribute.
Indirect (Defense-in-Depth) I 0.5 The control does not specifically protect against exploitation of the weakness, but it indirectly reduces the impact when a successful attack is launched, or otherwise makes it more difficult to construct a functional exploit. For example, a validation routine might indirectly limit the size of an input, which might make it difficult for an attacker to construct a payload for an XSS or SQL injection attack.
Best-Available B 0.3 The control follows best current practices, although it may have some limitations that can be overcome by a skilled, determined attacker, possibly requiring the presence of other weaknesses. For example, the double-submit method for CSRF protection is considered one of the strongest available, but it can be defeated in conjunction with behaviors of certain functionality that can read raw HTTP headers.
Complete C 0.0 The control is completely effective against the weakness, i.e., there is no bug or vulnerability, and no adverse consequence of exploiting the issue. For example, a buffer copy operation that ensures that the destination buffer is always larger than the source (plus any indirect expansion of the original source size) will not cause an overflow.
Default D 0.6 Median of the weights for Complete, Best-Available, Indirect, Moderate, Limited, and None.
Unknown Unk 0.5
Not Applicable NA 1.0

Note that this factor cannot be quantified.

Finding Confidence (FC)

Finding Confidence is the confidence that the reported issue:

  • (1) is a weakness, and
  • (2) can be triggered or utilized by an attacker.

Value | Code | Weight | Description
Proven True T 1.0 The finding is a genuine weakness that is reachable by the attacker.
Proven Locally True LT 0.8 The weakness occurs within an individual function or component whose design relies on safe invocation of that function, but attacker reachability to that function is unknown or not present. For example, a utility function might construct a database query without encoding its inputs; if it is only called with constant strings, the finding is locally true.
Proven False F 0.0 The finding is erroneous (i.e. the finding is a false positive and there is no weakness), and/or there is no possible attacker role.

Default D 0.8 Median of the weights for Proven True, Proven Locally True, and Proven False.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements; the user might want to investigate every weakness finding of interest, regardless of confidence.
Quantified Q This factor could be quantified with custom weights. Some code analysis tools have precise measurements of the accuracy of specific detection patterns.
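This excerpt lists the Base Finding factors and their weights, but not the formula that combines them; the normative formula appears in the full CWSS specification. Purely as a sketch of how the per-factor weights interact, and explicitly not the spec's actual formula, a naive multiplicative combination would look like this (`base_finding_sketch` is a made-up name):

```python
def base_finding_sketch(ti, ap, al, ic, fc):
    """Hypothetical illustration only: multiply the five Base Finding
    factor weights (Technical Impact, Acquired Privilege, Acquired
    Privilege Layer, Internal Control Effectiveness, Finding Confidence)
    and scale to 0-100. The real CWSS formula is NOT a plain product."""
    return round(ti * ap * al * ic * fc * 100, 1)

# A critical, fully confirmed finding with no internal controls
# (all weights 1.0) maxes out the sketch, while a Complete internal
# control (IC weight 0.0) drives it to zero, mirroring the IC table.
```

Even this toy version shows why a Complete internal control (weight 0.0) or a Proven False finding (weight 0.0) should eliminate the finding's contribution entirely.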

Attack Surface Metric Group

The Attack Surface metric group consists of the following factors:

  • Required Privilege (RP)
  • Required Privilege Layer (RL)
  • Access Vector (AV)
  • Authentication Strength (AS)
  • Authentication Instances (AI)
  • Level of Interaction (IN)
  • Deployment Scope (SC)

Required Privilege (RP)

The Required Privilege identifies the type of privileges required for an entity to reach the code/functionality that contains the weakness.

The acronym "RUGNAP" can serve as a mnemonic for remembering the key values ("Regular User", "Guest", "None", "Admin", "Partially-Privileged").

Value | Code | Weight | Description
None N 1.0 No privileges are required. For example, a web-based search engine may not require any privileges for an entity to enter a search term and view results.
Guest G 0.9 The entity has limited or "guest" privileges that can significantly restrict allowed activities; the entity might be able to register or create a new account without any special requirements or proof of identity. For example, a web blog might allow participants to create a user name and submit a valid email address before entering comments.
Regular User RU 0.7 The entity is a regular user who has no special privileges.
Partially-Privileged User P 0.6 The entity is a valid user with some special privileges, but not enough privileges that are equivalent to an administrator. For example, a user might have privileges to make backups, but not to modify the software's configuration or install updates.
Administrator A 0.1 The entity has administrator, root, SYSTEM, or equivalent privileges that imply full control over the software or the underlying OS.
Default D 0.7 Median of weights for RUGNAP values.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements that wants strict enforcement of privilege separation, even between already-privileged users.

Note that this factor cannot be quantified.

Required Privilege Layer (RL)

The Required Privilege Layer identifies the operational layer at which the entity must have access in order to successfully exploit the weakness.

A mnemonic for this factor is "SANE" (System, Application, Network, Enterprise).

Value | Code | Weight | Description
System S 0.9 The entity must have access to, or control of, a system or physical host.
Application A 1.0 The entity must have access to an affected application.
Network N 0.7 The entity must have access to/from the network.
Enterprise E 1.0 The entity must have access to a critical piece of enterprise infrastructure, such as a router, switch, DNS, domain controller, firewall, identity server, etc.
Default D 0.9 Median of the weights for SANE values.
Unknown Unk 0.5
Not Applicable NA 1.0 This factor might not be applicable in an environment with high assurance requirements that wants strict enforcement of privilege separation, even between already-privileged users.

Note that this factor cannot be quantified.

Access Vector (AV)

The Access Vector identifies the channel through which an entity must communicate to reach the code or functionality that contains the weakness. Note that these values are very similar to the ones used in CVSS, except CWSS distinguishes between physical access and local (shell/account) access.

While there is a close relationship between Access Vector and Required Privilege Layer, the two are distinct. For example, an attacker with "physical" access to a router might be able to affect the Network or Enterprise layer.

Value | Code | Weight | Description
Internet I 1.0 An attacker must have access to the Internet to reach the weakness.
Intranet R 0.8 An attacker must have access to an enterprise intranet that is shielded from direct access from the Internet, e.g. by a firewall, but otherwise the intranet is accessible to most members of the enterprise.
Private Network V 0.8 An attacker must have access to a private network that is only accessible to a narrowly-defined set of trusted parties.
Adjacent Network A 0.7 An attacker must have access to a physical interface to the network, such as the broadcast or collision domain of the vulnerable software. Examples of local networks include a local IP subnet, Bluetooth, IEEE 802.11, and a local Ethernet segment.
Local L 0.5 The attacker must have an interactive, local (shell) account that interfaces directly with the underlying operating system.
Physical P 0.2 The attacker must have physical access to the system that the software runs on, or otherwise be able to interact with the system via interfaces such as USB, CD, keyboard, mouse, etc.
Default D 0.75 Median of weights for relevant values.
Unknown U 0.5
Not Applicable NA 1.0

Note that this factor cannot be quantified.

Authentication Strength (AS)

The Authentication Strength covers the strength of the authentication routine that protects the code/functionality that contains the weakness.

The values for this factor still need to be defined more clearly. It might be reasonable to adopt approaches such as the four levels outlined in NIST Special Publication 800-63 ("Electronic Authentication Guideline") and OMB Memo 04-04. However, since the strength of an authentication mechanism may degrade over time due to advances in attack techniques or computing power, it might be useful to select values that are effectively "future-proof." (On the other hand, this might make it difficult to compare CWSS scores if they were assigned at different times.)

Value | Code | Weight | Description
Strong S 0.7 The weakness requires the strongest available methods to tie the entity to a real-world identity, such as hardware-based tokens and/or multi-factor authentication.
Moderate M 0.8 The weakness requires authentication using moderately strong methods, such as the use of certificates from untrusted authorities, knowledge-based authentication, or one-time passwords.
Weak W 0.9 The weakness requires a simple, weak authentication method that is easily compromised using spoofing, dictionary, or replay attacks, such as a static password.
None N 1.0 The weakness does not require any authentication at all.
Default D 0.85 Median of the weights for Strong, Moderate, Weak, and None.
Unknown Unk 0.5
Not Applicable NA 1.0 This might not be applicable in an environment with high assurance requirements that seek to eliminate all weaknesses.

Note that this factor cannot be quantified.

Authentication Instances (AI)

Authentication Instances covers the number of distinct instances of authentication that an entity must perform to reach the weakness.

Value | Code | Weight | Description
None N 1.0 No authentication is required.
Single S 0.8 A single instance of authentication is required.
Multiple M 0.5 Multiple instances of authentication are required.
Default D 0.8 Median weight of values for None, Single, and Multiple.
Unknown Unk 0.5
Not Applicable NA 1.0 This might not be applicable in an environment with high assurance requirements.

Note that this factor cannot be quantified.

Level of Interaction (IN)

The Level of Interaction covers the actions that are required by the human victim(s) to enable a successful attack to take place.

Value | Code | Weight | Description
Automated Aut 1.0 No human interaction is required.
Limited/Typical Ltd 0.9 The attacker must convince the user to perform an action that is common or regarded as "normal" within typical product operation. For example, clicking on a link in a web page, or previewing the body of an email, is common behavior.
Moderate Mod 0.8 The attacker must convince the user to perform an action that might appear suspicious to a cautious, knowledgeable user. For example: the user has to accept a warning that suggests the attacker's payload might contain dangerous content.
Opportunistic Opp 0.3 The attacker cannot directly control or influence the victim, and can only passively capitalize on mistakes or actions of others.
High High 0.1 A large amount of social engineering is required, possibly including ignorance or negligence on the part of the victim.
No interaction NI 0.0 There is no interaction possible, not even opportunistically; this typically would render the weakness as a "bug" instead of leading to a vulnerability. Since CWSS is for security, the weight is 0.
Default D 0.55 Median of values for Automated, Limited, Moderate, Opportunistic, High, and No interaction.
Unknown Unk 0.5
Not Applicable NA 1.0 This might not be applicable in an environment with high assurance requirements, or an environment that has high concerns about insider attacks between people with an established trust relationship.

Note that this factor cannot be quantified.

Deployment Scope (SC)

Deployment Scope identifies whether the weakness is present in all deployable instances of the software, or if it is limited to a subset of platforms and/or configurations. For example, a numeric calculation error might only be applicable for software that is running under a particular OS and a 64-bit architecture, or a path traversal issue might only affect operating systems for which "\" is treated as a directory separator.

The "RAMP" acronym can serve as a mnemonic for the key values (Rare, All, Moderate, Potential).

Value | Code | Weight | Description
All | All | 1.0 | Present in all platforms or configurations.
Moderate | Mod | 0.9 | Present in common platforms or configurations.
Rare | Rare | 0.5 | Only present in rare platforms or configurations.
Potentially Reachable | Pot | 0.1 | Potentially reachable, but all code paths are currently safe, and/or the weakness is in dead code.
Default | D | 0.7 | The median of the weights for the RAMP values.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 |
Quantified | Q | | This factor could be quantified with custom weights. The user may know what percentage of shipped (or supported) software contains this bug.

Note: this factor was called "Universality" in CWSS 0.2.

Note that "Potentially Reachable" has some overlap with "Locally True" in the Finding Confidence (FC) factor.

Environmental Metric Group

The Environmental metric group consists of the following factors:

  • Business Impact (BI)
  • Likelihood of Discovery (DI)
  • Likelihood of Exploit (EX)
  • External Control Effectiveness (EC)
  • Remediation Effort (RE)
  • Prevalence (P)

Business Impact (BI)

Business Impact describes the potential impact to the business or mission if the weakness can be successfully exploited.

Note: since business concerns vary widely across organizations, CWSS 0.6 does not attempt to provide a more precise breakdown, e.g. in terms of financial, reputational, physical, legal, or other types of damage. This factor can be quantified to support any externally-defined models.

Value | Code | Weight | Description
Critical | C | 1.0 | The business/mission could fail.
High | H | 0.9 | The operations of the business/mission would be significantly affected.
Medium | M | 0.6 | The business/mission would be affected, but without extensive damage to regular operations.
Low | L | 0.3 | Minimal impact on the business/mission.
None | N | 0.0 | No impact.
Default | D | 0.6 | The median of the weights for Critical, High, Medium, Low, and None.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 | This factor might not be applicable in contexts in which the business impact is irrelevant.
Quantified | Q | | This factor could be quantified with custom weights, e.g. to support externally-defined models of business impact.

Likelihood of Discovery (DI)

Likelihood of Discovery is the likelihood that an attacker can discover the weakness.

Value | Code | Weight | Description
High | H | 1.0 | It is very likely that an attacker can discover the weakness quickly and with little effort using simple techniques, without access to source code or other artifacts that simplify weakness detection.
Medium | M | 0.6 | An attacker might be able to discover the weakness, but would require certain skills to do so, possibly requiring source code access or reverse engineering knowledge. It may require some time investment to find the issue.
Low | L | 0.2 | An attacker is unlikely to discover the weakness without highly specialized skills, access to source code (or its equivalent), and a large time investment.
Default | D | 0.6 | The median of the High, Medium, and Low weights.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 | Some BVCs may assume that all weaknesses will be discovered by an attacker.
Quantified | Q | | This factor could be quantified with custom weights.

Note that this factor has been used in some prioritization schemes in the past, such as Discoverability in Microsoft's DREAD classification scheme.

However, it is being considered for removal in future versions of CWSS. There are a few considerations:

  • If a finding is being reported, then it has already been discovered once, which suggests that it could be discovered again.
  • Many threat models assume that a potential attacker has perfect knowledge of the underlying code, i.e., assume a worst-case scenario in which the weakness has already been discovered.
  • Without explicit knowledge of the skills, motivations, and methods of the attackers, the likelihood of discovery can only be estimated, and this value is likely to be different for different kinds of attackers, even within the same vignette.
  • The Likelihood of Discovery is often influenced by many other CWSS factors, such as Prevalence, Access Vector, Deployment Scope, Required Privilege Level, Authentication Instances, Likelihood of Exploit, Level of Interaction, and Control Effectiveness.

Likelihood of Exploit (EX)

Likelihood of Exploit is the likelihood that, if the weakness is discovered, an attacker with the required privileges/authentication/access would be able to successfully exploit it.

Value | Code | Weight | Description
High | H | 1.0 | It is highly likely that an attacker would target this weakness successfully, with a reliable exploit that is easy to develop.
Medium | M | 0.6 | An attacker would probably target this weakness successfully, but the chances of success might vary, or multiple attempts might be required to succeed.
Low | L | 0.2 | An attacker probably would not target this weakness, or would have very limited chances of success.
None | N | 0.0 | An attacker has no chance of success; i.e., the issue is a "bug" because there is no attacker role, and no benefit to the attacker.
Default | D | 0.6 | The median of the High, Medium, and Low weights. The "None" value is excluded with the expectation that few weakness findings would be scored using that value, and including it in the median calculation would reduce the weight to a non-intuitive level.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 | This might not be applicable in an environment with high assurance requirements, which could assume that attackers will exploit any weakness they can find, or will invest significant resources to work around any possible barriers to exploit success.
Quantified | Q | | This factor could be quantified with custom weights.

Note that this factor is influenced by the Impact of a weakness, since attackers often target weaknesses that have the most severe impacts. Alternately, they may target weaknesses that are easy to trigger. It is also influenced by other factors such as the effectiveness of internal and external controls.

It might seem that the prevalence is also an influence, but prevalence is more closely related to Likelihood of Discovery.

External Control Effectiveness (EC)

External Control Effectiveness is the capability of controls or mitigations outside of the software that may render the weakness unable to be reached or triggered by an attacker. For example, Address Space Layout Randomization (ASLR) and similar technologies reduce, but do not eliminate, the chances of success in a buffer overflow attack; however, such controls are not instantiated within the software itself.

Value | Code | Weight | Description
None | N | 1.0 | No controls exist.
Limited | L | 0.9 | There are simplistic methods or accidental restrictions that might prevent a casual attacker from exploiting the issue.
Moderate | M | 0.7 | The protection mechanism is commonly used but has known limitations that might be bypassed with some effort by a knowledgeable attacker.
Indirect (Defense-in-Depth) | I | 0.5 | The control does not specifically protect against exploitation of the weakness, but it indirectly reduces the impact when a successful attack is launched, or otherwise makes it more difficult to construct a functional exploit. For example, Address Space Layout Randomization (ASLR) and similar technologies reduce, but do not eliminate, the chances of success in a buffer overflow attack. Since the response is typically to exit the process, the result is still a denial of service.
Best-Available | B | 0.3 | The control follows best current practices, although it may have some limitations that can be overcome by a skilled, determined attacker, possibly requiring the presence of other weaknesses. For example, Transport Layer Security (TLS) / SSL 3 are in operation throughout much of the Web, and stronger methods generally are not available due to compatibility issues.
Complete | C | 0.1 | The control is completely effective against the weakness, i.e., there is no bug or vulnerability, and no adverse consequence of exploiting the issue. For example, a sandbox environment might restrict file access operations to a single working directory, which would protect against exploitation of path traversal. A non-zero weight is used to reflect the possibility that the external control could be accidentally removed in the future, e.g. if the software's environment changes.
Default | D | 0.6 | The median of the weights for Complete, Best-Available, Indirect, Moderate, Limited, and None.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 |

Note that this factor cannot be quantified.

Remediation Effort (RE)

Remediation Effort (RE) is the amount of effort required to remediate the weakness so that it no longer poses a security risk to the software.

Value | Code | Weight | Description
Extensive | E | 1.0 | Requires significant labor or time to address, possibly requiring modifications to design or architecture; available remediations would otherwise break legitimate functionality; etc.
Moderate | M | 0.9 | Requires a moderate amount of labor or time to address, possibly involving multiple components or source files, without significant impact on design or architecture.
Limited | L | 0.8 | Remediations can be applied with limited labor and time, e.g. by modifying a relatively small number of lines of code.
Default | D | 0.9 | The median of the weights for Extensive, Moderate, and Limited.
Unknown | Unk | 0.5 |
Not Applicable | NA | 1.0 | The BVC might not want to consider how expensive it is for the developer to fix a weakness; for example, a developer who must fix certain issues in order to meet compliance, or a customer who is considering acquisition of the software.
Quantified | Q | | This factor could be quantified with custom weights. Remediation Effort could be measured in terms of financial or labor costs, but methods for normalizing the results to a number between 0.0 and 1.0 are unclear.

Note that the proposed weights reflect a bias that weaknesses that are more expensive to fix will have higher scores than the same types of weaknesses that are less expensive to fix.

This factor might be removed from future versions of CWSS. There are a few points of debate:

  • It can be argued that this factor does not directly contribute to the overall risk of an issue.
  • Software consumers might not directly care about this factor, although weaknesses with a high remediation effort might not be fixed quickly. In turn, there may be a large time window in which the consumer will not be able to obtain a patch. This possibility could be integrated into the customer's overall risk analysis, especially during the acquisition cycle. It is not certain whether this concern should be captured in the actual CWSS score, or analyzed independently.
  • For developers, the remediation effort might impact which weaknesses are addressed first, but this might be better handled with separate processes within the context of a remediation plan. For example, bug databases and external maintenance/release processes might be more appropriate.
  • In some contexts, the remediation effort may be irrelevant. For example, regulatory or compliance concerns might require that a particular weakness must be fixed, regardless of cost. This scenario itself is not a strong argument for removal of this factor, since the Remediation Effort factor could be rated with a "Not Applicable" value, which would not affect the final CWSS score.

Prevalence (P)

The Prevalence of a finding identifies how frequently this type of weakness appears in software.

This factor is typically used during the development of custom Top-N weakness lists, as opposed to scoring an individual finding in an automated-scanning context. When scoring an individual weakness finding in such a context, this factor is likely to be scored with a "Not Applicable" value.

Since software can be successfully attacked even in the presence of a single weakness, the selected weights are intentionally close together and do not strongly distinguish the values from each other.

Value | Code | Weight | Description
Widespread | W | 1.0 | The weakness is found in most or all software in the associated environment, and may occur multiple times within the same software package.
High | H | 0.9 | The weakness is encountered very often, but it is not widespread.
Common | C | 0.8 | The weakness is encountered periodically.
Limited | L | 0.7 | The weakness is encountered rarely, or never.
Default | D | 0.85 | The median of the weights for Widespread, High, Common, and Limited.
Unknown | U | 0.5 |
Not Applicable | NA | 1.0 | When performing targeted scoring against an application, Prevalence is normally expected to be irrelevant, since the individual application and the analytical techniques determine how frequently the weakness occurs, and many aggregated scoring methods will generate larger scores if there are more weaknesses.
Quantified | Q | | This factor could be quantified with custom weights. Precise prevalence data may be available within limited use cases, provided the user is tracking weakness data at a low level of granularity. For example, a developer may be tracking weaknesses across a suite of products, or a code-auditing vendor could measure prevalence from the software analyzed across the entire customer base. In a previous version of CWSS, prevalence was calculated from raw voting data collected for the 2010 Top 25, which used discrete values (range 1 to 4) that were then adjusted to a 1-to-10 range.

CWSS Score Formula

A CWSS 0.6 score can range between 0 and 100. It is calculated as follows:

BaseFindingSubscore * AttackSurfaceSubscore * EnvironmentSubscore

Base Finding Subscore

The BaseFindingSubscore supports values between 0 and 100. Both the AttackSurfaceSubscore and EnvironmentSubscore support values between 0 and 1.

The Base Finding subscore (BaseFindingSubscore) is calculated as follows:

  • Base = [ (10 * TechnicalImpact + 5*(AcquiredPrivilege + AcquiredPrivilegeLayer) + 5*FindingConfidence) * f(TechnicalImpact) * InternalControlEffectiveness ] * 4.0
  • f(TechnicalImpact) = 0 if TechnicalImpact = 0; otherwise f(TechnicalImpact) = 1.

The maximum potential BaseFindingSubscore is 100.

The definition of f(TechnicalImpact) has an equivalent in CVSS. It ensures that if the Technical Impact is 0, the other added factors do not inadvertently generate a non-zero score.

TechnicalImpact and the AcquiredPrivilege/AcquiredPrivilegeLayer combination are given equal weight, each accounting for 40% of the BaseFindingSubscore (each generates a sub-value with a maximum of 10). Finding Confidence provides some adjustment, accounting for 20% of the Base (maximum of 5). The InternalControlEffectiveness can adjust the score downward, perhaps to 0, depending on the strength of any internal controls that have been applied to the issue. After application of InternalControlEffectiveness, the possible range of results is between 0 and 25, so the 4.0 coefficient is used to adjust the BaseFindingSubscore to a range between 0 and 100.
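As a concrete illustration, the Base Finding subscore can be sketched in Python. The function and parameter names are illustrative only (they are not defined by the CWSS specification); the arguments are the factor weights chosen from the value tables above.

```python
def base_finding_subscore(tech_impact, acq_priv, acq_priv_layer,
                          finding_conf, internal_control):
    """Sketch of the CWSS Base Finding subscore (range 0..100).

    Arguments are factor weights (0.0..1.0) from the CWSS value
    tables; names are illustrative, not part of the specification.
    """
    # f(TechnicalImpact): zero out the score when Technical Impact is 0,
    # so the other added factors cannot produce a non-zero result.
    f_ti = 0 if tech_impact == 0 else 1
    return ((10 * tech_impact
             + 5 * (acq_priv + acq_priv_layer)
             + 5 * finding_conf)
            * f_ti * internal_control) * 4.0
```

With the weights from Example 1 below (TI 0.9, AP 1.0, AL 1.0, FC 1.0, IC 1.0), this yields 96, matching the worked derivation.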

Attack Surface Subscore

The AttackSurfaceSubscore is calculated as:

  • [ 20*(RequiredPrivilege + RequiredPrivilegeLayer + AccessVector) + 20*DeploymentScope + 10*LevelInteraction + 5*(AuthenticationStrength + AuthenticationInstances) ] / 100.0

The combination of required privileges / access makes up 60% of the Attack Surface subscore; deployment scope, another 20%; interaction, 10%; and authentication, 10%. The authentication requirements are not given much focus, under the assumption that strong proof of identity will not significantly deter an attacker from attempting to exploit the vulnerability.

This generates a range of values between 0 and 100, which are then divided by 100.
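The Attack Surface subscore can be sketched the same way (again, function and parameter names are illustrative; arguments are factor weights):

```python
def attack_surface_subscore(req_priv, req_priv_layer, access_vector,
                            deploy_scope, interaction,
                            auth_strength, auth_instances):
    """Sketch of the CWSS Attack Surface subscore (range 0..1)."""
    # Privileges/access: 60 points; deployment scope: 20; interaction:
    # 10; authentication: 10. The 0..100 sum is then divided by 100.
    return (20 * (req_priv + req_priv_layer + access_vector)
            + 20 * deploy_scope
            + 10 * interaction
            + 5 * (auth_strength + auth_instances)) / 100.0
```

With the weights from Example 1 below (RP 0.9, RL 1.0, AV 1.0, SC 1.0, IN 0.9, AS 1.0, AI 1.0), this yields 0.97, matching the worked derivation.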

Environmental Subscore

The EnvironmentSubscore is calculated as:

  • [ (10 * BusinessImpact + 3*(LikelihoodOfDiscovery + LikelihoodOfExploit) + 3*Prevalence + RemediationEffort) * f(BusinessImpact) * ExternalControlEffectiveness ] / 20.0
  • f(BusinessImpact) = 0 if BusinessImpact == 0; otherwise f(BusinessImpact) = 1

BusinessImpact accounts for 50% of the environmental score, and it can move the final score to 0. ExternalControlEffectiveness is always non-zero (to account for the risk that it can be inadvertently removed if the environment changes), but otherwise it can have major impact on the final score. The combination of LikelihoodOfDiscovery/LikelihoodOfExploit accounts for 30% of the score, with Prevalence at 15% and RemediationEffort at an additional 5%.
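The Environment subscore can likewise be sketched in Python (illustrative names; arguments are factor weights):

```python
def environment_subscore(business_impact, discovery, exploit,
                         prevalence, remediation, external_control):
    """Sketch of the CWSS Environment subscore (range 0..1)."""
    # f(BusinessImpact): zero out the score when Business Impact is 0.
    f_bi = 0 if business_impact == 0 else 1
    # Max of the inner sum is 10 + 6 + 3 + 1 = 20, hence the /20.0.
    return ((10 * business_impact
             + 3 * (discovery + exploit)
             + 3 * prevalence
             + remediation)
            * f_bi * external_control) / 20.0
```

With all weights at 1.0 (as in Example 1 below), this yields exactly 1.0.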

Additional Features of the Formula

Since "Not Applicable" values have a weight of 1, the formula always has a potential maximum score of 100.0. In extremely rare cases in which certain factors are treated as not-applicable (e.g., Technical Impact, Business Impact, *and* Internal Control Effectiveness), the minimum possible score is non-zero.

There is significant diversity in the kinds of scores that can be represented. However, because many factors are multiplied together, and several weights have small values, the range of potential scores is somewhat skewed towards lower values. This is still a significant improvement over previous CWSS versions.

When default values are used for a large number of factors for a single score, using the median weights as defined in CWSS 0.6, the scores will skew heavily to the low side. The median weight for a factor does not necessarily reflect the most likely value that could be used, so the selection of Default weights may be changed in future versions. Ideally, the formula would have a property in which the use of many default values produces a score that is relatively close to 50; the selection of non-default values would adjust the final score upward or downward, thereby increasing precision.

The use of "Unknown" values also generally produces scores that skew to the low side. This might be a useful feature, since scores will be higher if they have more specific information.

CWSS Vectors, Scoring, and Score Portability

With the abbreviations as specified above, a CWSS score can be stored in a compact, machine-parsable, human-readable format that provides the details for how the score was generated. This is very similar to how CVSS vectors are constructed.

Unlike CVSS, not all CWSS factors can be described symbolically with discrete values. Several factors can be quantified with continuous weights that override the originally-defined default discrete values. When calculated using CWRAF, the Impact factor is effectively an expression of 32 separate Technical Impacts and layers, many of which would not be applicable to a particular weakness. Treating each impact as a separate factor would roughly double the number of factors required to calculate a CWSS score.

In addition, the use of Business Value Context (BVC) to adjust scores for business-specific concerns also means that a CWSS score and its vector may appear to be inconsistent if they are "transported" to other domains or vignettes.

With this concern in mind, a CWSS 0.6 vector should explicitly list the weights for each factor, even though it increases the size of the vector representation.

The format of a single factor in a CWSS vector is:

FactorName:Value,Score

For example, "P:NA,1.0" specifies a "Not Applicable" value for Prevalence with a weight of 1.0. A specifier of "AV:P,0.2" indicates the "Physical" value for Access Vector with a weight of 0.2.

Factors are separated by forward slash characters, such as:

AV:I,1.0/RP:G,0.9/AS:N,1.0

which lists values and weights for "AV" (Access Vector), "RP" (Required Privilege Level), and "AS" (Authentication Strength).
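A vector in this format can be parsed mechanically. A minimal sketch (the function name is illustrative, and no validation of factor codes against the specification is performed):

```python
def parse_cwss_vector(vector):
    """Parse a single-line CWSS vector such as "AV:I,1.0/RP:G,0.9"
    into a dict mapping factor code -> (value, weight).

    Illustrative sketch only: assumes a well-formed vector and does
    not check factor codes or weight ranges against the spec.
    """
    factors = {}
    # Optional surrounding parentheses are stripped; factors are
    # separated by "/", and each factor is "Name:Value,Weight".
    for part in vector.strip("()").split("/"):
        name, rest = part.split(":")
        value, weight = rest.split(",")
        factors[name] = (value, float(weight))
    return factors
```

For example, `parse_cwss_vector("AV:I,1.0/RP:G,0.9/AS:N,1.0")` returns `{"AV": ("I", 1.0), "RP": ("G", 0.9), "AS": ("N", 1.0)}`.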

Example 1: Business-critical application

Consider a reported weakness in which an application is the primary source of income for a company, and thus has critical business value. The application allows arbitrary Internet users to sign up for an account using only an email address. A user can then exploit the weakness to obtain administrator privileges for the application, but the attack cannot succeed until the administrator views a report of recent user activities - a common occurrence. The attacker cannot take complete control over the application, but can delete its users and data. Suppose further that there are no controls to prevent the weakness, but the fix for the issue is simple, and limited to a few lines of code.

This situation could be captured in the following CWSS vector:

(TI:H,0.9/AP:A,1.0/AL:A,1.0/IC:N,1.0/FC:T,1.0/

RP:G,0.9/RL:A,1.0/AV:I,1.0/AS:N,1.0/AI:N,1.0/IN:Ltd,0.9/SC:All,1.0/

BI:C,1.0/DI:H,1.0/EX:H,1.0/EC:N,1.0/RE:L,1.0/P:NA,1.0)

The vector has been split into multiple lines for readability. Each line represents a metric group.

The factors and values are as follows:

Factor | Value
Technical Impact | High
Acquired Privilege | Administrator
Acquired Privilege Layer | Application
Internal Control Effectiveness | None
Finding Confidence | Proven True
Required Privilege | Guest
Required Privilege Layer | Application
Access Vector | Internet
Authentication Strength | None
Authentication Instances | None
Level of Interaction | Limited/Typical
Deployment Scope | All
Business Impact | Critical
Likelihood of Discovery | High
Likelihood of Exploit | High
External Control Effectiveness | None
Remediation Effort | Limited
Prevalence | NA

The CWSS score for this vector is 93.1, derived as follows:

  • BaseSubscore:
    • [ (10 * TI + 5*(AP + AL) + 5*FC) * f(TI) * IC ] * 4.0
    • f(TI) = 1
    • = [ (10 * 0.9 + 5*(1.0 + 1.0) + 5*1.0) * 1 * 1.0 ] * 4.0
    • = 96
  • AttackSurfaceSubscore:
    • [ 20*(RP + RL + AV) + 20*SC + 10*IN + 5*(AS + AI) ] / 100.0
    • = [ 20*(0.9 + 1.0 + 1.0) + 20*1.0 + 10*0.9 + 5*(1.0 + 1.0) ] / 100.0
    • = 0.97
  • EnvironmentSubscore:
    • [ (10 * BI + 3*(DI + EX) + 3*P + RE) * f(BI) * EC ] / 20.0
    • f(BI) = 1
    • = [ (10 * 1 + 3*(1.0 + 1.0) + 3*1.0 + 1.0) * 1 * 1.0 ] / 20.0
    • = 1

The final score is:

96 * 0.97 * 1 = 93.1

Example 2: Wiki with limited business criticality

Consider this CWSS vector. Suppose the software is a wiki that is used for tracking social events for a mid-size business. Some of the most important characteristics are that there is medium technical impact to an application administrator from a regular user of the application, but the application is not business-critical, so the overall business impact is low. Also note that most of the environment factors are set to "Not Applicable."

(TI:M,0.6/AP:A,1/AL:A,1/IC:N,1/FC:T,1/

RP:RU,0.7/RL:A,1/AV:N,1/AS:L,0.9/AI:S,0.8/IN:Aut,1/SC:NA,1/

BI:L,0.3/DI:NA,1/EX:NA,1/EC:N,1/RE:NA,1/P:NA,1)

The CWSS score for this vector is 50.5, derived as follows:

  • BaseSubscore:
    • [ (10 * TI + 5*(AP + AL) + 5*FC) * f(TI) * IC ] * 4.0
    • f(TI) = 1
    • = [ (10 * 0.6 + 5*(1 + 1) + 5*1) * f(TI) * 1 ] * 4.0
    • = 84
  • AttackSurfaceSubscore:
    • [ 20*(RP + RL + AV) + 20*SC + 10*IN + 5*(AS + AI) ] / 100.0
    • = [ 20*(0.7 + 1 + 1) + 20*1 + 10*1 + 5*(0.9 + 0.8) ] / 100.0
    • = 0.925
  • EnvironmentSubscore:
    • [ (10 * BI + 3*(DI + EX) + 3*P + RE) * f(BI) * EC ] / 20.0
    • f(BI) = 1
    • = [ (10 * 0.3 + 3*(1 + 1) + 3*1 + 1) * f(BI) * 1 ] / 20.0
    • = 0.65

The final score is:

84 * 0.925 * 0.65 = 50.5

If the Business Impact (BI) is set to Medium instead, then the score would rise to 62.2; if set to High, then 73.8; and if set to Critical, then the score would be 77.7. (Since the Technical Impact is only "Medium," the maximum CWSS score cannot reach 100.)
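This sensitivity to Business Impact can be checked numerically. A small sketch, taking the Base and Attack Surface subscores from the worked derivation of Example 2 (the `score` helper is illustrative, not part of the specification):

```python
# Subscores from Example 2's derivation above.
base = 84.0      # BaseFindingSubscore
attack = 0.925   # AttackSurfaceSubscore

def score(bi_weight):
    """Recompute the final CWSS score for a given Business Impact
    weight, holding all other Example 2 factors fixed (DI, EX, P,
    RE all at 1.0; EC at 1.0)."""
    env = (10 * bi_weight + 3 * (1 + 1) + 3 * 1 + 1) / 20.0
    return round(base * attack * env, 1)
```

Evaluating `score` at the Low (0.3), Medium (0.6), High (0.9), and Critical (1.0) weights reproduces the progression of final scores described above.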

Other Approaches to CWSS Score Portability

Instead of recording each individual weight within a CWSS vector, several other methods could be adopted.

One approach would be to attach BVC metadata such as the Technical Impact Scorecard to a set of generated CWSS scores, but it could be too easy for this metadata to become detached from the scores/vectors. Quantified factors would still need to be represented within a vector, since they could vary for each weakness finding.

Another approach: when CWSS scores are transferred from one party to another without an accompanying BVC, the receiving party should re-calculate the scores from the given CWSS vectors, then compare the re-calculated scores with the originals. A difference in scores would suggest that different BVCs are in use between the provider and receiver.

Considerations for CWSS beyond 0.6

For future versions, the following should be considered.

Current Limitations of the Scoring Method

The formula in CWSS 0.6 is an improvement over the previous versions, but it still needs to be refined to ensure that the range of potential scores is more evenly distributed. (In CWSS 0.3, the final CWSS scores were heavily skewed to low values between 0.0 and 2.0 out of a potential score of 100.0; in CWSS 0.4, there was a greater balance in potential scores, but many factors were weighted too closely together, and the generated scores still were not intuitive.) There are probably unexpected interactions between factors that must be identified and resolved. CVSS scoring contains built-in adjustments that prevent many factors from affecting the score too much, while also giving some preference to impact over exploitability; similar built-in adjustments may need to be made for CWSS.

CWSS 0.6 provides users with some ability to give higher scores to design/architecture issues in comparison with implementation errors, which is an improvement over previous versions, but perhaps still insufficient. This approach is important in some contexts, such as when a single report of "lack of an input validation framework" is expected to carry more weight than multiple individual XSS and SQL injection bugs. The Remediation Effort factor allows users to adjust priority of design/architecture issues, since they typically require more effort to fix than implementation bugs. The Business Impact factor can also be used. This discrepancy might be also resolved with manual, CWRAF-oriented, weakness-specific scoring, but the process as defined in CWSS 0.6 does not make any such distinctions. CWE data could be mined to determine whether the weakness is in implementation or in design/architecture, so this knowledge could be obtained automatically; however, the boundary between design and implementation is usually not well-defined, and many CWE entries could occur in either phase.

There are also some challenges for scoring findings that combine multiple weaknesses. By their nature, compound elements such as chains and composites involve interactions between multiple reported weaknesses. With some detection techniques such as automated code scanning, multiple CWE entries might be reported as separate findings for a single chain or composite. This might artificially inflate any aggregate scoring, since the results might be double-counted for each entry within the chain or composite. However, sometimes the compound element is more than the sum of its parts, and the combination of multiple weaknesses has a higher impact than the maximum impact of any individual weakness. This is not well-handled in the current CWSS scheme; chains and composites are a relatively new concept, and they are likely difficult to identify automatically with code analysis. This challenge is probably beyond the scope of CWSS.

It is anticipated that CWSS may be considered for use in other types of software assessments, such as safety, reliability, and quality. Weaknesses or other issues related to code quality might receive higher prioritization within a vignette-oriented scheme, since safety, compliance, or maintainability might be important. This usage is not explicitly supported in CWSS 0.6. However, such quality-related issues could be scored by setting the Required Privilege equal to the Acquired Privilege and the Required Privilege Layer equal to the Acquired Privilege Layer; the Business Impact factor could also be used.

Community Review and Validation of Factors

Pending community review, future versions of CWSS might modify, remove, or add factors to the methodology.

Some factors might be removed based on community feedback. Some of the more controversial examples are Remediation Effort and Likelihood of Discovery. The reasons for potential removal are described in the detailed descriptions for each factor.

Note that each factor supports a "Not Applicable" value that does not impact the final CWSS score. If there are enough stakeholders or use-cases for whom the factor is important, then this would be a strong argument for keeping the factor within CWSS, even if it is not essential for everyone. The "Not Applicable" value could be used for scoring contexts in which the given factor is not relevant.

Additional CWSS Factors

As CWSS matures, additional factors might become an essential part of future versions of the framework.

The "Weakness Scope" factor could be used to cover the following scenario. Within CWSS 0.3, design and architecture flaws receive the same relative priority as implementation issues, even though they may lead to a complete compromise of the software. It may be reasonable to use a separate factor in order to give design/architecture flaws a larger weight, e.g. so that the lack of an input validation framework (one "finding") can be given higher priority than hundreds of individual findings for XSS or SQL injection. Note that there is already some relationship with the "Extensive" value of the Remediation Effort factor, but in CWSS 0.6, this value reduces the overall score, and the Remediation Effort factor is being considered for removal from CWSS.

In previous versions of this paper before CWSS 0.6, several other CWSS factors were proposed, but they generally fell under the "business impact" category.

Constituency-focused Scoring

Within a vignette, there are often different users and communicating organizations that all use the same system or system-of-systems. These form separate constituencies.

When performing scoring, the score might vary depending on the perspective of:

  • The application's users
  • The application
  • The physical host or operating system
  • The containing network
  • The entire organization

An individual weakness finding within a specific, targeted package could have different scores for each of these constituencies.

In CWSS 0.6, this is now partially handled by using the SANE model of privilege layers (System, Application, Network, Enterprise), although only one combination of privilege/layer can be specified for a single CWSS score.

Supporting multiple scores for a single finding might introduce too much additional complexity into CWSS. This constituency separation could instead be handled using CWRAF's vignette model, by defining one low-level vignette for the application's users, one for the application itself, one for the physical host, etc.

Impact of CWSS and CWE Changes to Factors and Subscores

The values for the factors involved in scoring might change frequently, especially for early versions of CWSS. For example, the likelihood of discovery of a particular weakness may change - rising if detection techniques improve (or if there is a shift in focus because of increases in attacks), or falling if enough developers have methods for avoiding the weakness, and/or if automatic protection mechanisms reach wide-scale adoption.

In the future, default values for some factors might be directly obtained from CWE data. However, new CWE versions are released on a regular basis, approximately 4 or 5 times a year. If a CWE entry is modified in a way that affects CWSS-related factors, then the resulting CWSS score for a weakness might differ depending on which version of CWE is used. Theoretically, however, this could be automatically detected by observing inconsistencies in the weights used for "Default" values in CWSS vectors.

Because of these underlying changes, there is a significant risk that CWSS scores will not be comparable across organizations or assessments if they were calculated using different versions of CWSS, vignettes, or CWE.

In anticipation of such changes, CWSS design should consider including CWE and/or CWSS version numbers in the CWSS vector representation (or associated metadata).
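One lightweight possibility is to append the versions to the vector string itself. The sketch below is purely illustrative; the field names "CWSSVER" and "CWEVER" are hypothetical and not defined by CWSS or CWE:

```python
# Hypothetical sketch: tagging a CWSS vector string with the CWSS and CWE
# versions used at scoring time, so that later consumers can detect when two
# scores were computed against different versions. The "CWSSVER" and "CWEVER"
# field names are illustrative assumptions, not part of any specification.
def tag_vector_with_versions(vector: str, cwss_version: str, cwe_version: str) -> str:
    return f"{vector}/CWSSVER:{cwss_version}/CWEVER:{cwe_version}"

tagged = tag_vector_with_versions("TI:H,0.9/AP:A,1.0", "0.8", "1.13")
# -> "TI:H,0.9/AP:A,1.0/CWSSVER:0.8/CWEVER:1.13"
```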

Finally, these changes should not occur too frequently, since each change could cause CWSS scores to change unpredictably, causing confusion and possibly interfering with strategic efforts to fix weaknesses whose importance has suddenly been reduced. CVSS encountered these problems when changing from version 1 to version 2, and re-scoring the affected vulnerabilities, which numbered in the tens of thousands, carried significant labor costs. As a result, the CVSS SIG has been reluctant to make any substantive changes beyond version 2. Although this may be inevitable for CWSS as a natural result of growth, the community should attempt to prevent this from happening where possible.

While scores may change as CWSS and CWE evolve, there is not necessarily a requirement for an organization to re-score whenever a new version is released, especially if the organization is using CWSS for internal purposes only. The variability of scores is largely a problem for sharing scores between organizations. For example, a software developer may have its own internally-defined vignettes and BVC, so the developer may not have a need (or willingness) to share CWSS scores outside the organization.

Future Activities

The majority of the development and refinement of the first major version of CWSS will occur during 2011. Current plans include:

  • Continue to obtain stakeholder validation or feedback for the existing factors, values, and weights.
  • Continue to modify the scoring method and formula so that there is less bias towards low scores, and incomplete or non-applicable data does not adversely affect the scores.
  • Continue to collect and evaluate similar metrics from additional sources. MITRE is consulting with major software developers, vendors of code analysis tools, and software security consultants to capture other scoring approaches that are currently being used.
  • Refine and evaluate aggregated scoring techniques.
  • Define a data exchange representation for CWSS scores and vectors, e.g. XML/XSD.

Community Participation in CWSS

Currently, members of the software assurance community can participate in the development of CWSS in the following ways:

  • Provide feedback on this document.
  • Review the factors that are currently defined; suggest modifications to the current factors, and any additional factors that would be useful.
  • Evaluate the scoring formula and the relative importance of factors within that formula.
  • Define specific use cases for CWSS.

Appendix A: CVSS

The Common Vulnerability Scoring System (CVSS) is commonly used when ranking vulnerabilities as they appear in deployed software. CVSS provides a common framework for consistently scoring vulnerabilities.

Conceptually, CVSS and CWSS are very similar. However, CVSS has both important strengths and notable limitations.

One of CVSS' strengths lies in its simplicity. CVSS divides the overall score into 14 separate characteristics within three metric groups: Base, Temporal, and Environmental. Each characteristic is decomposed into two or more distinct values. For example, the Access Vector reflects the location from which an attacker must exploit a vulnerability, with possible values of Local (authenticated to the local system), Adjacent Network (on the same physical or logical network), or Network (remotely exploitable across the network). Typically, in addition to the CVSS score, a vector is provided that identifies the selected values for each characteristic.
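For illustration, a CVSS v2 vector is a slash-separated list of metric:value pairs (e.g., "AV:N/AC:L/Au:N/C:P/I:P/A:P"). A minimal parser, offered here only as a sketch, might look like:

```python
# Parse a CVSS v2 vector such as "AV:N/AC:L/Au:N/C:P/I:P/A:P"
# into a dictionary mapping each metric abbreviation to its selected value.
def parse_cvss_vector(vector: str) -> dict:
    metrics = {}
    for pair in vector.split("/"):
        metric, _, value = pair.partition(":")
        metrics[metric] = value
    return metrics

base = parse_cvss_vector("AV:N/AC:L/Au:N/C:P/I:P/A:P")
# base["AV"] == "N" indicates the vulnerability is exploitable from the network
```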

With the associated documentation, CVSS scoring is fairly repeatable, i.e., different analysts will typically generate the same score for a vulnerability. However, different scores can be generated when information is incomplete, and significant variation is possible if an analyst does not closely follow documentation. While the simplified Confidentiality/Integrity/Availability model does not provide the depth and flexibility desired by some security experts, CVSS does provide the consistency that is useful for non-expert system and network administrators for prioritizing vulnerabilities.

CVSS has been widely adopted, especially the use of base scores from the Base metric group. Some organizations use the Temporal and Environmental portions of CVSS, but this is relatively rare, so these metric groups may not have been sufficiently vetted in the real world.

CVSS in a Software Assurance Context

CVSS has some important limitations that make it difficult to adapt directly to software security assessment in a weakness-scoring context.

  • CVSS assumes that a vulnerability has already been found and detected. This is not scalable to the security assessment of a single software package. A detailed assessment, such as an automated code scan, may report thousands of weakness findings. Because of the high volume, these findings often need to be scored and prioritized before they can be more closely examined to determine if they lead to vulnerabilities.
  • CVSS does not fully account for incomplete information, which is sometimes a problem because many vulnerability reports do not contain all the relevant details needed for scoring. A conservative approach, as adopted by the National Vulnerability Database, is to select values that will generate the largest CVSS score. This approach is viable, as long as most vulnerability reports contain sufficient information. Within a weakness-scoring context, a significantly high percentage of weakness findings will be missing a critical piece of information, since some detection techniques will not be able to reliably determine if a weakness can be exploited by an attacker without further analysis. As a result, there is a need to explicitly record when information is unavailable.
  • CVSS scoring is performed relative to the issue's impact on the physical system. However, in some contexts, users may strongly prefer to score issues based on their impact to business-critical data or functionality, which might have limited implications for the impact to the overall physical system. For example, the maximum possible score for CVSS is often 7.0 for Oracle products, since these products typically run with limited privileges. To account for this limitation, Oracle has used the non-standard "Partial+" rating for Confidentiality, Integrity, and Availability as an unofficial short-hand to emphasize to their customers when a vulnerability can be used to completely compromise a database, even though the impact to the underlying physical system may be minimal due to limited OS privileges of the affected database process. CWSS will be used by a variety of consumers, and in some cases there will be users who want to prioritize weaknesses according to different criteria than the impact on the physical system.

The development of CWSS will seek to preserve the strengths of CVSS while also attempting to avoid some of the associated limitations.

Adaptations of CVSS

Several organizations have attempted to use or modify CVSS as a method for measuring levels of software security.

Cigital performed a feasibility study of CVSSv2. More information is provided in Appendix B.

Veracode uses an adaptation of CVSS to evaluate detected weaknesses/vulnerabilities. Each issue is given weights for Confidentiality, Integrity, and Availability, based on its associated CWE entry. The weighting reflects the severity that is likely to occur on average; for example, a buffer overflow could allow an attacker to cause a crash, but it is not always exploitable for code execution.

For aggregated scores, Veracode has several "VERAFIED Security Marks" that are based on a calculated Security Quality Score (SQS), which ranges from 0 to 100. The "VerAfied" security mark is used to indicate software that Veracode has assessed to be free of "very high," "high," or "medium" severity vulnerabilities, and free of automatically-detectable vulnerabilities from the CWE/SANS Top 25 or OWASP Top Ten. Two "High Assurance" variations of the mark include a manual assessment step that covers the remainder of the CWE/SANS Top 25 or OWASP Top Ten that could not be identified by automatic detection.

The Veracode Rating System uses a three-letter rating system (with grades of "F", "D", "C", "B", and "A"). The first letter is used for the results from binary analysis, the second for automated dynamic analysis, and the third for human testing.

Comparison of CWSS Factors with CVSS

Note that in CVSS, the Access Complexity (AC) value combines multiple characteristics that are split into distinct factors within CWSS, such as Required Privilege Level and Level of Interaction.

CVSS | CWSS | Notes
Confidentiality Impact (C), Integrity Impact (I), Availability Impact (A), Security Requirements (CR, IR, AR), Collateral Damage Potential (CDP) | Technical Impact | CWSS attempts to use a more fine-grained "Technical Impact" model than confidentiality, integrity, and availability. Business Value Context adjustments effectively encode the security requirements from the Environmental portion of CVSS. The CDP is indirectly covered within the BVC's linkage between business concerns and technical impacts.
Access Complexity (AC), Target Distribution (TD) | Deployment Scope | Deployment Scope is indirectly covered by CVSS' Access Complexity, which combines multiple distinct factors into a single item. It also has an indirect association with Target Distribution (TD).
Access Vector (AV) | Access Vector | The values are similar, but CWSS distinguishes between physical access and local (shell/account) access.
Access Complexity (AC) | Required Privilege Level | Required Privilege Level is indirectly covered by CVSS' Access Complexity, which combines multiple distinct factors into a single item.
N/A | Authentication Strength | This is not directly specified within CVSS, but scorers might consider the authentication strength when evaluating Access Complexity (AC).
Authentication (Au) | Authentication Instances |
N/A | Likelihood of Discovery | Within many CVSS use-cases, the vulnerability has already been discovered and disclosed by another party when CVSS scoring takes place. So there is no need to track the likelihood of discovery, as the likelihood is (effectively) 1.0. However, within some CWSS use-cases, the issue is only known to the developer at the time of scoring, and the developer may choose to increase the priority of issues that are most likely to be discovered.
N/A | Likelihood of Exploit | This is not covered in CVSS.
Access Complexity (AC) | Interaction Requirements |
Access Complexity (AC), Remediation Level (RL) | Internal Control Effectiveness (IC) | The presence (or absence) of controls/mitigations may affect the CVSS Access Complexity.
Access Complexity (AC) | External Control Effectiveness (EC) | The presence (or absence) of controls/mitigations may affect the CVSS Access Complexity. However, a single CVE vulnerability could have different CVSS scores based on vendor-specific configurations.
Report Confidence (RC) | Finding Confidence |
N/A | Remediation Effort (RE) |
Exploitability (E) | N/A |
Target Distribution (TD) | N/A | There is no direct connection in CWSS 0.3 for target distribution; there is no consideration of how many installations may be using the software. This may be added to future versions of CWSS.

Other Differences between CVSS and CWSS

Some reviewers of early CWSS versions suggested that CWSS adopt the same set of metric groups that are used by CVSS - Base, Temporal, and Environmental. However, since CWSS scores can be calculated in early, low-information scenarios, many factors are "temporal" in nature, and likely to change as further analysis yields more information about the weakness. CWSS supports the use of values such as "Unknown" or "Default", which can be filled in at a later time.

One aspect of CVSS that is not explicitly modeled in CWSS is the notion of "partial" impacts. However, the acquired privileges, privilege layer, technical impact, and business impact are roughly equivalent, with more expressive power.

Appendix B: Other Scoring Methods

2008 CWSS Kickoff Meeting

In October 2008, a single-day kickoff meeting for CWSS was held. Several participants described their scoring approaches.

Veracode reported their assignment of Confidentiality, Integrity, and Availability scores for CVSS-based assessment of CWE weaknesses. More details are provided in a later subsection.

Cigital described a feasibility study of CVSSv2 with respect to weaknesses. Some attributes such as "Target Distribution" did not fit well. Other attributes were extended to add more granularity. A polynomial scoring method was recommended. It was also regarded as important to model the distinction between the likelihood and the impact.

Cenzic provided details of the Hailstorm Application Risk Metric (HARM), a quantitative score computed by black-box analysis of web applications. The goal of the metric was to provide a scalable approach to focus remediation efforts. The metric was split into 4 impact areas relevant to web application security: the browser, the session, the web application, and the server. The benefit of this approach was that it was easily consumable.

CERT/SEI presented its approach to scoring the C Secure Coding Rules. The FMECA metric, an ISO standard, was used. It characterizes items in terms of Severity, Likelihood (of leading to a vulnerability), and Remediation Cost.

2010 SANS/CWE Top 25

The 2010 CWE/SANS Top 25 Most Dangerous Software Errors list attempted to perform quantitative prioritization of CWE entries using a combination of Prevalence and Importance, which became the basis of CWSS 0.1 later in the year. A survey approach was taken in which respondents performed their own individual evaluation of Prevalence and Importance for 41 candidate weaknesses, from which the final scores were determined. To reflect the diverse opinions and use cases of the respondents for the general Top 25 list, the Importance factor was used instead of Impact. In an attempt to force consensus, respondents were restricted to 4 selections of the highest value for Importance ("Critical") and Prevalence ("Widespread"), although this forced choice was not popular; it will probably be abandoned in future versions of the Top 25. Many respondents used high-level rollup data, or a rough consensus of opinion within the organization, sometimes covering multiple teams or functions. Very few respondents had real-world data at the low level of granularity used by the Top 25 (typically the "Base" level of abstraction for CWE). An evaluation by PlexLogic later found that the two variables were not entirely independent. This discovery makes some sense, because the vulnerability research community tends to focus on vulnerabilities/weaknesses with the highest impact. When reliable attack techniques are devised for a particular weakness/vulnerability, it becomes easier for more researchers to find them, which can lead to widespread exploitation. Consequently, this raises the relative Importance of a weakness.

The 2010 Top 25 was structured in a way to support multiple points of view that could reflect different prioritizations of the weaknesses. The creation of separate focus profiles stemmed from some critiques of the original 2009 Top 25, in which a generalized Top 25 list would not necessarily be useful to all audiences, and that a customized prioritization would be ideal. Eight focus profiles were provided with the 2010 Top 25. For example, the Educational Emphasis focus profile evaluated weaknesses that are regarded as important from an educational perspective within a school or university context. It emphasized the CWE entries that graduating students should know, including weaknesses that were historically important or increased the breadth of coverage. A separate focus profile ranked weaknesses based solely on their evaluated Importance, which would be useful to software customers who want the most serious issues removed, without consideration for how frequently they occur or how resource-intensive it is to fix. These ranking-oriented focus profiles made the Top 25 more useful to certain audiences, and their construction and management have served as a useful predecessor to CWSS and vignettes.

While the 2009 Top 25 did not rank items, several factors were presented that were thought to be relevant to an audience: attack frequency, impact or consequences, prevalence, and ease of detection. Other considerations included remediation cost, amount of public knowledge, and the likelihood that the weakness discovery would increase in the future.

2010 OWASP Top Ten

In contrast to previous versions, the 2010 OWASP Top Ten shifted focus from weaknesses/vulnerabilities to risks, which typically caused each OWASP Top Ten entry to cover multiple related weakness types that posed the same risk. Factors for prioritization included Ease of Exploit, Prevalence, Detectability, and Technical Impact. Input from contributors was solicited to determine the values for these factors, but the final decision for each factor was made by the Top Ten editorial staff based on trend information from several real-world data sources. A metric was developed that used these factors to prioritize the final Top Ten list.

Other Models

Microsoft's STRIDE model characterizes issues in terms of Spoofing Identity, Tampering with Data, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. The DREAD scheme evaluates issues based on Damage Potential, Reproducibility of the issue, Exploitability, Affected Users, and Discoverability. Many of these attributes have equivalent factors in CWSS.

Appendix C: Generalized Scoring Approaches

While CWSS 0.3 is focused on targeted scoring, it could be further adapted for scoring weaknesses in a general fashion, e.g. to develop a relative prioritization of issues such as buffer overflows, XSS, and SQL injection, independent of any specific software package.

A generalized scoring approach could account for:

  • Prevalence: how often does this appear at least once within a software package?
  • Frequency: in a software package in which this weakness occurs, how often does it occur? (perhaps summarized as "diffusion")
  • Likelihood of Discovery
  • Likelihood of Exploit
  • Technical Impact

In the earlier CWSS 0.1, the formula was:

Prevalence x Importance

This formula was a characterization of the metric used for the 2010 CWE/SANS Top 25. Importance was derived from the vignette-specific subscores for Technical Impacts of the CWE entry. Prevalence could be obtained from general information (derived from CWE content, or from other sources), with the possibility of vignette-specific specifications of prevalence. For example, XSS or SQL injection might occur more frequently in a web-based retail context than in embedded software.
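As a sketch, the CWSS 0.1 formula can be expressed directly. The 1-10 scales assumed here for both subscores are illustrative assumptions, not mandated by CWSS 0.1:

```python
# Sketch of the CWSS 0.1 generalized formula: score = Prevalence x Importance.
# Both subscores are assumed to lie on a 1-10 scale (an illustrative
# assumption), so the resulting product lies in [1, 100].
def generalized_score(prevalence: float, importance: float) -> float:
    if not (1.0 <= prevalence <= 10.0 and 1.0 <= importance <= 10.0):
        raise ValueError("subscores are assumed to lie on a 1-10 scale")
    return prevalence * importance

# e.g. a highly prevalent weakness with a high vignette-specific importance:
score = generalized_score(9.46, 8.0)  # -> 75.68
```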

Prevalence Assessment

In the earlier CWSS version 0.1, prevalence scores for the 2010 Top 25 were obtained by re-using raw voting data from the 2010 Top 25 participants. The original 1-4 scale (with discrete values) was extended to use values between 1 and 10. When using real-world prevalence data, this artificial normalization might not be necessary.
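The exact mapping from the discrete 1-4 votes to the 1-10 scale is not specified here; a simple linear rescaling (1→1, 2→4, 3→7, 4→10) is one plausible reading, sketched below:

```python
# Linearly map a discrete 1-4 vote onto a 1-10 scale.
# The mapping 1->1, 2->4, 3->7, 4->10 is an assumption for illustration;
# the actual Top 25 normalization is not described in detail here.
def rescale_vote(vote: int) -> float:
    if vote not in (1, 2, 3, 4):
        raise ValueError("vote must be an integer from 1 to 4")
    return 1.0 + (vote - 1) * 3.0

def mean_prevalence(votes: list) -> float:
    # Average the rescaled votes from all respondents.
    return sum(rescale_vote(v) for v in votes) / len(votes)
```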

The following table summarizes the prevalence scores for some of the Top 25 entries. Notice the high prevalence value for XSS; this reflects the fact that nearly all of the voting members scored XSS as "Widespread." Complete details are available on a separate page.

Top 25 Rank | CWE | Name | Prevalence (1-10)
[1] | CWE-79 | XSS | 9.46
[2] | CWE-89 | SQL Injection | 7.43
[3] | CWE-120 | Classic Buffer Overflow | 6.04
[4] | CWE-352 | Cross-site Request Forgery | 7.75
[16] | CWE-209 | Information Exposure Through an Error Message | 7.11

Appendix D: Aggregated Scoring Methods: Measuring Weakness Surface

For years, software consumers have wanted clear guidance on how secure a software package is, but the only available methods have been proprietary, crude, or indirect, such as:

  • Crude methods, such as counting the number and/or severity of publicly reported vulnerabilities
  • Proprietary methods developed by consultants or tool developers
  • Indirect methods, such as the attack surface metric, which likely has a strong association with overall software security, although this has not necessarily been empirically proven.

A software package could be evaluated in light of the number and importance of weaknesses that have been detected, whether from automated or manual techniques. The results from these individual weaknesses could be aggregated to develop a single score, tentatively called the "Weakness Surface." This could move the software assurance community one step closer to consumer-friendly software security indicators such as the Software Facts Label concept, as proposed by Jeff Williams (OWASP) and Paul Black (NIST).

When there is a set of targeted weaknesses for a single software package, there are several possible aggregated scoring methods, including but not necessarily limited to:

  • Compute the sum of all individual weakness scores
  • Choose the highest of all individual weakness scores
  • Select a subset of individual weakness scores exceeding a stated minimum, and add the scores together
  • Compute the sum of all individual weakness scores, then normalize these scores according to KLOC or other metrics that reflect code size, i.e., "defect density."
  • Normalize the results on a per-executable basis.
  • Normalize the results to a point scale between 0 (no assurance) and 100 (high assurance).

Some methods from the 2008 CWSS kickoff workshop may be adaptable or applicable; see Appendix B. In addition, some SCAP users have begun creating aggregated metrics for a host system by aggregating CVSS scores for the individual CVE vulnerabilities that are detected on the host. These users may have some useful guidance for the CWSS approach to aggregate scoring.

Change Log
Date | Document Version | Notes
June 27, 2011 0.8

Bumped up version number to synchronize with CWRAF.

June 23, 2011 0.6

Major changes to the formula to better reflect relative priorities of the factors.

Modified Access Vector (AV) to include Internet, Intranet, and Private Network values.

Renamed Remediation Cost (RC) to Remediation Effort (RE), and changed the available values.

Changed External Control Effectiveness (EC) weight for "Complete" to 0.1, to reflect the possibility of accidental removal of the control if the environment changes.

Modified values for Authentication Strength (AS) and added notes for potential enhancements.

Modified weights for Prevalence (P) so the range of variation is more narrow.

All weights for Unknown values were changed to 0.5 so that lack of information does not over-emphasize scores; additional information can move scores up or down, accordingly.

Changed Defense-in-Depth/"D" value to Indirect/"I" for internal and external control effectiveness to avoid conflict with the Default/"D" value.

Removed most references to vignettes, technical impact scorecards, business value context, etc. - now covered in CWRAF.

Skipped version 0.5 to reflect maturity and for alignment with other efforts.

April 26, 2011 0.4

Removed content that became part of Common Weakness Risk Analysis Framework (CWRAF).

Reorganized metric groups.

Defined new factors - Business Impact, Acquired Privilege, Acquired Privilege Layer.

Defined a new formula.

Added "Default" values to each factor.

March 7, 2011 0.3 Created overview images and shortened the introduction. Defined technology groups, added more business domains, added more vignettes. Annotated each factor that could be quantified. Updated stakeholders section. Integrated CVSS descriptions into a single section. Added more details on the scoring method, including CWSS vectors.
February 11, 2011 0.2 Added business domains, archetypes, and Business Value Context; identified detailed factors; emphasized use of CWSS for targeted scoring; reorganized sections; made other modifications based on community feedback.
December 2, 2010 0.1 Initial version for review by limited audience
Page Last Updated: May 07, 2013