CWE-1434: Insecure Setting of Generative AI/ML Model Inference Parameters

Weakness ID: 1434
Vulnerability Mapping: ALLOWED (this CWE ID may be used to map to real-world vulnerabilities)
Abstraction: Base (a weakness that is still mostly independent of a resource or technology, but with sufficient details to provide specific methods for detection and prevention; Base-level weaknesses typically describe issues in terms of 2 or 3 of the following dimensions: behavior, property, technology, language, and resource)
+ Description
The product has a component that relies on a generative AI/ML model configured with inference parameters that produce an unacceptably high rate of erroneous or unexpected outputs.
+ Extended Description

Generative AI/ML models, such as those used for text generation, image synthesis, and other creative tasks, rely on inference parameters that control model behavior, such as temperature, Top P, and Top K. These parameters affect the model's internal decision-making processes, learning rate, and probability distributions. Incorrect settings can lead to unusual behavior such as text "hallucinations," unrealistic images, or failure to converge during training. The impact of such misconfigurations can compromise the integrity of the application. If the results are used in security-critical operations or decisions, then this could violate the intended security policy, i.e., introduce a vulnerability.
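To make concrete how these parameters reshape a model's output probabilities, the following sketch shows temperature and top-k applied to raw logits before a token is sampled. It is illustrative only; the function and variable names are not drawn from this entry, and real inference stacks implement this inside the serving framework.

(informative)
Example Language: Python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from raw logits, showing how temperature and
    top-k reshape the probability distribution before sampling."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Temperature rescales the logits: values above 1 flatten the
    # distribution (more surprising tokens become likely), while values
    # near 0 sharpen it toward deterministic argmax behavior.
    scaled = logits / max(temperature, 1e-6)

    # Top-k keeps only the k highest-scoring candidate tokens.
    if top_k is not None:
        k = min(top_k, len(scaled))
        cutoff = np.sort(scaled)[-k]
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)

    # Softmax over the rescaled, truncated logits, then sample.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

Top P (nucleus) sampling works similarly, except that it keeps the smallest set of tokens whose cumulative probability exceeds P.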

+ Common Consequences
Impact: Varies by Context; Unexpected State
Scope: Integrity, Other
The product can generate inaccurate, misleading, or nonsensical information.

Impact: Alter Execution Logic; Unexpected State; Varies by Context
Scope: Other
If outputs are used in critical decision-making processes, errors could be propagated to other systems or components.
+ Potential Mitigations
Phase(s): Implementation; System Configuration; Operation
Develop and adhere to robust parameter tuning processes that include extensive testing and validation (a rough sketch of such a process appears after this list).

Phase(s): Implementation; System Configuration; Operation
Implement feedback mechanisms to continuously assess and adjust model performance.

Phase(s): Documentation
Provide comprehensive documentation and guidelines for parameter settings to ensure consistent and accurate model behavior.
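As one illustration of the parameter-tuning mitigation above, configuration changes can be gated on a measured quality metric rather than intuition. The sketch below is an assumption-laden outline: generate(), the prompt set, and is_acceptable() are placeholders for whatever inference API and evaluation criteria the product actually uses.

(informative)
Example Language: Python
def evaluate_temperatures(generate, prompts, is_acceptable, temperatures):
    """Return the fraction of acceptable outputs for each candidate
    temperature so that a parameter change can be validated empirically."""
    results = {}
    for temp in temperatures:
        ok = sum(
            1 for prompt in prompts
            if is_acceptable(prompt, generate(prompt, temperature=temp))
        )
        results[temp] = ok / len(prompts)
    return results

def passes_quality_gate(results, threshold=0.95):
    """Reject any configuration whose acceptable-output rate falls below
    a project-chosen threshold."""
    return {temp: rate >= threshold for temp, rate in results.items()}

The same harness can be re-run whenever the model, its parameters, or its prompts change, which also supports the continuous-feedback mitigation.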
+ Relationships
+ Relevant to the view "Research Concepts" (View-1000)
Nature Type ID Name
ChildOf Base 440 Expected Behavior Violation
ChildOf Class 665 Improper Initialization
PeerOf Pillar 691 Insufficient Control Flow Management
CanPrecede Class 684 Incorrect Provision of Specified Functionality
+ Modes Of Introduction
Phase: Build and Compilation
During model training, hyperparameters may be set without adequate validation or understanding of their impact.

Phase: Installation
During deployment, model parameters may be adjusted to optimize performance without comprehensive testing.

Phase: Patching and Maintenance
Updates or modifications may be made to the model that alter its behavior without thorough re-evaluation.
+ Applicable Platforms
Languages

Class: Not Language-Specific (Undetermined Prevalence)

Architectures

Class: Not Architecture-Specific (Undetermined Prevalence)

Technologies

AI/ML (Undetermined Prevalence)

Class: Not Technology-Specific (Undetermined Prevalence)

+ Demonstrative Examples

Example 1


Assume the product offers an LLM-based AI coding assistant to help users write code as part of an Integrated Development Environment (IDE). Assume the model has been trained on real-world code and behaves normally under its default settings. Suppose the default temperature is 1, with a valid range from 0 (most deterministic) to 2.

Consider the following configuration.

(bad code)
Example Language: JSON 
{
"model": "my-coding-model",
"context_window": 8192,
"max_output_tokens": 4096,
"temperature", 1.5,
...
}

The problem is that the configuration contains a temperature hyperparameter that is higher than the default. This significantly increases the likelihood that the LLM will suggest a package that did not exist at training time, a behavior sometimes referred to as "package hallucination." Note that other possible behaviors could arise from higher temperature, not just package hallucination.

An adversary could anticipate which package names could be generated and create a malicious package. For example, it has been observed that the same LLM might hallucinate the same package regularly. Any code that is generated by the LLM, when run by the user, would download and execute the malicious package. This is similar to typosquatting.

The risk could be reduced by lowering the temperature, which reduces unpredictable outputs and keeps generation more in line with the training data. If the temperature is set too low, some of the power of the model is lost, and it may be less capable of producing solutions for rarely-encountered problems that are not reflected in the training data. However, if the temperature is not set low enough, the risk of hallucinated package names may still be too high. Unfortunately, the "best" temperature cannot be determined a priori, so sufficient empirical testing is needed.

(good code)
Example Language: JSON 
{
...
"temperature", 0.2,
...
}

In addition to more restrictive temperature settings, consider adding guardrails that independently verify any referenced package to ensure that it exists, is not obsolete, and comes from a trusted party (a rough sketch of such a check follows this example).

Note that reducing temperature does not entirely eliminate the risk of package hallucination. Even with very low temperatures or other settings, there is still a small chance that a non-existent package name will be generated.
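One way to implement the package guardrail suggested above, assuming the generated code targets the Python/PyPI ecosystem, is to reject or flag any suggested dependency that is unknown to the package index or absent from a project-approved allowlist. The sketch below queries the public PyPI JSON API; the function names and the approved_packages list are illustrative, and an existence check alone does not establish that a package is trustworthy.

(informative)
Example Language: Python
import urllib.error
import urllib.request

def package_exists_on_pypi(name):
    """Return True if the package index has metadata for this name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 404 here means the package name is not registered.
        return False

def flag_suspicious_packages(suggested, approved_packages):
    """Flag suggestions that are not on the approved list or that do not
    exist on the index, so a human can review them before any install."""
    return [
        name for name in suggested
        if name not in approved_packages or not package_exists_on_pypi(name)
    ]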



+ Weakness Ordinalities
Ordinality: Primary (where the weakness exists independent of other weaknesses)
+ Detection Methods
Method: Automated Dynamic Analysis
Manipulate inference parameters and perform comparative evaluation to assess the impact of selected values. Build a suite of systems using targeted tools that detect problems such as prompt injection (CWE-1427) and other problems. Consider statistically measuring token distribution to see if it is consistent with expected results (a rough sketch appears after these methods).
Effectiveness: Moderate
Note: Given the large variety of outcomes, it can be difficult to design testing to be comprehensive enough, and there is still a risk of unpredictable behavior.

Method: Manual Dynamic Analysis
Manipulate inference parameters and perform comparative evaluation to assess the impact of selected values. Build a suite of systems using targeted tools that detect problems such as prompt injection (CWE-1427) and other problems. Consider statistically measuring token distribution to see if it is consistent with expected results.
Effectiveness: Moderate
Note: Given the large variety of outcomes, it can be difficult to design testing to be comprehensive enough, and there is still a risk of unpredictable behavior.
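One way to realize the "statistically measure token distribution" idea is to compare the token-frequency distribution produced by a candidate parameter configuration against a previously validated reference run. The sketch below uses KL divergence for that comparison; tokenize() and the review threshold are placeholders, not part of this entry.

(informative)
Example Language: Python
import math
from collections import Counter

def token_distribution(texts, tokenize):
    """Empirical token-frequency distribution over a set of model outputs."""
    counts = Counter(token for text in texts for token in tokenize(text))
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}

def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) over the union of observed tokens; epsilon stands in
    for tokens missing from one of the distributions."""
    tokens = set(p) | set(q)
    return sum(
        p.get(t, epsilon) * math.log(p.get(t, epsilon) / q.get(t, epsilon))
        for t in tokens
    )

A large divergence between the candidate configuration's outputs and the reference outputs signals that the new inference parameters changed behavior enough to warrant manual review before deployment.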
+ Memberships
Nature Type ID Name
MemberOf Category 1412 Comprehensive Categorization: Poor Coding Practices
+ Vulnerability Mapping Notes
Usage ALLOWED
(this CWE ID may be used to map to real-world vulnerabilities)
Reason Acceptable-Use

Rationale

This CWE entry is at the Base level of abstraction, which is a preferred level of abstraction for mapping to the root causes of vulnerabilities.

Comments

Carefully read both the name and description to ensure that this mapping is an appropriate fit. Do not try to 'force' a mapping to a lower-level Base/Variant simply to comply with this preferred level of abstraction.
+ Notes

Research Gap

This weakness might be under-reported as of CWE 4.18, since there are no clear observed examples in CVE. However, inference parameters may be the root cause of - or important contributing factors to - various vulnerabilities, while the vulnerability reports concentrate more on the negative impact (e.g., code execution) or on the weaknesses that the insecure settings contribute to. Alternatively, dynamic techniques might not reveal the root cause if the researcher does not have access to the underlying source code and environment.
+ References
[REF-1487] Joseph Spracklen, Raveen Wijewickrama, A H M Nazmus Sakib, Anindya Maiti, Bimal Viswanath and Murtuza Jadliwala. "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs". 2025-03-02.
<https://arxiv.org/abs/2406.10279>. (URL validated: 2025-09-08)
+ Content History
+ Submissions
Submission Date Submitter Organization
2024-06-28
(CWE 4.18, 2025-09-09)
Lily Wong MITRE
+ Contributions
Contribution Date Contributor Organization
2025-02-28
(CWE 4.18, 2025-09-09)
AI WG "New Entry" subgroup
Participated in regular meetings from February to August 2025 to develop and refine most elements of this entry.
Page Last Updated: September 09, 2025