It’s terrifying to think that 2025 is the year of LLM agents, and yet here we are… LLMs are still ridiculously vulnerable to jailbreaking. Sure, DeepSeek made huge ripples in the AI community when it launched a couple of weeks ago, and I admit it’s incredibly powerful. But this X user still easily managed to generate a meth recipe with a simple prompt injection.
Now imagine plugging these very same LLMs into medical tools, legal systems, and financial services we use everyday.
Did I forget to mention that more than half — 53%, to be exact — of companies building AI agents right now aren’t even fine-tuning their models? Honestly, I can’t blame them — fine-tuning costs a fortune to do effectively. However, this means that any vulnerabilities in these LLMs will carry over to the agents you use. So, don’t be too surprised when your ‘drug research AI’ suddenly decides to moonlight as a meth cook.
Jokes aside, the safety and security of LLMs isn’t just a problem — it’s a growing crisis. To prevent the onset of an AI apocalypse, we’ll need robust standards to guide the development, testing, and deployment of these AI systems to ensure LLM safety and security. Efforts like the EU AI Act and NIST AI RMF have made strides in this space (I’ve covered these in more detail in my LLM safety article), but none as comprehensive and LLM-focused as the OWASP Top 10 LLM for 2025.
What is the OWASP Top 10 LLM in 2025?
The OWASP Top 10 for LLM Applications 2025 outlines the ten most critical risks and vulnerabilities — along with mitigation strategies — for creating secure LLM applications. It’s the result of a collaborative effort by developers, scientists, and security experts. These guidelines cover the entire lifecycle: development, deployment, and monitoring.
That’s right, it doesn’t just stop at testing your LLM applications in development. DeepSeek probably tested their model extensively, but that didn’t stop it from turning into a meth cook when it was released to the public… so monitoring is crucial, but we’ll get into this later.
Who is OWASP?
If you’re new to software security, you’re probably wondering: who is OWASP, and why should we trust them?
OWASP (Open Worldwide Application Security Project) is a non-profit organization recognized for producing open-source projects like the OWASP Top 10 lists for APIs, IoT, and now LLMs (Large Language Models). These lists are the result of extensive collaboration among security experts from around the world, and they ensure their findings and tools are freely accessible to everyone. So, trust me, you can trust them.
What’s new in the 2025?
The OWASP Top 10 LLM list was actually first released in 2023, but the rapid increase in LLM applications and adoption has unveiled new risks and vulnerabilities. This year’s most recent update is the most comprehensive and reflective of the current landscape yet.
Here’s what’s new:
- Excessive Autonomy: As 2025 emerges as the “year of LLM agents,” many applications are being granted unprecedented levels of autonomy. This shift has necessitated significant expansions on excessive agency risks in this year’s list.
- RAG Vulnerabilities: Additionally, with 53% of companies opting not to fine-tune their models and instead relying on
RAG
and Agentic pipelines, vulnerabilities related to vector and embedding weaknesses have earned a prominent spot on the Top 10. - System Prompt Risks: We’ve also seen system prompt leakage become an alarming issue, with many LLM developers treading the line between what to expose in the system prompt.
- Unbounded Consumption(?): Finally, the widespread enterprise adoption of LLMs and the resulting surge in resource management challenges has led to a surge in resource management challenges, aptly termed “unbounded consumption” by OWASP (a little bit confusing, don’t you think).
Congrats on making it this far, because this next section is going to be crucial. I’ll break down each of the 2025 OWASP Top 10 LLM risks with real-world examples of these attacks, and make sure you have a full grasp of what you’re dealing with (arguably, though, the following section is even more important, as I’ll explain how to systematically mitigate these risks).
Confident AI: The DeepEval LLM Evaluation Platform
The leading platform to evaluate and test LLM applications on the cloud, native to DeepEval.
Got Red? Red team LLM systems today with Confident AI
The leading platform to safety-test LLM applications on the cloud, native to DeepEval.
OWASP Top 10 LLM in 2025: The Full List
1. Prompt Injection (LLM01:2025)
If you’ve ever explored LLM security, you’ve likely come across prompt injection attacks. These attacks are the most common LLM attacks you’ll see, and target the input layer — what you feed into your LLM application. They encompass both human-understandable prompts (e.g., “My grandmother is dying…” requests) and nonsensical inputs (e.g., random tokens like “76dh3&^d”) that can cause the model to behave undesirably or even harmfully — intentional or not.
There are two types of prompt injections: direct and indirect. Direct prompt injections occur when the input directly causes the model to fail (failing means they act undesirably), while indirect injections happen when a model processes external sources like files or websites that lead to failure.
For Example:
- Direct Injection: An attacker tricks a chatbot into breaking its rules to access private information and send unauthorized emails.
- Indirect Injection: A user asks an AI to summarize a webpage, but hidden instructions in the page make the AI link to a malicious website and leak private conversations.
Prompt injection is a term that’s often used interchangeably with jailbreaking, as both exploit a model’s vulnerabilities on the input level to bypass safeguards. If you’re curious about the difference, I’ve written a comprehensive article covering every type of jailbreaking here.
There are a few things you can do to help prevent prompt injections. These mitigation strategies include:
- Constraining Model Behavior: Provide strict role instructions, enforce adherence to tasks, and ignore attempts to alter instructions.
- Validating Expected Output: Specify output requirements and validate formats using deterministic code checks. (You can do this using guardrails… more on this in the following section)
2. Sensitive Information Disclosure (LLM02:2025)
The risk of disclosing sensitive information is scary, which explains why OWASP moved this risk up four places from 6th in their 2024 list. For LLM applications and AI agents to provide better assistance, they need more access to YOUR data — health records, financial details, company secrets.
Whether through training datasets, RAG knowledge bases, database access, or simply users inputting information (you developers using ChatGPT on your codebases), there are multiple ways sensitive data can enter an AI system. Preventing sensitive data from entering is critical to avoid it leaking out.
Understandably, LLMs may need to access sensitive data in certain scenarios. However, in such cases, it’s absolutely essential to ensure no personal information is leaked. This can occur through various means, such as jailbreaking, cross-session leakage (where data leaks between different users), etc.
Here are some examples of how data leakage might occur:
- Targeted Prompt Injection: An attacker crafts inputs to bypass filters and extract sensitive information from the model.
- Data Leak via Training Data: Sensitive information is unintentionally included in the training data, leading to potential disclosure.
As you can imagine sensitive information disclosure can be astronomically catastrophic. Think losing billions of bitcoin dollars from an unintentional crypto-key leak. Fortunately, there are some mitigation techniques you can use:
- Integrate Data Sanitization Techniques: Mask sensitive content before training, ensuring personal data is excluded from the model.
- Robust Input Validation: Enforce strict input validation to detect harmful or sensitive data and prevent system compromise. (Again this deals with guardrails)
- Robust Output Validation: Enforce strict output validation to prevent data leakage (… also guardrails)
3. Supply Chain (LLM03:2025)
If you’re wondering, “What the heck is an LLM supply chain?” you’re not alone. OWASP describes it as all the external components used to build an LLM application — things like training datasets, LoRA adapters (used for fine-tuning), and pre-trained models.
These resources are fantastic for speeding up development and making it more accessible, but because they’re developed externally, they come with their own risks. For example, pre-trained models might include hidden biases, backdoors, or even malicious code that could compromise your application.
Here are some more supply chain attack examples:
- Vulnerable Python Library: An attacker exploits a compromised Python library, such as a PyTorch dependency with malware from the PyPi package registry.
- Direct Tampering: An attacker modifies and publishes a model with malicious parameters, bypassing safety checks
These vulnerabilities often remain undetected until they’re exploited, making them particularly dangerous. A seemingly reliable pre-trained model that’s been poisoned with a backdoor trigger can function as expected until a specific input, known by the attacker, activates the hidden vulnerability, and leads to model failure.
Whether that’s your LLM generating the same responses every single time or inappropriate content, it’s vital to minimize these risks, which means constantly checking and tracking your components and sources:
- Model Integrity: Use models from verified sources with integrity checks like signing and file hashes.
- Component Tracking: Maintain a signed Software Bill of Materials (SBOM) to track components and vulnerabilities
4. Data Poisoning (LLM04:2025)
Data poisoning occurs when attackers manipulate the data used during pre-training, fine-tuning, or embedding processes. This introduces vulnerabilities, biases, or backdoors, degrading model performance and generating harmful or biased outputs.
While this risk is more prominent during fine-tuning, it can also affect RAG systems if an attacker gains unauthorized access to the knowledge base and decides to poison the retrieval dataset.
There are several ways data poisoning can unfold:
- Biased Training Data: An attacker injects biased examples into training data, skewing outputs to propagate misinformation.
- Toxic Data Inclusion: Toxic or harmful data in fine-tuning leads to models generating biased or offensive responses.
Similar to mitigating supply chain risks, there’s no shortcut to preventing data poisoning — you must thoroughly vet all datasets and knowledge bases to ensure they are safe and clean. Trust me, you wouldn’t want your LLM to be making headlines for the wrong reasons (i.e. political opinions).
Here’s a few ways you can do that:
- Track Data Origins: Use tools like OWASP CycloneDX to verify data legitimacy and transformations throughout development.
- Vet Data Vendors: Rigorously validate data providers and check outputs against trusted sources to detect poisoning.
5. Improper Output Handling (LLM05:2025)
Improper Output Handling occurs when the outputs generated by an LLM aren’t properly validated or sanitized before being passed to other systems or components. Doesn’t sound too serious, right? Wrong.
Improper Output Handling can severely impact systems where LLM-generated outputs are used by downstream components, such as tools accessing APIs and databases. A single hallucination in a Text2SQL system, for example, can change DELETE FROM users WHERE id = 123
to DELETE FROM users
— and boom, suddenly, your entire database is wiped.
Here are some more attack examples:
- Privileged Function Misuse: An LLM passes an unvalidated output to an administrative extension, causing the extension to execute unintended maintenance commands.
- Sensitive Data Leakage: A summarizer tool powered by an LLM processes a malicious website with hidden prompts, causing it to send sensitive user data to an attacker-controlled server.
… And some ways you can enforce proper output handling.
- Context-Aware Encoding: Encode outputs for their specific use case (e.g., HTML encoding for web content, SQL escaping for database queries).
- Output Sanitization: Validate and sanitize LLM responses before passing them to backend functions or external systems.
Confident AI: The DeepEval LLM Evaluation Platform
The leading platform to evaluate and test LLM applications on the cloud, native to DeepEval.
Got Red? Red team LLM systems today with Confident AI
The leading platform to safety-test LLM applications on the cloud, native to DeepEval.
6. Excessive Agency (LLM06:2025)
Related to improper output handling is Excessive Agency, which is a major concern for agentic LLM applications that have tool access to influence the real world.
Excessive Agency breaks down into three areas: excessive functionality, excessive permissions, and excessive autonomy. As developers, it’s our responsibility to ensure we don’t over-equip these agents with tools, permissions, or autonomy beyond their intended use. Otherwise, the following can happen:
- Inbox Exploitation: An assistant with send-message permissions is tricked into forwarding sensitive emails to an attacker.
- Unnecessary Shell Access: A file-writing extension allows arbitrary commands, enabling unintended and harmful actions.
Unlike improper output handling, which focuses on validating and sanitizing LLM outputs before they’re used to call other tools, excessive agency addresses the direct use and management of tools by these agents.
To minimize excessive agency, you can:
- Limit Functionality and Permissions: Use narrowly scoped extensions and enforce minimal access.
- Require User Approval: Implement manual approval for high-impact actions like sending messages or executing commands.
7. System Prompt Leakage (LLM07:2025)
System prompt leakage occurs when sensitive information included in prompts is exposed, leading to risks like revealing internal rules, filtering criteria, or sensitive functionality.
Example Attacks include:
- Credential Exposure: An attacker extracts a system prompt containing credentials, gaining unauthorized access to external tools.
- Instruction Bypass: An attacker uses a prompt injection to override system prompt restrictions, enabling offensive content or remote code execution.
The best approach is to treat system prompts as simple instructions guiding the model’s outputs rather than a repository for sensitive data. However, when sensitive information must be included, safeguards like encryption and access controls are crucial.
Here are a few prompt leakage mitigation strategies:
- Separate Sensitive Data: Keep sensitive information like credentials and keys external to the system prompt.
- Implement Guardrails: Use independent systems to enforce security controls and validate model outputs against expectations.
8. Vector and Embedding Weaknesses (LLM08:2025)
Systems using RAG pipelines are particularly vulnerable to weaknesses in how vectors and embeddings are generated, stored, or retrieved. Attackers can exploit these weaknesses to inject harmful content, manipulate outputs, or access sensitive information.
For example:
- Unauthorized Access & Data Leakage: A misconfigured vector database allows unauthorized access to embeddings, exposing sensitive data like personal or proprietary information.
- Embedding Inversion Attacks: Attackers exploit vulnerabilities to reverse-engineer embeddings, recovering original data and compromising confidentiality.
Protecting vector stores requires robust security measures, including access controls, regular audits, and validation of embedding processes:
- Permission and Access Control: Enforce strict logical and access partitioning in vector databases, with fine-grained access controls for users.
- Data Validation & Source Authentication: Audit and validate all data sources regularly, and only accept data from trusted, verified providers.
9. Misinformation (LLM09:2025)
Misinformation remains a significant issue, even with the advancements in LLMs. Often, it manifests in the form of LLMs generating false but credible-sounding content, usually due to hallucinations — fabricated outputs that arise from gaps in the model’s training data. Biases and user over-reliance on the system further amplify these risks.
Misinformation can be highly damaging, leading not only to brand reputation issues but also to real-world harm. Example include:
- Malicious Package Exploitation: Attackers publish malicious packages using names commonly hallucinated by coding assistants, leading developers to unknowingly integrate harmful code.
- Inaccurate Medical Diagnosis: A medical chatbot provides incorrect information due to insufficient accuracy checks, causing harm to patients and exposing the company to lawsuit
To combat misinformation, LLMs need to be more reliable, and users should always verify outputs before integrating them into critical decisions. Here are a few mitigation strategies:
- RAG: Enhance reliability by retrieving verified data from trusted external sources during response generation.
- Cross-Verification and Oversight: Encourage users to validate outputs with external sources and implement human fact-checking for critical information.
10. Unbounded Consumption (LLM10:2025)
At the end of the list is unbounded consumption, which occurs when an LLM’s resource usage spirals out of control, leading to performance degradation, downtime, or unexpected costs.
While this issue is more about general system security than LLM-specific security, attacks that lead to unbounded consumption include:
- Uncontrolled Input Size: An attacker submits an excessively large input, consuming memory and CPU, potentially crashing the system.
- Repeated Requests: An attacker sends a high volume of API requests, depleting resources and denying service to legitimate users.
As LLMs become more widely used by the general public and in commercial environments, proper rate limiting, monitoring, and resource management are essential to ensure the system remains scalable and efficient under heavy usage. To combat unbounded consumption, the following strategies should be implemented:
- Rate Limiting and Throttling: Restrict request rates and apply timeouts for resource-intensive operations to prevent system overload.
- Resource Allocation Management: Dynamically monitor and manage resources to prevent excessive consumption by any single user or request.
Confident AI: The DeepEval LLM Evaluation Platform
The leading platform to evaluate and test LLM applications on the cloud, native to DeepEval.
Got Red? Red team LLM systems today with Confident AI
The leading platform to safety-test LLM applications on the cloud, native to DeepEval.
Risk Mitigation Strategies
Hopefully by now, you’re an expert in LLM safety and security. We’ve gone through all the OWASP Top 10 LLM Risks for 2025, detailing how a malicious attack could exploit an LLM risk, and mitigation strategies for each.
The question remains, however, even though we’ve implemented all these safety mechanisms, how do we know our LLM is safe and secure? The answer is that you’ll need a systematic testing framework. And remember, right at the very beginning, I told you that you need to test in both development and production.
Risk Mitigation in Development
To systematically test your LLM in development for vulnerabilities, you’ll need a comprehensive dataset for each risk. A large dataset is essential if you want to be confident in your LLM’s security. Not only must these datasets cover a wide range of attacks, but the attacks need to be effective — both in breadth and depth.
You’ll need to iterate on your LLM until it passes all the attacks in your evaluation dataset. The reason for this is simple: your dataset likely won’t cover every possible attack, and if your LLM doesn’t pass the ones in your test set, you have to ask: what other vulnerabilities might exist in the real world?
Fortunately for you, for the past year, I've attacking all sorts of LLM applications—agents, RAG systems, simple LLMs—you name it. We've developed a pipeline to help secure your LLM as quickly as possible and even built a platform to support it.
Confident AI lets you automatically generate attacks and enhance them using research-backed strategies, making them more effective. With just one click, you can generate as many attacks as you need. The platform also lets you track progress in one central location, helping you accelerate your testing process. If you want stronger security, simply generate more attacks and pass them in your evaluations.
Risk Mitigation in Production
The first part of risk mitigation in production is monitoring — constant monitoring. You'll need to track every single input that comes in from your users, and evaluate every single response your LLM generates (based on a comprehensive suite of safety metrics you care about). You should be monitoring everything — from how each component functions based on different inputs, to having full visibility across your entire LLM application architecture, in real time. If you're curious, I’ve written a whole separate article on monitoring and observability.
But monitoring alone if not enough. Sure, you'll know what's going on, but you're not actually doing anything to protect your LLM application. That's why the second part of risk mitigation in production is even more important: you'll need guardrails. Guardrails are simple binary metrics you place at the input-level before your user’s' inputs reach your LLM application, and at the output-level before the LLM’s responses reach your users. They ensure that harmful attacks don't reach your application, and that no harmful responses go out to your users. This is vital because, even though your LLM might occasionally fail (no system is 100% perfect), adding these guardrails significantly lowers the risk of failure.
Confident AI also offers a comprehensive monitoring/observability platform and integrated guardrails API (which by the way, is blazing fast 🚀), allowing you to track everything in one place. We also automatically flag any failing responses in production— whether that's security breach or unacceptable responses— so that you can continue testing these failing responses in development and make the necessary improvements.
Conclusion
LLM applications are becoming more and more powerful. While this is exciting, it also comes with risks. We’re huge advocates of LLMs, but we also believe they must act responsibly. The OWASP Top 10 LLM Risks for 2025 is a great starting point to understand the potential risks of LLMs and how to mitigate them.
That said, it’s just a guideline. If you want to truly feel secure, you’ll need a way to systematically test your LLM application both in development and in production. Luckily, you’ve got us. We’ve spent the past year building, testing, and iterating to ensure it’s easy to harden your LLMs and feel confident about their security.
If you have any questions concerning setting-up your red-teaming pipeline, give us a call. We’re here to help!
Do you want to brainstorm how to evaluate your LLM (application)? Ask us anything in our discord. I might give you an “aha!” moment, who knows?
Confident AI: The DeepEval LLM Evaluation Platform
The leading platform to evaluate and test LLM applications on the cloud, native to DeepEval.
Got Red? Red team LLM systems today with Confident AI
The leading platform to safety-test LLM applications on the cloud, native to DeepEval.