The native DeepEval platform

Made by the creators of DeepEval, Confident AI is designed to scale your DeepEval AI testing workflows organization-wide with observability and collaboration.

TRUSTED BY 500+ LEADING AI COMPANIES
Panasonic · Toshiba · Samsung · Phreesia · BCG · Epic Games · Humach · Finom · Amdocs · ByteDance
COLLABORATION

Bye bye CSVs. Hello collaboration.

Shared evaluation dashboards

Every DeepEval test run is automatically synced to a shared dashboard. No more exporting CSVs or pasting results in Slack — your whole team sees the same metrics in real time.

Comment & annotate results

Leave comments on individual test cases and evaluation runs. Tag teammates, flag regressions, and resolve issues without switching tools.

Version datasets

Every dataset change is tracked with full version history. Roll back bad edits, compare test case coverage across versions, and know exactly what changed between evaluation runs.

Align metrics with humans

Compare metric scores against human annotations to surface false positives and negatives. Know exactly where your evals agree with your team — and where they don't.

Regression testing

Catch quality drops before they ship. Automatically compare new runs against your last known-good baseline and surface the exact test cases that regressed.

ENTERPRISE

Built for teams that can't afford to get it wrong.

HIPAA, SOC 2 COMPLIANT
Our compliance standards meet the requirements of even the most regulated healthcare, insurance, and financial industries.
MULTI-REGION DATA RESIDENCY
Store and process data in the United States of America (North Carolina) or the European Union (Frankfurt).
RBAC AND DATA MASKING
Our flexible infrastructure supports data separation between projects, custom permission controls, and data masking for LLM traces.
99.9% UPTIME SLA
We offer enterprise-level guarantees for our services to ensure mission-critical workflows are always accessible.
ON-PREM HOSTING
Optionally deploy Confident AI in your own cloud, whether AWS, Azure, or GCP, with tailored hands-on support.
INTEGRATION

Stay In Your Stack.
We'll Meet You There.

SDKs in Python and TypeScript; 20+ integrations, including OpenAI, LangGraph, OpenTelemetry, and many more LLM gateways.

pip install deepeval
Integrations include: OpenAI Agents, LlamaIndex, LangGraph, Pydantic AI, Crew AI, OpenTelemetry, OpenAI, LangChain, Vercel AI SDK, Agent Core, LiteLLM, and Portkey.
COMMUNITY

The Future of Quality AI Depends on You.

Join the largest and fastest-growing community for AI evaluation.

FAQ

Have a Question?

Check out our FAQs below, or talk to a human. They won't hallucinate.

DeepEval is an open-source evaluation framework that lets you write and run LLM evaluation tests locally in Python. Confident AI is the cloud platform built on top of DeepEval that adds centralized test management, observability, collaboration, and analytics so teams can scale their evaluation workflows organization-wide.
Yes. The team behind Confident AI created and maintains DeepEval. DeepEval was open-sourced to give the community a best-in-class LLM evaluation framework, while Confident AI extends it with the enterprise features teams need to operationalize evaluations at scale.
No. Confident AI is a standalone platform whose APIs are integrated into DeepEval. However, Confident AI is also a full LLM observability platform, so you can use it to trace, monitor, and evaluate your LLM applications in one place — no more siloing evals and tracing across different tools.
Almost certainly. Observability is trivial; evals are the real challenge. Confident AI's observability is one of the best solutions for quality-driven monitoring of AI apps, and one of the cheapest on the market. See pricing.
Yes. DeepEval is fully open-source under the Apache 2.0 license and free to use for any purpose. Confident AI offers a free tier as well, along with paid plans for teams that need advanced features like role-based access, custom dashboards, and dedicated support.
DeepEval ships with 50+ research-backed metrics including faithfulness, answer relevancy, contextual recall, contextual precision, hallucination, bias, toxicity, and more. You can also define fully custom metrics using Python or LLM-as-a-judge approaches.
Yes. While Confident AI has first-class support for DeepEval, it also integrates with other popular tools and frameworks through its REST API and SDKs, so you can centralize results regardless of how you run your evaluations.
Install DeepEval with pip install deepeval, write your first evaluation test, and optionally connect to Confident AI by running deepeval login. You can also sign up for Confident AI directly and start using the platform without DeepEval.
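To make the idea of an evaluation test concrete, here is a minimal, dependency-free sketch of what such a test does under the hood: score a test case with a metric, then split cases by a pass threshold. The `TestCase`, `exact_match_metric`, and `run_eval` names below are illustrative stand-ins, not the DeepEval API; in DeepEval you would use `LLMTestCase` and its research-backed metrics instead.

```python
from dataclasses import dataclass

# Illustrative stand-in for DeepEval's LLMTestCase (not the real API).
@dataclass
class TestCase:
    input: str
    actual_output: str
    expected_output: str

def exact_match_metric(case: TestCase) -> float:
    """Toy metric: 1.0 if the output matches the expectation exactly, else 0.0."""
    return 1.0 if case.actual_output.strip() == case.expected_output.strip() else 0.0

def run_eval(cases, metric, threshold=0.5):
    """Miniature test run: bucket each case as passed or failed by metric score."""
    passed, failed = [], []
    for case in cases:
        (passed if metric(case) >= threshold else failed).append(case)
    return passed, failed

cases = [
    TestCase("What is 2+2?", "4", "4"),
    TestCase("Capital of France?", "Lyon", "Paris"),
]
passed, failed = run_eval(cases, exact_match_metric)
```

Real evaluation frameworks replace the exact-match toy with semantic and LLM-as-a-judge metrics, but the shape of the loop is the same: test cases in, per-case scores and pass/fail verdicts out.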