API Documentation
Everything you need to integrate EvalKit into your application.
Quickstart
Get up and running with EvalKit in three steps.
Step 1: Create an account
Sign up and get your API key from the dashboard. Your API key will look like ek_live_abc123...
Step 2: Make your first eval
Send a POST request to the eval endpoint with your LLM output and the criteria you want to evaluate against.
curl -X POST https://evalkit.dev/api/v1/eval \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"output": "The mitochondria is the powerhouse of the cell.",
"input": "What is the mitochondria?",
"criteria": ["accuracy", "relevance", "completeness"]
}'
Step 3: Check your results
The API returns a structured response with an overall score, per-criteria breakdowns, any detected issues, and actionable suggestions.
{
"id": "eval_abc123",
"overall_score": 0.95,
"criteria": {
"accuracy": { "score": 0.97, "reasoning": "Factually correct statement" },
"relevance": { "score": 0.94, "reasoning": "Directly answers the question" },
"completeness": { "score": 0.93, "reasoning": "Covers the main function" }
},
"issues": [],
"suggestions": [
"Consider expanding on the role of mitochondria in ATP synthesis."
],
"tokens_used": 142,
"latency_ms": 830
}
overall_score: Weighted average across all criteria (0 to 1).
criteria: Individual score for each criterion you requested.
issues: Array of problems detected in the output.
suggestions: Actionable recommendations to improve the output.
tokens_used: Number of tokens consumed by the evaluation.
latency_ms: Time taken to process the evaluation in milliseconds.
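The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `build_eval_request` and `run_eval` are hypothetical, and the URL and headers are copied from the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example in Step 2.
EVAL_URL = "https://evalkit.dev/api/v1/eval"


def build_eval_request(api_key, output, criteria, input_text=None):
    """Build (but do not send) the POST request for a single evaluation."""
    payload = {"output": output, "criteria": list(criteria)}
    if input_text is not None:
        # "input" is optional, so it is only included when provided.
        payload["input"] = input_text
    return urllib.request.Request(
        EVAL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_eval(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# req = build_eval_request(
#     "YOUR_API_KEY",
#     output="The mitochondria is the powerhouse of the cell.",
#     criteria=["accuracy", "relevance", "completeness"],
#     input_text="What is the mitochondria?",
# )
# result = run_eval(req)
# print(result["overall_score"], result["suggestions"])
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.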
API Reference
Complete reference for all EvalKit API endpoints.
POST /v1/eval
Run a single evaluation against one or more criteria.
Headers
| Name | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer token. Format: "Bearer YOUR_API_KEY". |
| Content-Type | string | Yes | Must be "application/json". |
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| output | string | Yes | The LLM-generated text to evaluate. |
| input | string | No | The original prompt or query that produced the output. |
| context | string | No | Reference material or grounding context for the evaluation. |
| criteria | string[] \| object[] | Yes | Array of built-in criteria names or custom criteria objects. |
| model | "fast" \| "thorough" | No | Evaluation model to use. "fast" is cheaper and quicker; "thorough" is more detailed. Defaults to "fast". |
{
"output": "The mitochondria is the powerhouse of the cell.",
"input": "What is the mitochondria?",
"context": "Biology textbook, Chapter 4: Cell Structure",
"criteria": ["accuracy", "relevance", "completeness"],
"model": "thorough"
}
Response
| Name | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique evaluation ID. |
| overall_score | number | Yes | Weighted average score across all criteria (0 to 1). |
| criteria | object | Yes | Per-criterion scores as key-value pairs. |
| issues | string[] | Yes | List of problems detected in the output. |
| suggestions | string[] | Yes | Actionable recommendations to improve the output. |
| tokens_used | number | Yes | Number of tokens consumed by the evaluation. |
| latency_ms | number | Yes | Processing time in milliseconds. |
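The response fields in the table can be read into a typed structure. The sketch below is one possible approach, with a hypothetical `EvalResult` dataclass and `parse_eval_response` helper. Because the examples on this page show per-criterion values both as plain numbers and as objects with `score` and `reasoning`, the parser accepts either shape; which one the API actually returns is not confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    """Hypothetical container mirroring the documented response fields."""
    id: str
    overall_score: float
    scores: dict                      # criterion name -> score (0 to 1)
    issues: list
    suggestions: list
    tokens_used: int
    latency_ms: int
    reasoning: dict = field(default_factory=dict)  # criterion name -> text


def parse_eval_response(data):
    """Convert a decoded JSON response into an EvalResult."""
    scores, reasoning = {}, {}
    for name, value in data["criteria"].items():
        if isinstance(value, dict):
            # Object form: {"score": ..., "reasoning": ...}
            scores[name] = float(value["score"])
            if "reasoning" in value:
                reasoning[name] = value["reasoning"]
        else:
            # Plain numeric form.
            scores[name] = float(value)
    return EvalResult(
        id=data["id"],
        overall_score=float(data["overall_score"]),
        scores=scores,
        issues=list(data["issues"]),
        suggestions=list(data["suggestions"]),
        tokens_used=int(data["tokens_used"]),
        latency_ms=int(data["latency_ms"]),
        reasoning=reasoning,
    )
```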
{
"id": "eval_abc123",
"overall_score": 0.95,
"criteria": {
"accuracy": 0.97,
"relevance": 0.94,
"completeness": 0.93
},
"issues": [],
"suggestions": [
"Consider expanding on the role of mitochondria in ATP synthesis."
],
"tokens_used": 142,
"latency_ms": 830
}
POST /v1/eval/batch
Evaluate multiple outputs in a single request. Maximum of 20 evaluations per batch.
Request Body
| Name | Type | Required | Description |
|---|---|---|---|
| evaluations | object[] | Yes | Array of evaluation objects (max 20). Each object has the same shape as a single eval request. |
| model | "fast" \| "thorough" | No | Evaluation model applied to all items. Defaults to "fast". |
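Since a batch accepts at most 20 evaluations, a longer list has to be split across several requests. The helper below is a minimal sketch of that splitting; `chunk_evaluations` and `build_batch_payloads` are hypothetical names, and only the `evaluations` and `model` fields from the table are used.

```python
MAX_BATCH_SIZE = 20  # limit stated for the batch endpoint


def chunk_evaluations(evaluations, size=MAX_BATCH_SIZE):
    """Split a list of evaluation objects into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [evaluations[i:i + size] for i in range(0, len(evaluations), size)]


def build_batch_payloads(evaluations, model="fast"):
    """Build one batch request body per chunk, applying the same model to all."""
    return [
        {"evaluations": chunk, "model": model}
        for chunk in chunk_evaluations(evaluations)
    ]
```

For example, 45 evaluation objects would produce three request bodies containing 20, 20, and 5 items.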
{
"evaluations": [
{
"output": "The mitochondria is the powerhouse of the cell.",
"input": "What is the mitochondria?",
"criteria": ["accuracy", "relevance"]
},
{
"output": "Water boils at 100 degrees Celsius at sea level.",
"input": "At what temperature does water boil?",
"criteria": ["accuracy", "completeness"]
}
],
"model": "fast"
}
Response
Returns an object with a results array. Each element has the same shape as a single eval response.
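One common use of the results array is to flag outputs that score below a quality bar. The sketch below assumes only the documented `results`, `id`, and `overall_score` fields; the function name `ids_below_threshold` and the threshold value are illustrative choices.

```python
def ids_below_threshold(batch_response, threshold):
    """Return the IDs of results whose overall_score is below `threshold`."""
    return [
        result["id"]
        for result in batch_response["results"]
        if result["overall_score"] < threshold
    ]
```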
{
"results": [
{
"id": "eval_abc123",
"overall_score": 0.95,
"criteria": { "accuracy": 0.97, "relevance": 0.94 },
"issues": [],
"suggestions": [],
"tokens_used": 98,
"latency_ms": 620
},
{
"id": "eval_def456",
"overall_score": 0.88,
"criteria": { "accuracy": 0.98, "completeness": 0.78 },
"issues": [],
"suggestions": [
"Mention that boiling point varies with altitude and pressure."
],
"tokens_used": 105,
"latency_ms": 710
}
]
}
GET /v1/criteria
List all available built-in evaluation criteria. No authentication required.
{
"criteria": [
{ "name": "accuracy", "description": "Output is factually correct and free of errors" },
{ "name": "relevance", "description": "Output addresses the input query directly" },
{ "name": "coherence", "description": "Output is logically structured and easy to follow" },
{ "name": "safety", "description": "Output is free of harmful, biased, or inappropriate content" },
{ "name": "tone", "description": "Output matches the expected tone and register" },
{ "name": "completeness", "description": "Output covers all aspects of the input query" },
{ "name": "conciseness", "description": "Output is free of unnecessary filler or repetition" },
{ "name": "groundedness", "description": "Output is supported by the provided context" }
]
}
Built-in Criteria
| Name | Description |
|---|---|
| accuracy | Output is factually correct and free of errors |
| relevance | Output addresses the input query directly |
| coherence | Output is logically structured and easy to follow |
| safety | Output is free of harmful, biased, or inappropriate content |
| tone | Output matches the expected tone and register |
| completeness | Output covers all aspects of the input query |
| conciseness | Output is free of unnecessary filler or repetition |
| groundedness | Output is supported by the provided context |
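A client can check criterion names locally before sending a request, which catches typos early. This is a sketch only: the set below is copied from the table above and could drift from the live list returned by the criteria endpoint, and `unknown_criteria` is a hypothetical helper name.

```python
# Names copied from the built-in criteria table.
BUILT_IN_CRITERIA = frozenset({
    "accuracy", "relevance", "coherence", "safety",
    "tone", "completeness", "conciseness", "groundedness",
})


def unknown_criteria(criteria):
    """Return string criteria that are not built-in names, in original order.

    Non-string entries (such as custom criteria objects) are ignored.
    """
    return [
        c for c in criteria
        if isinstance(c, str) and c not in BUILT_IN_CRITERIA
    ]
```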
Custom Criteria
You can define custom criteria by passing an object with a name and description in the criteria array. You can mix built-in and custom criteria in the same request.
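Assembling a mixed criteria array can be sketched as follows. The helper names `custom_criterion` and `build_criteria` are hypothetical; the only structure assumed is the documented one, where a custom criterion is an object with `name` and `description`.

```python
def custom_criterion(name, description):
    """Build a custom criterion object, rejecting empty fields."""
    if not name or not description:
        raise ValueError("custom criteria need a name and a description")
    return {"name": name, "description": description}


def build_criteria(built_in=(), custom=()):
    """Combine built-in names with (name, description) pairs for custom criteria."""
    return list(built_in) + [custom_criterion(n, d) for n, d in custom]
```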
{
"output": "Thanks for reaching out! We would love to help.",
"criteria": [
"accuracy",
{
"name": "brand_voice",
"description": "Output should match a professional, friendly tone"
}
]
}
Error Codes
| Code | Meaning | Description |
|---|---|---|
| 400 | Bad Request | Invalid or missing required fields in the request body. |
| 401 | Unauthorized | Missing or invalid API key. |
| 500 | Internal Server Error | The evaluation failed due to an internal error. |
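Client code can translate these status codes into a single exception type so callers handle failures in one place. The sketch below uses Python's standard `urllib`; `EvalKitError`, `error_for_status`, and `send` are hypothetical names, and the shape of the error response body is not documented here, so only the status code is used.

```python
import json
import urllib.error
import urllib.request

# Meanings copied from the error table above.
ERROR_MEANINGS = {
    400: "Bad Request",
    401: "Unauthorized",
    500: "Internal Server Error",
}


class EvalKitError(Exception):
    """Hypothetical exception type carrying the HTTP status code."""

    def __init__(self, status, meaning):
        super().__init__(f"{status} {meaning}")
        self.status = status
        self.meaning = meaning


def error_for_status(status):
    """Translate an HTTP status code into an EvalKitError."""
    return EvalKitError(status, ERROR_MEANINGS.get(status, "Unexpected status"))


def send(request, timeout=30):
    """Send a prepared urllib request, raising EvalKitError on HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        raise error_for_status(err.code) from err
```

A caller might fix the request on a 400, refresh credentials on a 401, and treat a 500 as a server-side failure; whether retrying a 500 is appropriate is not stated in this documentation.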