# Eval Builder

### Demo & Quick Start

{% embed url="<https://vimeo.com/1161246105>" %}

{% hint style="info" %}
**Note:** The Eval Builder requires a Kiln Pro account. Registration is free and easy inside the Kiln app.
{% endhint %}

### What is the Kiln Eval Builder?

<figure><img src="/files/8IWCv4IiKmYo1qwIDTY9" alt="" width="375"><figcaption></figcaption></figure>

The Kiln Eval Builder combines Kiln’s best features into one interactive tool: evals, synthetic data generation, automatic judge prompt creation, and edge case detection. Together they go beyond making an eval manually in several ways:

* **Identify Gaps with AI**: Kiln reads your judge prompt and helps refine it. It detects underspecified aspects of your judge, conflicts with your task definition, ambiguities that judges may struggle with, and other common issues. It then works with you to close the gaps and resolve the conflicts.
* **Interactive Human Alignment & Accuracy**: Building an LLM-as-Judge as good as a human isn’t easy. Human judges make subtle and subjective decisions, and have a hard time articulating their judgment process in a way LLMs can duplicate. Our alignment loop finds tough edge cases, compares the LLM judge’s ratings to your own, and works with you iteratively until the judge is aligned with your preferences.
* **Automatic Synthetic Data**: Build a robust synthetic dataset generator as you work. By the time you save your eval, you’ll have large and accurate datasets for evals and training.
* **Judge Meta-prompting**: Humans often struggle to write effective eval judge prompts. Our judge meta-prompting takes your issues and concerns in human terms and turns them into accurate, judge-able evals.
* **Easy to Use**: Subject matter experts can easily create accurate evals, without a lengthy iteration loop with data scientists. The Eval Builder walks you through every step: defining your judge, generating synthetic data, building a golden dataset, aligning your judge, and creating training datasets. You get the same rigorous process, without managing each step yourself.
* **Fast**: Creating an eval can take as little as 5 minutes, compared to over 30 minutes manually.
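
To make the human alignment idea above concrete, here is a minimal conceptual sketch of comparing an LLM judge's pass/fail ratings against human ratings. It does not use any Kiln API and is not how the Eval Builder is implemented; the `RatedExample` class, the function names, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RatedExample:
    """One example rated by both a human and an LLM judge (hypothetical type)."""
    example_id: str
    human_pass: bool
    judge_pass: bool


def agreement_rate(examples: list[RatedExample]) -> float:
    """Fraction of examples where the judge matches the human rating."""
    if not examples:
        raise ValueError("need at least one rated example")
    matches = sum(1 for e in examples if e.human_pass == e.judge_pass)
    return matches / len(examples)


def disagreements(examples: list[RatedExample]) -> list[str]:
    """IDs of examples where the judge and the human disagree."""
    return [e.example_id for e in examples if e.human_pass != e.judge_pass]


# Illustrative data only: four examples, with one disagreement on "c".
rated = [
    RatedExample("a", human_pass=True, judge_pass=True),
    RatedExample("b", human_pass=False, judge_pass=False),
    RatedExample("c", human_pass=False, judge_pass=True),
    RatedExample("d", human_pass=True, judge_pass=True),
]

rate = agreement_rate(rated)
to_review = disagreements(rated)
print(f"agreement: {rate:.2f}, review these: {to_review}")
```

In a loop like this, the disagreeing examples are the ones worth reviewing: each one suggests either a gap in the judge prompt or an edge case where the human rating should be reconsidered.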

### How to Get Started

Getting started is easy:

* Open any task in the Kiln app
* Click "Evals" in the sidebar
* Click "Create Eval"
* Connect your Kiln Pro account (if you haven’t already)
* Follow the interactive steps until complete!

See the video above for a complete walkthrough.

