Synthetic Data Generation
Generate synthetic data for fine-tuning or evaluation
Kiln offers a powerful interactive synthetic data generation tool.
Video Walkthrough
The UI has been improved since this video was recorded
Use Cases
Synthetic data is helpful for many reasons:
Evals: Generate data for custom evals of your task performance
Fine-tuning: Generate fine-tuning datasets
Built-in Templates: Use our built-in data-gen templates, like 'Jailbreaking' or 'Bias', to check your system for common issues (curated evals)
Addressing Bugs / Issues: Generate targeted data to reproduce a bug/issue, which can be used for training a fix, evaluating a fix, and backtesting
Prompting: Generate examples to be used for few-shot or multi-shot prompting
How It Works
Automatic Synthetic Data Gen Prompting
Kiln doesn't require you to write complex custom synthetic data gen prompts. Since you've already defined a goal when setting up your task, Kiln can do this for you. It will infer the type of data needed from the system prompt, adapt it to your data-gen goal, and create synthetic data gen prompts without any manual prompting.
Walkthrough
Choose A Goal To Focus Data Gen
First select a goal for your dataset generation: Evals or Fine-Tuning. This is an important step, as you need different data for different goals:
Fine-Tuning: generate high quality outputs across a broad range of possible inputs, to help your model learn how to respond to a range of requests. This can include generating inputs that commonly produce issues, and outputs that avoid that issue.
Evals: Intentionally generate a mix of good and bad inputs and outputs. We'll use the bad outputs to ensure the judge model can properly assess failures, and we'll use the bad inputs to ensure your task no longer has the issue.

Selecting the goal will set up two things:
Template: A Kiln prompt template to guide the data gen. You can edit this template before running data gen.
Tag Assignments: which dataset tags will be assigned to generated data. This could be a single tag like fine_tuning_data or a randomly assigned split like eval_data: 80%, golden_data: 20%.
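As an illustration of what a randomly assigned split means, here is a minimal Python sketch. It assumes each generated item independently receives one tag, sampled according to the split weights. The function name assign_tags and the sampling strategy are assumptions for illustration, not Kiln's actual implementation.

```python
import random


def assign_tags(num_items, split, seed=None):
    """Pick one tag per item, sampled according to the split weights.

    Hypothetical helper: per-item random sampling is an assumption here.
    """
    rng = random.Random(seed)
    tags = list(split)
    weights = [split[tag] for tag in tags]
    return rng.choices(tags, weights=weights, k=num_items)


# The split from the text above: eval_data 80%, golden_data 20%
split = {"eval_data": 80, "golden_data": 20}
assigned = assign_tags(1000, split, seed=0)
counts = {tag: assigned.count(tag) for tag in split}
print(counts)
```

Because tags are sampled per item, the realized proportions only approximate 80/20, and they drift further from it on small batches.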

Choose A Data Gen Model
We highly recommend choosing a large capable model for data gen. While your task may work on smaller models, data gen is more complex. It requires reasoning about a range of possible inputs, probing edge cases, and more. It benefits from a large model with a long context.

If generating content to evaluate how your model responds to inappropriate requests (bias, jailbreaking, maliciousness, etc.), choose an uncensored model like Grok. Censored models like GPT-4o will refuse to generate some types of content.
Interactive Curation UX
Kiln synthetic data generation is designed to be interactive! As you work, be critical of the generated data and use the interactive UI to produce high-quality data. You can delete topics or examples that don't match your goals, add custom topics manually, update prompts to guide content, and iterate until you're happy with the results.
3 Levels of Data Gen: Topics, Inputs, Outputs
Kiln generates synthetic data in 3 stages:
Topics: generate a tree of topics, which allows for breadth
Model Inputs: generate synthetic model inputs (the user message), optionally targeting a specific topic. Within each topic, we aim for a range of relevant inputs which are not too similar to each other.
Model Outputs: generate synthetic model outputs from one of the inputs.
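The three stages can be pictured as nested loops. The sketch below is only a conceptual outline in Python: the three generator callables stand in for model calls and are hypothetical stubs, not Kiln APIs.

```python
def generate_dataset(generate_topics, generate_inputs, generate_output, inputs_per_topic=2):
    """Run the three stages: topics, then inputs per topic, then one output per input."""
    samples = []
    for topic in generate_topics():
        for model_input in generate_inputs(topic, inputs_per_topic):
            samples.append(
                {
                    "topic": topic,
                    "input": model_input,
                    "output": generate_output(model_input),
                }
            )
    return samples


# Hypothetical stubs standing in for calls to a data gen model
def fake_topics():
    return ["billing", "shipping"]


def fake_inputs(topic, count):
    return [f"Question {i + 1} about {topic}" for i in range(count)]


def fake_output(model_input):
    return f"Answer to: {model_input}"


samples = generate_dataset(fake_topics, fake_inputs, fake_output)
print(len(samples))
```

In the interactive UI you review and curate between stages, so the real flow is not a single uninterrupted loop like this one.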
Topic-Tree Data Generation For Content Breadth
A common issue with synthetic data generation is that if you ask a model to generate synthetic data 100 times, you get 100 similar outputs. Kiln's topic tree addresses this by guiding generation toward a breadth of examples across different topics.
Kiln can generate a topic tree and generate examples for each node. This includes nested topics, which allows you to generate a lot of broad data very quickly.
You can use automatic topic generation, or manually add topics to your topic tree.
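To make the idea of a topic tree concrete, here is a small Python sketch that models the tree as nested dictionaries and lists every node as a path. The representation and the topic_paths helper are assumptions for illustration. Kiln's internal data structure is not described here.

```python
def topic_paths(tree, prefix=()):
    """Return the path to every node in the tree, parents before children."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(path)
        paths.extend(topic_paths(children, path))
    return paths


# Hypothetical topic tree: keys are topic names, values are nested subtopics
topic_tree = {
    "Travel": {"Flights": {}, "Hotels": {"Cancellations": {}}},
    "Food": {},
}

for path in topic_paths(topic_tree):
    print(" > ".join(path))
```

Each path is a distinct target for input generation, which is why a few levels of nesting yield broad coverage quickly.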

Model Inputs
Once you have a topic, you can generate model inputs:

Model Outputs
When you have generated all of the inputs you want, click "Save All Model Outputs" to generate outputs. These won't appear in this UI, but will appear in your dataset with the appropriate tags.

Automatic Templates and Custom Prompting
When you select a goal, we'll select the corresponding prompt template. These are built into Kiln and help guide data generation for a variety of tasks.

When creating synthetic data for an eval there are two additional powerful options:
Kiln Issue Template: When you create an eval using the "Issue" template, data gen will create a template that can help find passing and failing examples for evaluating and resolving the issue.
Kiln Requirements Template: Generate synthetic data to evaluate overall task rating, and any requirements you added to your task.
These templates are a starting point. They may work for you out of the box, or you may want to edit them to get the desired data. You can edit a template before you run data gen to ensure it matches your needs.
Some examples of custom prompts/edits:
Generate content for global topics, not only US-centric
Generate examples in Spanish
The model is having trouble classifying sentiment of sarcastic messages. Generate sarcastic messages.

Structured Data (JSON, tool calling)
If your task requires structured input and/or output, your synthetic data generation will automatically follow the schemas you defined. All values are validated against the schemas you define, and nothing will be saved into your dataset if they don't comply.
You can define the schema with the visual schema builder in our task definition UI. Alternatively, you can set a JSON Schema directly in the task via our Python library or a text editor.
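The validate-before-save behavior can be sketched as a simple gate. The Python example below checks only required keys and primitive types, a small subset of JSON Schema. The schema, the complies helper, and the candidate items are all hypothetical. A real validator covers far more of the specification.

```python
# Hypothetical schema using a small subset of JSON Schema
SCHEMA = {
    "type": "object",
    "required": ["sentiment", "confidence"],
    "properties": {
        "sentiment": {"type": "string"},
        "confidence": {"type": "number"},
    },
}

PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def complies(value, schema):
    """Check required keys and primitive property types only."""
    if not isinstance(value, dict):
        return False
    if any(key not in value for key in schema.get("required", [])):
        return False
    for key, prop in schema.get("properties", {}).items():
        if key not in value:
            continue
        item = value[key]
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        if prop["type"] == "number" and isinstance(item, bool):
            return False
        if not isinstance(item, PYTHON_TYPES[prop["type"]]):
            return False
    return True


candidates = [
    {"sentiment": "positive", "confidence": 0.9},
    {"sentiment": "negative"},  # missing required key
    {"sentiment": 3, "confidence": 0.5},  # wrong type
]

# Only compliant items would be saved to the dataset
saved = [item for item in candidates if complies(item, SCHEMA)]
print(saved)
```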
Under the hood, we attempt to use tool calling when the model supports it, but will fall back to JSON parsing if not.
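A rough Python sketch of this kind of fallback is shown below. The response shape (a dict with optional tool_call_arguments and text keys) and the extract_structured helper are assumptions made for illustration. They do not reflect Kiln's internals or any specific provider's API.

```python
import json


def extract_structured(response):
    """Prefer structured tool-call arguments, else parse JSON from the text.

    `response` is a hypothetical dict with optional keys
    'tool_call_arguments' (already parsed) and 'text' (raw model output).
    """
    if response.get("tool_call_arguments") is not None:
        return response["tool_call_arguments"]

    text = response.get("text", "")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, in case the model wrapped
        # the JSON in extra prose
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start : end + 1])
        raise


print(extract_structured({"tool_call_arguments": {"sentiment": "positive"}}))
print(extract_structured({"text": 'Sure: {"sentiment": "negative"} Hope that helps.'}))
```

Whichever path produces the value, it would still need to pass schema validation before being saved.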
Resolving Bugs & Issues with Synthetic Data
Synthetic data is a great tool for resolving bugs and issues in AI systems.
Follow these steps:
Create an eval using the Issue template, which will ask you to describe the issue and optionally provide examples.
Use synthetic data generation, selecting that issue as the goal. Generate data that reproduces that issue, then return to evals and verify you now have an eval Judge + eval dataset that can detect the issue reliably.
Iterate on different approaches to solving the problem (adjust prompt, model, temperature), using your eval judge to check if the solution works
Optionally use synthetic data and fine-tuning to fix the issue (for difficult issues where prompting doesn't work). Select the issue as the goal, but modify the tag assignment to fine_tuning_data, then generate problematic inputs with successful outputs and add these pairs to your fine-tuning dataset.
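The iteration step above amounts to scoring each candidate configuration against the same eval dataset. The Python sketch below shows that loop in the abstract. The run_task and judge callables, the configuration fields, and the stubbed behavior are all hypothetical stand-ins, not Kiln's eval API.

```python
from itertools import product


def pass_rate(run_task, judge, eval_inputs, config):
    """Fraction of eval inputs whose output the judge accepts for this config."""
    verdicts = [judge(item, run_task(item, config)) for item in eval_inputs]
    return sum(verdicts) / len(verdicts)


# Hypothetical stubs: here only low-temperature runs avoid the issue
def fake_run_task(item, config):
    return item.upper() if config["temperature"] <= 0.5 else "???"


def fake_judge(item, output):
    return output == item.upper()


eval_inputs = ["sarcastic message one", "sarcastic message two"]
configs = [
    {"prompt": prompt, "temperature": temperature}
    for prompt, temperature in product(["v1", "v2"], [0.2, 1.0])
]

scores = {
    (config["prompt"], config["temperature"]): pass_rate(
        fake_run_task, fake_judge, eval_inputs, config
    )
    for config in configs
}
best = max(scores, key=scores.get)
print(scores, best)
```

Keeping the eval dataset fixed across configurations is what makes the pass rates comparable.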
Tagging
All synthetic data will be assigned a series of tags:
The tag synthetic (manual and imported runs have their own tags)
A unique tag to identify the data gen session (e.g. synthetic_session_12345)
Custom tags. These are set up automatically when you select a goal, but you can edit them before generating data.
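As a small illustration, the resulting tag list for one generated item could be assembled like this in Python. The build_tags helper is hypothetical, and the session tag format is inferred only from the synthetic_session_12345 example above.

```python
def build_tags(session_id, custom_tags=()):
    """Combine the fixed tag, the session tag, and any custom tags.

    Hypothetical helper: the session tag format follows the example
    'synthetic_session_12345' and is otherwise an assumption.
    """
    return ["synthetic", f"synthetic_session_{session_id}", *custom_tags]


print(build_tags(12345, ["eval_data"]))
```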
