Fine Tuning for Tool Use
Build fine-tuned models for calling a specific set of tools, such as MCP tools
Kiln can fine-tune a model for calling a specific set of tools. The fine-tuned models can improve over the base model by:
- Learning when to call each tool, and when not to
- Learning when to choose one tool over another
- Learning how to format tool calls and tool call parameters. This greatly reduces error rates over the base model, making smaller and faster models viable.
Together, this means you can improve agent performance and lower costs.
Building a Fine-Tuning Training Dataset for Tool Calling
To create a fine-tune targeting tool calling, you must generate a training set specifically for tool calling.
The tool set available during training data generation must exactly match the tool set your fine-tune targets.
Kiln disallows training on samples that don't have a matching tool set. Training on them would teach the fine-tuned model not to call a tool even when a tool call would have been appropriate.
This doesn’t mean every tool needs to be called in every training sample, only that every tool was available to be called.
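For illustration, a single training sample in an OpenAI-style chat fine-tuning format might look like the sketch below. The tool names and schemas are hypothetical placeholders; the point is that both tools appear in the sample's tool list, even though only one is actually called.

```python
# A minimal sketch of one training sample, assuming an OpenAI-style
# chat fine-tuning format. Tool names and schemas are hypothetical.
training_sample = {
    # Every tool in the target tool set is listed, even if unused here.
    "tools": [
        {"type": "function", "function": {
            "name": "search_docs",
            "description": "Search the documentation",
            "parameters": {"type": "object", "properties": {
                "query": {"type": "string"}}, "required": ["query"]},
        }},
        {"type": "function", "function": {
            "name": "create_ticket",
            "description": "File a support ticket",
            "parameters": {"type": "object", "properties": {
                "title": {"type": "string"}}, "required": ["title"]},
        }},
    ],
    "messages": [
        {"role": "user", "content": "How do I reset my API key?"},
        # Only one of the two available tools is called in this sample.
        {"role": "assistant", "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "search_docs",
                         "arguments": '{"query": "reset API key"}'},
        }]},
    ],
}
```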
Kiln makes building a tool-calling training dataset easy:
1. Open the Fine-Tune tab.
2. Click “Create Fine Tune”.
3. Select the set of tools that the model should learn to call.
4. Click “Add Fine-Tuning Data” to launch Kiln's synthetic data generation tool.
5. Generate synthetic training data using Kiln's synthetic data generation tool. It will automatically select the correct tools for you when generating sample outputs.
Distilling Larger Models and Longer Prompts for Better Tool Calling
Beyond learning tool-call formatting, your fine-tuned model must learn when to call each tool and which parameters to pass. The quality of the resulting model depends heavily on the quality of the training dataset—so how do you build a high-quality dataset for tool calling?
The answer is typically distillation: training a model on the outputs of another model. By using larger models with carefully designed prompts that specify how tools should be used, you can generate a high-quality dataset that demonstrates correct tool usage. You can then fine-tune a smaller, faster, and cheaper model to reproduce similar quality.
| Model | Large Model | Smaller Model |
| --- | --- | --- |
| Prompt | Long prompt detailing tool-calling strategy, including when each tool should be used and which parameters to pass. | Short prompt focused on the task. Does not need to address tool-calling strategy in detail. |
| Reasoning Mode | Recommended to enable | Optional |
| Cost per Token | Expensive | Cheap |
| Speed | Slower | Faster |
| Origin of Tool Calling Logic | Base model + detailed strategy in prompt | Learned during fine-tuning |
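To make the table concrete, here is a sketch of what the two prompts might look like. The wording and the tool names (search_docs, create_ticket) are illustrative assumptions, not a prescribed format:

```python
# Teacher: a large model gets the full tool-calling strategy in its prompt.
TEACHER_PROMPT = """You are a support agent with two tools.
- Use search_docs when the user asks a how-to question; pass a short,
  keyword-style query.
- Use create_ticket only when the docs cannot resolve the issue; the
  title must summarize the problem in under 80 characters.
- Never call both tools in the same turn.
"""

# Student: the fine-tune learns that strategy from the training data,
# so its prompt can stay short and task-focused.
STUDENT_PROMPT = "You are a support agent. Help the user with their issue."
```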
Create a Tool Calling Fine-Tune
Once you’ve created a training set, return to the “Create a Fine Tune” screen. Select a base model, the tools you want, and the dataset you’ve created, then start training. See our fine-tuning guide for additional details.

You must select a base model that supports tool calling. Kiln will disable tool calling training if the selected base model was not trained for tool calling.
Kiln will convert your training data into the base model's tool calling format automatically.
Running a Tool Calling Fine-Tune
When running a Tool Calling Fine-Tune in Kiln, we’ll automatically populate the same set of tools it was trained on.
Adding or removing tools will show a warning, as this model is unlikely to perform well with tools that were not in its training dataset.
If you deploy a fine-tune created in Kiln outside of Kiln, always provide the same tools it was trained to use.
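As a sketch, assuming your fine-tune is deployed behind an OpenAI-compatible API, passing the training-time tool set might look like this. The model ID and tool definitions are hypothetical placeholders; reuse the exact tool definitions from training.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical: the same tool set the model was fine-tuned with.
TOOLS = [
    {"type": "function", "function": {
        "name": "search_docs",
        "description": "Search the documentation",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}}, "required": ["query"]},
    }},
    {"type": "function", "function": {
        "name": "create_ticket",
        "description": "File a support ticket",
        "parameters": {"type": "object", "properties": {
            "title": {"type": "string"}}, "required": ["title"]},
    }},
]

response = client.chat.completions.create(
    model="ft:my-tool-calling-model",  # hypothetical deployed model ID
    messages=[{"role": "user", "content": "How do I reset my API key?"}],
    tools=TOOLS,  # same tool set as during training
)
print(response.choices[0].message.tool_calls)
```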
Evaluating Tool Use
Kiln includes a custom eval for appropriate tool use. We suggest using it to compare the base model to your fine-tune, to confirm performance is improving.
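If you also want a quick check outside Kiln's built-in eval, a minimal tool-selection accuracy comparison might look like the sketch below. The test cases and the get_tool_call helper are hypothetical placeholders you would implement against your own models.

```python
# Hypothetical labeled test cases: which tool should be called per prompt.
test_cases = [
    {"prompt": "How do I reset my API key?", "expected_tool": "search_docs"},
    {"prompt": "My account is locked and the docs didn't help.",
     "expected_tool": "create_ticket"},
]

def tool_accuracy(model: str, get_tool_call) -> float:
    """Fraction of test cases where the model picked the expected tool.

    get_tool_call(model, prompt) is a hypothetical helper that runs the
    model and returns the name of the tool it called (or None).
    """
    hits = sum(
        get_tool_call(model, case["prompt"]) == case["expected_tool"]
        for case in test_cases
    )
    return hits / len(test_cases)

# Compare the base model against the fine-tune on the same cases:
# tool_accuracy("base-model", get_tool_call)
# tool_accuracy("ft:my-tool-calling-model", get_tool_call)
```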