Some time back, OpenAI introduced the capability to train new fine-tuned models based on their GPT-3 API.
I have had the opportunity to train a few fine-tuned models of my own and for clients.
This GPT-3 fine-tuning guide covers fine-tuning an OpenAI GPT-3 model in detail. It includes:
- What is GPT-3 fine-tuning?
- GPT-3 fine-tuning vs. prompting
- Pricing for GPT-3 fine-tuning
- GPT-3 fine-tuning key points
- Steps to fine-tune an OpenAI GPT-3 AI model
- Scenarios for GPT-3 fine-tuning
Let’s go through them in detail.
What does fine-tuning a GPT-3 model mean?
OpenAI, by default, gives you a few AI models or engines that are suited for different tasks.
However, sometimes you don’t get the desired output, or getting the output is too expensive.
Fine-tuning gives you the ability to take OpenAI’s base models/engines and train a new model but on a curated dataset that you supply.
Some people also refer to fine-tuning as "training GPT-3."
How is GPT-3 fine-tuning better than just plain old prompting?
Fine-tuning can help in several ways:
- It could get you better-quality outputs with no or fewer examples in the prompts.
- You can train your models on hundreds of examples – up to a total of 80-100MB per dataset.
- You will save on API usage costs due to fewer or no examples in your prompts.
- You will have lower latency in your API calls.
GPT-3 fine-tuning pricing
Fine-tuning is billed in two parts: a one-time training cost and a per-token usage cost for the resulting model, both of which depend on the base model you fine-tune. Below are the current rates for fine-tuning a GPT-3 model.
| Model | Training | Usage |
|---|---|---|
| Ada | $0.0004 / 1K tokens | $0.0016 / 1K tokens |
| Babbage | $0.0006 / 1K tokens | $0.0024 / 1K tokens |
| Curie | $0.0030 / 1K tokens | $0.0120 / 1K tokens |
| Davinci | $0.0300 / 1K tokens | $0.1200 / 1K tokens |
As you can see, just like with model usage, fine-tuning rates also differ based on which model you are trying to fine-tune.
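To make the table concrete, here is a rough worked example (assuming OpenAI's default of four training epochs, since the billed training tokens scale with the number of epochs): fine-tuning Curie on a 500K-token dataset would cost about 500 × $0.0030 × 4 = $6.00 in training, and generating 1,000 tokens with the resulting model would then cost $0.0120.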
Some key points about GPT-3 fine-tuning
The model will not be shared with other API users and will be private to the org/users who fine-tuned it.
However, sharing fine-tuned models with other companies may become possible in the future, which would create a de facto marketplace for fine-tuned models.
Right now, you can fine-tune up to 10 models per month and each dataset can be up to 2.5M tokens or 80-100MB in size.
You can use the fine-tuned model from the OpenAI Playground, from the command line using the OpenAI command-line tool or a cURL command, or from within your code.
What does a GPT-3 fine-tuning training dataset look like?
The training dataset has to be in JSONL format, where each record is a JSON document on its own line.
A typical dataset JSONL file looks like this:
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
....
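For example, a (purely hypothetical) dataset for classifying support tickets could look like this:
{"prompt": "The app crashes whenever I open it ->", "completion": " bug"}
{"prompt": "Please add a dark mode ->", "completion": " feature request"}
{"prompt": "How do I reset my password? ->", "completion": " question"}
Note the fixed separator at the end of each prompt and the leading space in each completion; the OpenAI data-preparation tool covered below will suggest conventions like these if they are missing.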
GPT-3 fine-tuning steps
There are three steps involved in fine-tuning GPT-3.
- Prepare the training dataset
- Train a new fine-tuned model
- Use the new fine-tuned model
Let’s cover each of the above steps one by one.
Step 1: Prepare the training dataset
As with using the base GPT-3 models, the real skill lies in coming up with good examples for a practical, novel use case. This is where domain knowledge comes into play.
One example is fine-tuning GPT-3 on a foreign language in which the base GPT-3 models are not very good. One way to do this is to collect high-quality representative text in that language and prepare a dataset file where the prompt is empty and the completion holds the text in that language.
This is how the dataset JSONL file would look:
{"prompt": "", "completion": "<ideal generated text>"}
{"prompt": "", "completion": "<ideal generated text>"}
{"prompt": "", "completion": "<ideal generated text>"}
....
You would have to take care to keep each record under 2,048 tokens, the context limit for these models. One quick way to check is sketched below.
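Here is a minimal sketch of such a check in Python, assuming the tiktoken tokenizer library and a hypothetical dataset.jsonl file:
import json
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # the encoding used by the GPT-3 base models

with open("dataset.jsonl") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        # prompt and completion together must fit in the 2,048-token window
        n_tokens = len(enc.encode(record["prompt"] + record["completion"]))
        if n_tokens > 2048:
            print(f"Record {i} is too long: {n_tokens} tokens")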
Once you have the dataset ready, run it through the OpenAI command-line tool to validate it:
openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
You can also pass files in CSV, TSV, XLSX, JSON, or JSONL format to this tool, and it will help you convert them into a fine-tuning-ready dataset.
Step 2: Train a new fine-tuned model
Run the command below from your terminal to train your fine-tuned model:
openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>
Replace the filename and choose a base model to build on. Per the pricing table above, current options are ada, babbage, curie, or davinci.
Once the fine-tuning finishes, you will see the ID of your new model.
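If you prefer to stay in Python, here is a minimal sketch of the same two steps (uploading the dataset, then starting the job), assuming the legacy v0.x openai package and hypothetical file names:
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder for your OpenAI API key

# Upload the prepared JSONL dataset for fine-tuning
upload = openai.File.create(file=open("dataset_prepared.jsonl", "rb"),
                            purpose="fine-tune")

# Start the fine-tuning job on a base model
job = openai.FineTune.create(training_file=upload["id"], model="curie")
print(job["id"])  # poll this job ID for status and the final model ID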
Step 3: Use the new fine-tuned model
One way to use your newly fine-tuned model is from the command line:
openai api completions.create -m <FINE_TUNED_MODEL> -p <YOUR_PROMPT>
You can also use it in your code, for example in Python:
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder for your OpenAI API key
response = openai.Completion.create(
    model="FINE_TUNED_MODEL",  # the model ID from the training step
    prompt="YOUR_PROMPT")
print(response["choices"][0]["text"])
This model will also be available in the model list from the OpenAI Playground.
Scenarios for fine-tuning GPT-3
Now that you know what fine-tuning an OpenAI GPT-3 AI model means and how to go about it, you might be wondering what kinds of scenarios a fine-tuned model is useful for.
Here are some use cases where a fine-tuned GPT-3 model shines.
Personalized email generator
Prepare a dataset from the emails you have sent, both ones you initiated and replies you wrote, using the steps described earlier in this post. Fine-tune a Davinci model on this dataset. You will now have a personalized email generator: a GPT-3 AI model that follows your style when writing emails for you. You can then use this model directly in the GPT-3 Playground or integrate it into an email client using code. A sketch of the dataset-preparation step follows below.
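As a minimal sketch of that preparation step, assuming a hypothetical sent_emails.csv export with subject and body columns:
import csv
import json

# Convert each sent email into a prompt/completion pair
with open("sent_emails.csv") as src, open("emails.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        record = {
            # a fixed separator marks the end of the prompt
            "prompt": f"Subject: {row['subject']}\n\n###\n\n",
            # leading space plus a stop marker, per OpenAI's dataset conventions
            "completion": " " + row["body"].strip() + " END",
        }
        dst.write(json.dumps(record) + "\n")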
A chatbot that talks in the style of someone
Let’s say you wish you could talk to a famous author, like Isaac Asimov or Carl Sagan. Now you can come close to it by fine-tuning a GPT-3 model on the books and articles written by those authors.
There are scores of these kinds of use cases and scenarios where fine-tuning a GPT-3 AI model can be really useful.
Conclusion
That’s it. This is how you fine-tune a new model in GPT-3. Whether to fine-tune a model or stick with plain old prompt design will depend on your particular use case. Try out a few methods and GPT-3 engines before settling on the one that gives you the highest-quality outputs across the widest range of scenarios.