Run 100B+ Language Models at Home with Petals

Large language models like GPT-3 are huge and require a large number of GPUs to run. But what if we could all run 100B+ language models right on our own computers?

Petals is here to help! With its BitTorrent-style approach, you can now run language models with over 100 billion parameters at home. Petals lets you generate text using distributed BLOOM and fine-tune it for your own tasks. In addition, Petals makes fine-tuning and inference up to 10 times faster than offloading.

How Petals Works

Petals works by running large language models like BLOOM-176B collaboratively. You only need to load a small part of the model, and then team up with others who serve the remaining parts to run inference or fine-tuning. Inference runs at approximately 1 second per step (one generated token), which is fast enough for chatbots and other interactive apps. You can also employ any fine-tuning and sampling methods by executing custom paths through the model or accessing its hidden states.
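To make the collaborative idea concrete, here is a toy sketch in plain Python. It does not use Petals or its networking code. The names `Peer`, `split_across_peers`, `find_chain`, and `run_pipeline` are hypothetical, and each "block" is just a small integer function standing in for a transformer layer. The sketch only shows the general shape: every peer holds a contiguous slice of the model, and the client passes the hidden state through a chain of peers that together cover all blocks.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "block" maps a hidden state to a new hidden state (toy stand-in for a layer).
Block = Callable[[int], int]


@dataclass
class Peer:
    """Hypothetical peer serving the contiguous blocks [start, end)."""
    name: str
    start: int
    end: int
    blocks: List[Block]

    def forward(self, hidden: int, from_block: int) -> int:
        # Run only the blocks from `from_block` up to the end of this peer's slice.
        for block in self.blocks[from_block - self.start:]:
            hidden = block(hidden)
        return hidden


def make_toy_blocks(n: int) -> List[Block]:
    # Each block does something different so that ordering matters.
    return [lambda h, k=k: 2 * h + k for k in range(n)]


def split_across_peers(blocks: List[Block], num_peers: int) -> List[Peer]:
    """Give each peer a contiguous, near-equal slice of the model."""
    per_peer = -(-len(blocks) // num_peers)  # ceiling division
    peers: List[Peer] = []
    for i in range(0, len(blocks), per_peer):
        chunk = blocks[i:i + per_peer]
        peers.append(Peer(f"peer-{len(peers)}", i, i + len(chunk), chunk))
    return peers


def find_chain(peers: List[Peer], num_blocks: int) -> List[Peer]:
    """Greedily pick peers that cover blocks 0..num_blocks in order."""
    chain: List[Peer] = []
    position = 0
    while position < num_blocks:
        candidates = [p for p in peers if p.start <= position < p.end]
        if not candidates:
            raise RuntimeError(f"no peer serves block {position}")
        best = max(candidates, key=lambda p: p.end)  # the peer that reaches furthest
        chain.append(best)
        position = best.end
    return chain


def run_pipeline(chain: List[Peer], hidden: int) -> int:
    """Pass the hidden state from peer to peer, as the client would."""
    position = 0
    for peer in chain:
        hidden = peer.forward(hidden, position)
        position = peer.end
    return hidden


def run_locally(blocks: List[Block], hidden: int) -> int:
    for block in blocks:
        hidden = block(hidden)
    return hidden


blocks = make_toy_blocks(8)
peers = split_across_peers(blocks, num_peers=3)
chain = find_chain(peers, len(blocks))
print([p.name for p in chain])
print(run_pipeline(chain, 1) == run_locally(blocks, 1))
```

The chained result matches running every block on one machine, which is the property a distributed setup has to preserve. A real swarm also has to deal with peers joining, leaving, and failing mid-request, which this sketch ignores.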

Here’s an example of how to use Petals to generate text and fine-tune a model:

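import torch
from torch.nn.functional import cross_entropy
from transformers import AutoTokenizer
from petals import DistributedBloomForCausalLM

# Assumption: the tokenizer is loaded from the same checkpoint as the model
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-petals")
model = DistributedBloomForCausalLM.from_pretrained("bigscience/bloom-petals", tuning_mode="ptune", pre_seq_len=16)
# Embeddings & prompts are on your device, BLOOM blocks are distributed across the Internet

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...

# Fine-tuning (updates only prompts or adapters hosted locally)
# data_loader is a placeholder for your own iterable of (input_ids, labels) batches,
# with labels shaped the way cross_entropy expects for your task
optimizer = torch.optim.AdamW(model.parameters())
for input_ids, labels in data_loader:
    outputs = model.forward(input_ids)
    loss = cross_entropy(outputs.logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()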
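The comment about updating only locally hosted parameters can be illustrated with a deliberately tiny, dependency-free sketch. This is not prompt tuning as Petals implements it: the "remote block" here is a single frozen multiplication, and the "prompt" is one scalar added to the input, both of which are assumptions made for illustration. What it does show is the division of labour: the frozen side computes the forward pass and hands back the gradient with respect to its input, while the only value that ever changes lives on the client.

```python
FROZEN_WEIGHT = 3.0  # stands in for the remote blocks; never updated


def remote_forward(hidden: float) -> float:
    """Forward pass through the frozen block."""
    return FROZEN_WEIGHT * hidden


def remote_backward(grad_output: float) -> float:
    """Gradient with respect to the block's input; the weight itself gets no update."""
    return FROZEN_WEIGHT * grad_output


def train_prompt(data, epochs: int = 200, lr: float = 0.01) -> float:
    prompt = 0.0  # the only trainable parameter, held locally
    for _ in range(epochs):
        for x, target in data:
            output = remote_forward(x + prompt)
            grad_output = 2.0 * (output - target)       # derivative of squared error
            grad_prompt = remote_backward(grad_output)  # d(x + prompt)/d(prompt) = 1
            prompt -= lr * grad_prompt                  # plain gradient descent step
    return prompt


# Targets were generated with a "true" offset of 0.5: target = 3 * (x + 0.5)
data = [(1.0, 4.5), (2.0, 7.5), (-1.0, -1.5)]
learned = train_prompt(data)
print(round(learned, 4), FROZEN_WEIGHT)
```

The learned offset approaches the value used to generate the targets while `FROZEN_WEIGHT` stays untouched, mirroring how the shared model blocks remain unchanged during fine-tuning.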

Increase Petals Capacity with Your Own GPU

To increase Petals’ capacity, you can connect your own GPU: install PyTorch and Petals, then run the server. Alternatively, run the server from our GPU-enabled Docker image.

# In an Anaconda env
conda install pytorch cudatoolkit=11.3 -c pytorch
pip install -U petals
python -m petals.cli.run_server bigscience/bloom-petals

# Or using our GPU-enabled Docker image
sudo docker run --net host --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main python -m petals.cli.run_server bigscience/bloom-petals

More Examples and Tutorials

Petals has plenty of examples and tutorials available, including a chatbot web app and a tutorial on launching your own swarm.

Privacy and Security

Please note that the Petals public swarm is designed for research and academic use only. Do not use the public swarm to process sensitive data, as it is technically possible for peers serving model layers to recover input data and model outputs or modify them in a malicious way. Instead, you can set up a private Petals swarm with trusted individuals and organizations.

Additionally, be sure to check out the model’s terms of use, risks, and limitations before building an application that runs a language model with Petals.