Tech Bites

What is bitsandbytes and what is it used for

⚡ What is bitsandbytes

bitsandbytes is an open-source library by Tim Dettmers that provides memory-efficient optimizers and quantization techniques for training and using large models (like LLaMA, GPT, etc.).

It is mainly used to:

Reduce GPU memory usage
Speed up training in some setups (the main win is memory)
Load huge models on small GPUs (like 8–16 GB)

🧠 What It Does

bitsandbytes has two main superpowers:

🧮 1. 8-bit and 4-bit Quantization

Normally, model weights are stored as FP16 (16-bit floats) or FP32 (32-bit floats).
bitsandbytes lets you load them in 8-bit or even 4-bit, cutting weight memory by roughly 2× to 4× relative to FP16 (4× to 8× relative to FP32).

Example:

A 13B model in FP16 needs ~26 GB for its weights alone
In 8-bit: ~13 GB
In 4-bit: ~6.5 GB 💡
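
These figures follow from simple arithmetic: parameter count times bytes per parameter. Here is a minimal sketch that counts weights only, so real usage will be somewhat higher once activations, caches, and quantization metadata are included. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

params_13b = 13e9  # a 13B-parameter model
for label, bits in [("FP32", 32), ("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(params_13b, bits):.1f} GB")
```

The same function gives a quick first check on whether a model has any chance of fitting on a given GPU before you try to load it.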

bitsandbytes quantization is often used through Hugging Face Transformers like this:


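from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization settings are passed through a BitsAndBytesConfig object.
quant_config = BitsAndBytesConfig(load_in_4bit=True)  # <-- bitsandbytes magic

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",        # gated repo: needs an accepted license and a Hub login
    quantization_config=quant_config,
    device_map="auto",                  # place layers on the available devices automatically
)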
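
To give a feel for what storing weights in fewer bits means, here is a toy absmax 8-bit quantizer in plain Python. It is a simplified illustration only, not the blockwise scheme that bitsandbytes implements, and the function names are invented for this sketch.

```python
def quantize_absmax_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127 if absmax > 0 else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.03, 0.98, -1.27]
q, scale = quantize_absmax_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(q)        # small integers, each fits in one byte
print(max_err)  # rounding error, bounded by about scale / 2
```

Each weight now costs one byte plus a shared scale, at the price of a small rounding error. Real schemes add refinements such as per-block scales and outlier handling to keep that error from hurting model quality.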

⚡ 2. Memory-Efficient Optimizers

Provides 8-bit versions of standard optimizers such as Adam and AdamW
Reduces optimizer-state memory by ~75% compared with 32-bit states (the saving applies to the optimizer states, not to total training memory)
Examples: Adam8bit, PagedAdamW8bit


from bitsandbytes.optim import Adam8bit

# Assumes `model` is an existing torch.nn.Module with trainable parameters.
optimizer = Adam8bit(model.parameters(), lr=1e-4)
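
The ~75% figure can be read as a statement about optimizer state. Adam keeps two values per trainable parameter (first and second moment), so moving each from 4 bytes to 1 byte cuts that state by three quarters. Here is a rough sketch of the arithmetic, ignoring the small overhead of quantization constants; the 1B-parameter model and the helper name are hypothetical.

```python
def adam_state_gb(num_params: float, bytes_per_state: int) -> float:
    """Memory for Adam's two per-parameter states (first and second moment)."""
    return 2 * num_params * bytes_per_state / 1e9

n = 1e9  # hypothetical model with 1B trainable parameters
fp32_states = adam_state_gb(n, 4)  # 32-bit states
int8_states = adam_state_gb(n, 1)  # 8-bit states
saving = 1 - int8_states / fp32_states

print(f"32-bit Adam states: {fp32_states:.0f} GB")
print(f"8-bit Adam states:  {int8_states:.0f} GB")
print(f"saving: {saving:.0%}")
```

Weights, gradients, and activations are not affected by this change, so the saving on total training memory is smaller than 75%.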

📌 Why It’s Useful

Problem	Solution from bitsandbytes
LLMs don’t fit on GPU	Quantize them to 8-bit or 4-bit
Fine-tuning is too memory-heavy	Use 8-bit optimizers
Need faster training	Lower precision can help, but gains depend on hardware and workload; memory is the main win
Want to use PEFT/LoRA on small GPUs	Combine LoRA + bitsandbytes

🧩 Common Usage Combo

People often use:

Transformers → to load models
bitsandbytes → to load them in 4-bit
PEFT + LoRA → to fine-tune only small adapters

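This trio lets you fine-tune a 13B model on a single GPU with roughly 12–24 GB of VRAM. A 70B model needs about 35 GB for its 4-bit weights alone, so it calls for a larger card (around 48 GB) or several GPUs.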
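
Here is a sketch of how the three libraries might be wired together. The NF4 quantization type, the bfloat16 compute dtype, and the LoRA hyperparameters are assumptions chosen for illustration, and `load_4bit_lora_model` is a made-up helper name. Calling it needs a CUDA GPU plus `torch`, `transformers`, `peft`, and `bitsandbytes` installed.

```python
def load_4bit_lora_model(model_id: str):
    """Load a causal LM in 4-bit and wrap it with LoRA adapters (sketch)."""
    # Imports live inside the function so this file can be imported
    # on machines without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumed choice for this sketch
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed; needs a bf16-capable GPU
    )
    base = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    # Prepares the quantized model for training (e.g. casts some layers).
    base = prepare_model_for_kbit_training(base)

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    # Only the LoRA adapter weights are trainable in the returned model.
    return get_peft_model(base, lora_config)
```

The base weights stay frozen in 4-bit, so the optimizer only has to track the small adapter matrices.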

📌 Summary

bitsandbytes is a GPU efficiency library that lets you run and train huge models on small hardware by using 8-bit/4-bit quantization and memory-saving optimizers.

It is one of the key enablers of today’s open-source LLM fine-tuning.

What is PEFT (Parameter-Efficient Fine-Tuning)

⚡ What is PEFT (Parameter-Efficient Fine-Tuning)

PEFT stands for Parameter-Efficient Fine-Tuning.
It is both a family of techniques and a Hugging Face library that lets you fine-tune large language models without updating all their parameters, which makes training much faster and cheaper.

Instead of modifying the billions of weights in a model, PEFT methods only add or update a small number of parameters — often less than 1% of the model size.

🧠 Why PEFT is Needed

Full Fine-Tuning	PEFT
Updates all parameters	Updates only a few parameters
Requires huge GPU memory	Needs much less memory
Slow and expensive	Fast and low-cost
Hard to maintain multiple versions	Easy to store/share small adapters

This is crucial when you want to:

Customize big models (like LLaMA, Falcon, GPT-style models)
Use small GPUs (even a single 8–16 GB GPU)
Train multiple domain-specific variants

⚙️ Types of PEFT Methods

The PEFT library by Hugging Face implements several techniques:

Method	Description
LoRA (Low-Rank Adaptation)	Adds small trainable low-rank matrices to selected weight matrices (commonly the attention projections)
Prefix-Tuning	Adds trainable "prefix" vectors to the input of each layer
Prompt-Tuning / P-Tuning	Adds trainable virtual tokens (soft prompts) to the model input
Adapters	Adds small trainable feed-forward layers between existing layers
IA³ (Infused Adapter by Inhibiting and Amplifying Inner Activations)	Scales certain layer activations with learnable vectors

💡 LoRA is the most commonly used PEFT method and works great for LLMs like LLaMA, Mistral, etc.
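
To make the LoRA row concrete, here is a toy forward pass in plain Python. The frozen weight W is left untouched, and a scaled low-rank product B(Ax) is added on top. The tiny shapes and the helper names are made up for illustration; real implementations use tensor libraries and much larger matrices.

```python
import random

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Frozen base output W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4.0
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(r)]    # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, starts at zero
x = [1.0, -2.0, 0.5, 0.0, 3.0, 1.5]

# With B at zero the adapter is a no-op, so training starts from the base model.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```

Only A and B receive gradient updates. Because r is much smaller than the layer width in practice, they hold far fewer values than W.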

🧪 Example Usage (Hugging Face PEFT library)


from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Configure LoRA (a PEFT method)
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj","v_proj"], # only add LoRA to these layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

# Apply PEFT
model = get_peft_model(model, config)

This trains only a few million LoRA parameters instead of billions.
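
The "few million" figure can be checked by hand. LoRA with rank r adds r × (d_in + d_out) weights to each targeted linear layer. The sketch below assumes Llama-2-7B shapes (hidden size 4096, 32 decoder layers, square q_proj and v_proj matrices) and an approximate base parameter count; these values are assumptions, not something read from the model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)

# Assumed Llama-2-7B shapes: hidden size 4096, 32 decoder layers,
# and square 4096 x 4096 q_proj / v_proj matrices.
hidden, layers, r = 4096, 32, 8
targets_per_layer = 2  # q_proj and v_proj

trainable = layers * targets_per_layer * lora_param_count(hidden, hidden, r)
total = 6.74e9  # approximate base parameter count (assumption)

print(f"{trainable:,} trainable LoRA parameters")  # 4,194,304
print(f"{trainable / total:.3%} of the base model")
```

Under these assumptions the adapters come to well under 0.1% of the base model, which is why they are cheap to train, store, and share.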

📌 Summary

PEFT is a set of methods (and a Hugging Face library) that make fine-tuning large models possible on small hardware by updating only a tiny fraction of their parameters.
It’s the standard approach today for customizing LLMs efficiently.

What is the Transformers library

🤖 What is the Transformers library

Transformers is an open-source Python library by Hugging Face that provides:

Pre-trained transformer models
Easy APIs to load, train, and use them
Support for text, vision, audio, and multimodal tasks

It is one of the most widely used libraries for working with LLMs (Large Language Models).

⚙️ What it Contains

Here’s what the transformers library gives you:

🧠 Pre-trained models

1000+ ready-to-use models like GPT, BERT, RoBERTa, T5, LLaMA, Falcon, Mistral, BLOOM, etc.
Downloaded automatically from the Hugging Face Hub

⚒️ Model classes

AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM, etc.
These automatically select the right architecture class for a model

📄 Tokenizers

Converts text ↔ tokens (numbers) for the model
Very fast (often implemented in Rust)
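
The text ↔ numbers mapping can be illustrated with a toy word-level tokenizer. This is not how the library's tokenizers work internally (they split text into subword units and handle special tokens); `ToyTokenizer` is a made-up class that only shows the round trip.

```python
class ToyTokenizer:
    """Word-level tokenizer for illustration; real ones split into subwords."""

    def __init__(self, corpus: list[str]):
        words = sorted({w for line in corpus for w in line.split()})
        # Id 0 is reserved for words that were never seen.
        self.vocab = {"<unk>": 0, **{w: i + 1 for i, w in enumerate(words)}}
        self.inverse = {i: w for w, i in self.vocab.items()}

    def encode(self, text: str) -> list[int]:
        return [self.vocab.get(w, 0) for w in text.split()]

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.inverse[i] for i in ids)

tok = ToyTokenizer(["once upon a time", "a model reads numbers"])
ids = tok.encode("once upon a model")
print(ids)              # [4, 7, 1, 2]
print(tok.decode(ids))  # once upon a model
```

The model only ever sees the integer ids; decoding turns its output ids back into readable text.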

📦 Pipelines

High-level API to run tasks quickly, for example:


from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
print(generator("Once upon a time"))

🏋️ Training utilities

Trainer and TrainingArguments for fine-tuning (Trainer targets PyTorch models)
Model implementations have been offered for PyTorch, TensorFlow, and JAX
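
Here is a minimal sketch of how Trainer and TrainingArguments fit together. The hyperparameter values are placeholders and `fine_tune` is a hypothetical helper. It assumes `model` is a PyTorch model from the library and `train_dataset` is already tokenized and includes labels.

```python
def fine_tune(model, train_dataset, output_dir: str = "out"):
    """Run a basic fine-tuning loop with Trainer (sketch; values are placeholders)."""
    # Imported lazily so this file loads even where transformers is not installed.
    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir=output_dir,           # where checkpoints and logs are written
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.save_model(output_dir)       # write the final weights to output_dir
    return trainer
```

Trainer takes care of the training loop, batching, and checkpointing, so the calling code mostly supplies a model, a dataset, and a set of arguments.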

📊 Supported Tasks

Task	Example
Text Generation	Chatbots, storytelling
Text Classification	Spam detection, sentiment
Question Answering	QA bots
Translation	English → French
Summarization	Summarizing articles
Token Classification	Named entity recognition
Vision/Multimodal	Image captioning, VQA

💡 Why It’s Popular

Huge model zoo (open weights)
Unified interface across models
Active community and documentation
Compatible with Hugging Face ecosystem: Datasets, Accelerate, PEFT (LoRA)

📌 Summary

transformers is the go-to library for using and fine-tuning state-of-the-art AI models — especially large language models — with just a few lines of code.


Saturday, September 13, 2025
