Saturday, September 13, 2025

What is LoRA (Low-Rank Adaptation)

 


LoRA is a parameter-efficient fine-tuning technique used to adapt large language models (LLMs) like LLaMA, GPT, etc., to new tasks without retraining the entire model.

Instead of updating all the billions of parameters, LoRA:

  • Freezes the original model weights (keeps them unchanged)

  • Inserts small trainable low-rank matrices into certain layers (usually attention layers)

  • Only trains these small matrices, which are much smaller than the full model


⚙️ How LoRA Works (Simplified)

Imagine an LLM has a large weight matrix W (like 4096×4096).

Normally, fine-tuning means updating every entry in W (over 16 million parameters for a single 4096×4096 matrix), which is expensive.

With LoRA:

  1. Keep W frozen.

  2. Add two small matrices:

    • A (size 4096×r)

    • B (size r×4096) — where r is small (like 8 or 16)

  3. Train only A and B.

  4. At inference time, the effective weight becomes:

    W' = W + A × B

This drastically reduces the number of trainable parameters.
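The steps above can be sketched in a few lines of NumPy. This is a toy illustration (small dimensions, random data), not a training loop; note that B is initialized to zero so the adapted model starts out identical to the base model, which matches how LoRA is initialized in practice:

```python
import numpy as np

# Toy dimensions: a real LLM layer might be 4096x4096; we use 64x64 here.
d, r = 64, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # frozen pretrained weight (never updated)
A = rng.normal(size=(d, r)) * 0.01   # trainable low-rank factor (d x r)
B = np.zeros((r, d))                 # trainable; zero-init so A @ B starts at 0

# Effective weight used at inference: W' = W + A x B
W_eff = W + A @ B

# With B at zero, the adapted model initially behaves exactly like the base.
assert np.allclose(W_eff, W)

# Trainable parameters: 2*d*r for LoRA vs d*d for full fine-tuning.
trainable, full = A.size + B.size, W.size
print(trainable, full)  # → 1024 4096
```

Training then updates only A and B by gradient descent; W never changes, so the same base weights can be reused across many adapters.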


📊 Why LoRA is Useful

| Aspect             | Full Fine-Tune        | LoRA Fine-Tune                |
| ------------------ | --------------------- | ----------------------------- |
| Parameters updated | All (billions)        | Few million (<<1%)            |
| GPU memory need    | Very high             | Very low                      |
| Training speed     | Slow                  | Fast                          |
| Sharing            | Must share full model | Just share small LoRA weights |

This makes LoRA ideal when:

  • You want to customize a big model on a small dataset

  • You have limited GPU resources

  • You want to train multiple variants of the same base model


📦 Common Uses

  • Domain-specific tuning (medical, legal, finance text)

  • Instruction tuning or chat-like behavior

  • Personalizing models for specific companies or users

  • Combining with PEFT (Parameter-Efficient Fine-Tuning) frameworks like:

    • 🤗 Hugging Face PEFT

    • 🤖 bitsandbytes (quantization, the basis of QLoRA)

    • 🦙 LLaMA + LoRA (common combo)


📝 Summary

LoRA = a lightweight way to fine-tune large models by training only tiny "adapter" layers (low-rank matrices) while keeping original weights frozen.
It dramatically reduces cost, time, and storage needs for customizing LLMs.
