LoRA vs QLoRA vs DPO: Which Fine-Tuning Method Should You Use?
A practical guide to choosing between LoRA, QLoRA, and DPO for your fine-tuning project — with VRAM math and dataset size guidelines.
TL;DR
- LoRA: best when you have a 24GB+ GPU and clean instruction data.
- QLoRA: the practical choice for 7B+ models on GPUs with less than 24 GB of VRAM, and for 13B+ models on a single 24 GB consumer card (RTX 3090, 4090).
- DPO: when you have preference pairs (chosen vs rejected). Comes AFTER LoRA SFT.
LoRA: the baseline
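LoRA freezes the base weights and injects trainable low-rank matrices into the attention projections, so only a small fraction of the parameters (roughly 0.1% or less) is trained. Rule of thumb: r=8, alpha=16, target q_proj and v_proj.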
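To make the "small fraction" concrete, here is a minimal sketch that counts the trainable parameters for the rule-of-thumb settings. The layer shapes (hidden size 4096, 32 layers, v_proj mapping 4096 to 1024 because of grouped-query attention) and the 8.03B total are assumptions about Llama-3.1-8B, and `lora_param_count` is a hypothetical helper, not part of any library.

```python
def lora_param_count(layers, rank, shapes):
    """Count trainable LoRA parameters.

    Each adapted weight of shape (d_out, d_in) gains two matrices:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return layers * per_layer


# Assumed Llama-3.1-8B shapes as (d_in, d_out) per targeted projection.
TARGETS = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}
TOTAL_PARAMS = 8.03e9  # approximate model size (assumption)

trainable = lora_param_count(layers=32, rank=8, shapes=TARGETS.values())
fraction = trainable / TOTAL_PARAMS

print(f"trainable LoRA params: {trainable:,}")
print(f"share of all params: {fraction:.3%}")
```

Under these assumptions the count is about 3.4M parameters. Raising r or targeting more projections (k_proj, o_proj, the MLP layers) grows it in proportion.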
VRAM math for Llama-3.1-8B:
- Base model (bf16): 16 GB
- LoRA adapters: ~0.2 GB
- Optimizer states (AdamW): ~0.5 GB
- Activations (scale with batch size and sequence length): ~3-5 GB
- Total: ~20-22 GB → fits on a 24 GB RTX 3090.
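The same budget can be written as a small script, which makes it easy to swap in your own numbers. This is a rough sketch: the component figures are the estimates from the list above, and the 1 GB headroom in `fits` is an assumed safety margin, not a measured value.

```python
# Component estimates in GB as (low, high) ranges, taken from the list above.
budget_gb = {
    "base model (bf16)": (16.0, 16.0),
    "LoRA adapters": (0.2, 0.2),
    "optimizer states (AdamW)": (0.5, 0.5),
    "activations": (3.0, 5.0),
}


def total_range(budget):
    """Sum the low and high ends of every component."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def fits(budget, gpu_gb, headroom_gb=1.0):
    """True if the worst-case total still leaves `headroom_gb` free."""
    _, high = total_range(budget)
    return high + headroom_gb <= gpu_gb


low, high = total_range(budget_gb)
print(f"estimated total: {low:.1f}-{high:.1f} GB")
print("fits on 24 GB:", fits(budget_gb, 24))
print("fits on 16 GB:", fits(budget_gb, 16))
```

The activation range is the part most likely to be off for your setup, since it moves with batch size, sequence length, and whether gradient checkpointing is on.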
QLoRA: the consumer-GPU unlock
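Same as LoRA, but the frozen base model is loaded in 4-bit NF4 while the adapters train in higher precision. For the same Llama-3.1-8B, the base drops from 16 GB to roughly 4 GB, so fine-tuning fits in about 12 GB at modest batch sizes and sequence lengths.
When to use: 7B+ models on a GPU with less than 24 GB, or 13B+ models on a single 24 GB GPU. When NOT to use: when you can afford the VRAM, because LoRA in bf16 trains slightly faster (there is no 4-bit dequantization step in the forward pass).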
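The 16 GB to 4 GB drop is plain bits-per-parameter arithmetic, sketched below. The 8.03B parameter count is an approximation, and `weight_memory_gb` is a hypothetical helper that counts weights only.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8.03e9  # approximate size of Llama-3.1-8B (assumption)

bf16 = weight_memory_gb(N_PARAMS, 16)
nf4 = weight_memory_gb(N_PARAMS, 4)

print(f"bf16 base: {bf16:.1f} GB")
print(f"4-bit base: {nf4:.1f} GB")
print(f"saved: {bf16 - nf4:.1f} GB")
```

Expect the real 4-bit footprint to land somewhat above this figure, because quantization constants and any layers left unquantized add overhead.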
DPO: preference alignment without RLHF
DPO is NOT a replacement for SFT. The right order:
- SFT (LoRA or full): teach the model the format/domain.
- DPO: align preferences.
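One way to see why SFT comes first: the DPO loss compares the policy against a frozen reference model, which is typically the SFT checkpoint. The sketch below computes the loss for a single pair from summed sequence log-probabilities. The function name, argument names, and `beta=0.1` are illustrative assumptions; in practice a training library handles this for you.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probabilities of the full chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    # How much more the policy favors chosen over rejected than the reference does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)), computed stably as softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy has shifted probability toward the chosen response: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

No reward model appears anywhere in the computation, which is what "without RLHF" refers to: the preference pairs supervise the policy directly.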
Dataset format: (prompt, chosen, rejected) triples. As a rough guideline, aim for at least 5K pairs.
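A quick sanity check on the triples before training can save a wasted run. The sketch below assumes each record is a dict with string fields named `prompt`, `chosen`, and `rejected`; `validate_pairs` and the sample records are made up for illustration.

```python
def validate_pairs(records):
    """Return the records that are usable (prompt, chosen, rejected) triples."""
    required = ("prompt", "chosen", "rejected")
    valid = []
    for rec in records:
        # Every field must be present and a non-empty string.
        if not all(isinstance(rec.get(k), str) and rec[k].strip() for k in required):
            continue
        # Identical answers carry no preference signal.
        if rec["chosen"].strip() == rec["rejected"].strip():
            continue
        valid.append(rec)
    return valid


records = [
    {"prompt": "Explain LoRA in one sentence.",
     "chosen": "LoRA trains small low-rank matrices while the base weights stay frozen.",
     "rejected": "LoRA is a kind of GPU."},
    {"prompt": "What is NF4?",
     "chosen": "A 4-bit data type.",
     "rejected": "A 4-bit data type."},
    {"prompt": "What is DPO?", "chosen": "", "rejected": "No idea."},
]

usable = validate_pairs(records)
print(f"{len(usable)} of {len(records)} records usable")
```

Only the first sample record passes here: the second has identical answers and the third has an empty chosen response.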
Decision tree
- Have instruction data only? → LoRA SFT.
- 7B+ model on <24 GB GPU? → QLoRA.
- Have preference data? → DPO after SFT.
- Need state-of-the-art reasoning on math/code? → GRPO (the DeepSeek-R1 approach).
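The decision tree can also be sketched as a function. This version generalizes the "7B+ on <24 GB" rule by reusing the VRAM math from the LoRA section: 2 bytes per parameter for bf16 weights plus about 6 GB for adapters, optimizer states, and activations. That overhead figure, the function name, and its arguments are all assumptions for illustration.

```python
def recommend(model_params_b, gpu_vram_gb, has_preference_data=False,
              needs_reasoning_rl=False, overhead_gb=6.0):
    """Return an ordered list of training stages for a given setup.

    model_params_b: model size in billions of parameters.
    overhead_gb: assumed room for adapters, optimizer states, and activations.
    """
    # bf16 weights take about 2 GB per billion parameters.
    bf16_total_gb = 2.0 * model_params_b + overhead_gb
    if bf16_total_gb <= gpu_vram_gb:
        stages = ["LoRA SFT"]
    else:
        stages = ["QLoRA SFT"]
    # Preference pairs add a DPO stage after SFT, never instead of it.
    if has_preference_data:
        stages.append("DPO")
    # Reasoning-focused RL comes last.
    if needs_reasoning_rl:
        stages.append("GRPO")
    return stages


print(recommend(8, 24))
print(recommend(8, 16))
print(recommend(13, 24, has_preference_data=True))
```

Treat the output as a starting point rather than a verdict: long sequences or large batches can push a borderline LoRA setup over the limit and into QLoRA territory.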