Fine-Tuning · 6 min read · 19 May 2026

LoRA vs QLoRA vs DPO: Which Fine-Tuning Method Should You Use?

A practical guide to choosing between LoRA, QLoRA, and DPO for your fine-tuning project — with VRAM math and dataset size guidelines.

Tags: LoRA · QLoRA · DPO · Fine-Tuning

TL;DR

LoRA: best when you have a 24GB+ GPU and clean instruction data.
QLoRA: the go-to for 7B+ models on consumer GPUs (RTX 3090, 4090), and the only practical option below 24 GB.
DPO: when you have preference pairs (chosen vs rejected). Comes AFTER LoRA SFT.

LoRA

LoRA freezes the base model and injects small trainable low-rank matrices into the attention projections, so only ~0.1% of parameters (or fewer) are trained. Rule of thumb: r=8, alpha=16, targeting q_proj and v_proj.
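To see where a figure like ~0.1% comes from, here is a rough count of LoRA parameters for r=8 on q_proj and v_proj. It is a sketch that assumes the published Llama-3.1-8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128, about 8.03B total parameters); the helper names are hypothetical.

```python
# Rough LoRA parameter count for Llama-3.1-8B-like shapes (assumed, not read from a checkpoint).
HIDDEN_SIZE = 4096     # model width; q_proj maps 4096 -> 4096
NUM_LAYERS = 32        # transformer blocks
KV_DIM = 8 * 128       # grouped-query attention: v_proj maps 4096 -> 1024
TOTAL_PARAMS = 8.03e9  # approximate size of the base model


def lora_adapter_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * d_in + d_out * r


def lora_trainable_params(r: int = 8) -> int:
    """Trainable parameters when adapting q_proj and v_proj in every layer."""
    per_layer = (
        lora_adapter_params(HIDDEN_SIZE, HIDDEN_SIZE, r)  # q_proj
        + lora_adapter_params(HIDDEN_SIZE, KV_DIM, r)     # v_proj
    )
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    n = lora_trainable_params(r=8)
    print(f"{n:,} trainable params ({n / TOTAL_PARAMS:.3%} of the model)")
    # prints: 3,407,872 trainable params (0.042% of the model)
```

At r=8 the adapters come to roughly 0.04% of the weights, so ~0.1% works as a safe upper-bound rule of thumb. The count is linear in r: doubling r doubles the trainable parameters.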

VRAM math for Llama-3.1-8B:
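A back-of-the-envelope version of that math, assuming bf16 base weights (2 bytes per parameter), roughly 3.4M adapter parameters (what r=8 on q_proj and v_proj works out to for this model), and adapters trained in fp32 with AdamW (weight, gradient and two moment buffers: 16 bytes per trainable parameter). Activations, the CUDA context and framework overhead are deliberately left out; the constants and function name are illustrative assumptions.

```python
# Back-of-the-envelope VRAM for LoRA on Llama-3.1-8B with a bf16 base model.
# Assumptions: 8.03B base params, ~3.4M LoRA params (r=8, q_proj + v_proj),
# adapters trained in fp32 with AdamW. Activations and overhead are NOT included.
GB = 1e9  # decimal gigabytes, matching the "16 GB" figure in the text

BASE_PARAMS = 8.03e9
LORA_PARAMS = 3_407_872

BYTES_BF16 = 2
# fp32 weight (4) + fp32 gradient (4) + AdamW first and second moments (4 + 4)
BYTES_PER_TRAINABLE = 4 + 4 + 8


def lora_memory_gb(base_params: float = BASE_PARAMS, lora_params: int = LORA_PARAMS) -> dict:
    """Static memory only: frozen weights plus the adapters' training state."""
    frozen = base_params * BYTES_BF16 / GB  # frozen: no gradients, no optimizer state
    adapters = lora_params * BYTES_PER_TRAINABLE / GB
    return {"base_weights": frozen, "adapter_training_state": adapters, "total": frozen + adapters}


if __name__ == "__main__":
    for name, gb in lora_memory_gb().items():
        print(f"{name:>24}: {gb:6.2f} GB")
```

The frozen base dominates: the adapters and their optimizer state add only about 0.05 GB. What separates a ~16 GB estimate from a 24 GB card is activation memory, which grows with batch size and sequence length.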

QLoRA

Same as LoRA, but the frozen base model is loaded in 4-bit NF4. The same Llama-3.1-8B drops from about 16 GB (bf16) to about 4 GB for the base weights, and the whole fine-tuning run fits on a 12 GB GPU.

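When to use: 13B+ models on a single 24 GB GPU, or 7B/8B models on GPUs with less than 24 GB. When NOT to use: when you can afford the VRAM, because LoRA in bf16 trains slightly faster (QLoRA has to dequantize the 4-bit weights on the fly).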
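The 16 GB to 4 GB drop follows directly from bits per parameter. This sketch also counts the storage for quantization constants, assuming the scheme described in the QLoRA paper: one fp32 absmax per 64-weight block (0.5 extra bits per parameter), or about 0.127 bits with double quantization, which stores those constants in 8 bits with one fp32 constant per 256 of them. The "13B model" entry is a generic illustration, not a specific checkpoint.

```python
# Weight storage for a frozen base model in bf16 vs 4-bit NF4.
# Assumption: NF4 block size 64 with fp32 absmax constants, optionally double-quantized
# (8-bit constants, second-level block size 256), as described in the QLoRA paper.
GB = 1e9


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Bytes needed for the weights alone, in decimal GB."""
    return n_params * bits_per_param / 8 / GB


def nf4_bits(double_quant: bool = True) -> float:
    """Effective bits per parameter: 4-bit codes plus quantization-constant overhead."""
    if double_quant:
        overhead = 8 / 64 + 32 / (64 * 256)  # ~0.127 bits/param
    else:
        overhead = 32 / 64                   # 0.5 bits/param
    return 4 + overhead


if __name__ == "__main__":
    for label, n in [("Llama-3.1-8B", 8.03e9), ("13B model", 13e9)]:
        bf16 = weight_gb(n, 16)
        nf4 = weight_gb(n, nf4_bits())
        print(f"{label:>12}: bf16 {bf16:5.1f} GB | NF4 {nf4:4.1f} GB")
```

A 13B model needs about 26 GB for its bf16 weights alone, more than a 24 GB card holds, while its NF4 copy stays under 7 GB. That gap is the situation QLoRA is built for.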

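DPO

DPO is NOT a replacement for SFT. The right order: base model → SFT with LoRA (or QLoRA) on instruction data → DPO on preference pairs, starting from the SFT checkpoint.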
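DPO scores the model against a frozen reference policy, and in the standard recipe that reference is the SFT checkpoint, which is one reason SFT has to come first. Here is a minimal sketch of the per-pair DPO loss, assuming you already have the summed log-probabilities of the chosen and rejected responses under both the policy and the reference model; beta=0.1 is an illustrative default, not a recommendation from this article.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """-log(sigmoid(z)), computed without overflow for large |z|."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, chosen, rejected) pair.

    Each argument is the summed log-probability of a full response given the prompt.
    The reference model stays frozen (typically the SFT checkpoint).
    """
    chosen_logratio = policy_chosen - ref_chosen        # how far the policy moved on 'chosen'
    rejected_logratio = policy_rejected - ref_rejected  # ... and on 'rejected'
    margin = beta * (chosen_logratio - rejected_logratio)
    return neg_log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log 2 (about 0.693).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has shifted probability toward the chosen response: the loss drops below log 2.
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

Only differences of log-probabilities enter the loss, so the reference model anchors training: raising the chosen response's probability helps only relative to where the reference model already put it.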

Dataset format: (prompt, chosen, rejected) triples. Min size: 5K pairs.
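A minimal sketch of the triple format and a basic sanity check, assuming a JSONL file with one record per line. The field names prompt, chosen and rejected mirror the triple above and the column names of TRL's standard preference format; the helper load_preference_pairs and its filtering rules are hypothetical.

```python
import json

MIN_PAIRS = 5_000  # the minimum size suggested above
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def load_preference_pairs(lines):
    """Parse JSONL lines into (prompt, chosen, rejected) dicts, skipping unusable rows."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # blank line
        row = json.loads(line)
        if not isinstance(row, dict):
            continue  # not a JSON object
        if not all(isinstance(row.get(k), str) and row[k].strip() for k in REQUIRED_KEYS):
            continue  # missing or empty field
        if row["chosen"] == row["rejected"]:
            continue  # identical responses carry no preference signal
        pairs.append({k: row[k] for k in REQUIRED_KEYS})
    return pairs


if __name__ == "__main__":
    sample = [
        json.dumps({"prompt": "Summarize: the meeting moved to Friday.",
                    "chosen": "The meeting is now on Friday.",
                    "rejected": "Meetings are important for teams."}),
        json.dumps({"prompt": "Say hi.", "chosen": "Hi!", "rejected": "Hi!"}),
    ]
    pairs = load_preference_pairs(sample)
    print(len(pairs), "usable pair(s)")
    if len(pairs) < MIN_PAIRS:
        print(f"warning: fewer than {MIN_PAIRS:,} pairs")
```

Filtering before counting matters: duplicates and identical chosen/rejected responses inflate the raw line count without adding any preference signal toward the 5K target.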

Which one should you use?

Have instruction data only? → LoRA SFT.
7B+ model on <24 GB GPU? → QLoRA.
Have preference data? → DPO after SFT.
Need state-of-the-art reasoning on math/code? → GRPO (Group Relative Policy Optimization, the RL method behind DeepSeek-R1).
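The checklist can be written as a small routing function. This is a hypothetical helper that encodes only the article's rules: the 7B / 24 GB threshold from the list, plus the 13B+ rule from the QLoRA section, where reading "a single GPU" as a 24 GB card is our assumption. It returns the suggested stages in training order.

```python
def recommend(
    model_params_b: float,
    gpu_vram_gb: float,
    has_preference_pairs: bool = False,
    needs_sota_reasoning: bool = False,
) -> list[str]:
    """Return suggested training stages, in order, based on the article's rules of thumb."""
    needs_qlora = (
        (model_params_b >= 7 and gpu_vram_gb < 24)        # "7B+ model on <24 GB GPU? -> QLoRA"
        or (model_params_b >= 13 and gpu_vram_gb <= 24)   # 13B+ on a single (assumed 24 GB) GPU
    )
    plan = ["QLoRA SFT" if needs_qlora else "LoRA SFT"]
    if has_preference_pairs:
        plan.append("DPO")  # always after SFT, never instead of it
    if needs_sota_reasoning:
        # The checklist gives no position for GRPO; listing it last is an assumption.
        plan.append("GRPO")
    return plan


if __name__ == "__main__":
    print(recommend(8, 12))                                  # 8B on a 12 GB card
    print(recommend(8, 24, has_preference_pairs=True))       # 8B on a 24 GB card
    print(recommend(8, 48, needs_sota_reasoning=True))
```

The thresholds are coarse: they do not catch models too large for bf16 even on a big card, so check weight memory (about 2 bytes per parameter in bf16, about half a byte in NF4) before trusting the output.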

Join 500+ AI developers getting weekly tips, news and resources from AmanAI Lab.
