Fine-tuning configurations and VRAM requirements: reference table
Model size  Configuration  VRAM required  Recommended GPU
7B          Freeze (FP16)  20 GB          RTX 4090
7B          LoRA (FP16)    16 GB          RTX 4090
7B          QLoRA (INT8)   10 GB          RTX 4080
7B          QLoRA (INT4)   6 GB           RTX 3060
13B         Freeze (FP16)  40 GB          RTX 4090 / A100 (40 GB)
13B         LoRA (FP16)    32 GB          A100 (40 GB)
13B         QLoRA (INT8)   20 GB          L40 (48 GB)
13B         QLoRA (INT4)   12 GB          RTX 4090
30B         Freeze (FP16)  80 GB          A100 (80 GB)
30B         LoRA (FP16)    64 GB          A100 (80 GB)
30B         QLoRA (INT8)   40 GB          L40 (48 GB)
30B         QLoRA (INT4)   24 GB          RTX 4090
70B         Freeze (FP16)  200 GB         3x H100 (80 GB)
70B         LoRA (FP16)    160 GB         2x H100 (80 GB)
70B         QLoRA (INT8)   80 GB          2x H100 (80 GB)
70B         QLoRA (INT4)   48 GB          L40 (48 GB)
110B        Freeze (FP16)  360 GB         5x H100 (80 GB)
110B        LoRA (FP16)    240 GB         3x H100 (80 GB)
110B        QLoRA (INT8)   140 GB         2x H100 (80 GB)
175B        Freeze (FP16)  500 GB         6x H100 (80 GB)
175B        LoRA (FP16)    400 GB         5x H100 (80 GB)
175B        QLoRA (INT8)   250 GB         4x H100 (80 GB)
175B        QLoRA (INT4)   150 GB         3x H100 (80 GB)
300B        Freeze (FP16)  800 GB         10x A100 / H100 (80 GB)
300B        LoRA (FP16)    600 GB         8x A100 / H100 (80 GB)
300B        QLoRA (INT8)   400 GB         6x A100 / H100 (80 GB)
300B        QLoRA (INT4)   250 GB         5x A100 / H100 (80 GB)
671B        Freeze (FP16)  1.5 TB         20x H100 (80 GB)
671B        LoRA (FP16)    1.2 TB         16x H100 (80 GB)
671B        QLoRA (INT8)   800 GB         12x H100 (80 GB)
671B        QLoRA (INT4)   500 GB         8x H100 (80 GB)
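
For the multi-GPU rows, a rough lower bound on the card count is the required VRAM divided by the memory of one card, rounded up. The sketch below is only that arithmetic; the function name min_gpu_count is hypothetical, and the bound ignores communication buffers, fragmentation, and parallelism overhead. Many rows in the table recommend more cards than this bound, which would leave headroom.

```python
import math


def min_gpu_count(vram_needed_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest number of identical GPUs whose combined memory covers the need.

    This is a lower bound only: it assumes memory can be split perfectly
    across cards, which real sharding strategies do not guarantee.
    """
    if vram_needed_gb <= 0 or gpu_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(vram_needed_gb / gpu_memory_gb)


# Rows taken from the table: (label, required VRAM in GB) on 80 GB cards.
for label, vram_gb in [("70B Freeze (FP16)", 200),
                       ("70B LoRA (FP16)", 160),
                       ("110B Freeze (FP16)", 360)]:
    print(label, "->", min_gpu_count(vram_gb), "x 80 GB")
```

For the three rows used here the bound equals the table's recommendation (3, 2, and 5 cards respectively).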
VRAM required for basic inference, assuming a single request
Model  4k tokens  8k tokens  32k tokens  128k tokens
7B     17.6 GB    19.8 GB    33.0 GB     85.8 GB
13B    32.12 GB   35.64 GB   56.76 GB    141.24 GB
30B    72.05 GB   78.14 GB   114.47 GB   259.74 GB
66B    155.58 GB  165.98 GB  228.23 GB   478 GB
70B    165.55 GB  177.07 GB  244.11 GB   523.25 GB
175B   405.77 GB  426.53 GB  551.03 GB   1049.58 GB
VRAM required with 10 concurrent requests, by context length in tokens
Model  4k tokens  8k tokens  32k tokens  128k tokens
7B     37.4 GB    59.4 GB    191.4 GB    719.4 GB
13B    63.8 GB    99.0 GB    303.6 GB    1,128.6 GB
30B    126.5 GB   181.5 GB   528.0 GB    1,914.0 GB
66B    244.2 GB   343.2 GB   937.2 GB    3,313.2 GB
70B    264.0 GB   374.0 GB   1,034.0 GB  3,674.0 GB
175B   583.0 GB   781.0 GB   1,969.0 GB  6,721.0 GB
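
The 7B rows of the two inference tables can be reproduced by a simple linear model: FP16 weights at 2 bytes per parameter, plus a KV cache that grows with context length and number of concurrent requests, all scaled by a 10% overhead factor. This is an inference from the numbers, not a formula stated by the source. The constants below (0.5 GB of KV cache per 1k tokens, overhead 1.1) are fitted to the 7B rows only, and the function name inference_vram_gb is hypothetical. Larger models follow the same shape but do not all match the same constants exactly.

```python
def inference_vram_gb(params_billion: float,
                      context_k_tokens: float,
                      concurrent_requests: int = 1,
                      kv_gb_per_1k_tokens: float = 0.5,
                      overhead: float = 1.1) -> float:
    """Estimate inference VRAM in GB under a linear weights + KV-cache model.

    Assumptions (fitted to the 7B rows, not taken from the source):
      - weights are stored in FP16, i.e. 2 GB per billion parameters
      - the KV cache costs kv_gb_per_1k_tokens per request per 1k tokens
      - a flat multiplicative overhead covers activations and runtime buffers
    """
    weights_gb = 2.0 * params_billion
    kv_cache_gb = kv_gb_per_1k_tokens * context_k_tokens * concurrent_requests
    return round(overhead * (weights_gb + kv_cache_gb), 2)


# Compare the estimate with the 7B rows for 1 and 10 concurrent requests.
for requests in (1, 10):
    row = [inference_vram_gb(7, k, requests) for k in (4, 8, 32, 128)]
    print(f"7B, {requests} request(s):", row)
```

The model makes the difference between the two tables visible: the weight term is paid once, while the KV-cache term is multiplied by the number of concurrent requests, so long contexts dominate the total at high concurrency.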

Reference:
NVIDIA official benchmarks: https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference