Fine-tuning configurations and VRAM requirements (reference)

| Model Size | Configuration | VRAM Required | Recommended GPU(s) |
| --- | --- | --- | --- |
| 7B | Freeze (FP16) | 20GB | RTX 4090 |
| | LoRA (FP16) | 16GB | RTX 4090 |
| | QLoRA (INT8) | 10GB | RTX 4080 |
| | QLoRA (INT4) | 6GB | RTX 3060 |
| 13B | Freeze (FP16) | 40GB | RTX 4090 / A100 (40GB) |
| | LoRA (FP16) | 32GB | A100 (40GB) |
| | QLoRA (INT8) | 20GB | L40 (48GB) |
| | QLoRA (INT4) | 12GB | RTX 4090 |
| 30B | Freeze (FP16) | 80GB | A100 (80GB) |
| | LoRA (FP16) | 64GB | A100 (80GB) |
| | QLoRA (INT8) | 40GB | L40 (48GB) |
| | QLoRA (INT4) | 24GB | RTX 4090 |
| 70B | Freeze (FP16) | 200GB | H100 (80GB) × 3 |
| | LoRA (FP16) | 160GB | H100 (80GB) × 2 |
| | QLoRA (INT8) | 80GB | H100 (80GB) × 2 |
| | QLoRA (INT4) | 48GB | L40 (48GB) |
| 110B | Freeze (FP16) | 360GB | H100 (80GB) × 5 |
| | LoRA (FP16) | 240GB | H100 (80GB) × 3 |
| | QLoRA (INT8) | 140GB | H100 (80GB) × 2 |
| 175B | Freeze (FP16) | 500GB | H100 (80GB) × 6 |
| | LoRA (FP16) | 400GB | H100 (80GB) × 5 |
| | QLoRA (INT8) | 250GB | H100 (80GB) × 4 |
| | QLoRA (INT4) | 150GB | H100 (80GB) × 3 |
| 300B | Freeze (FP16) | 800GB | A100 / H100 (80GB) × 10 |
| | LoRA (FP16) | 600GB | A100 / H100 (80GB) × 8 |
| | QLoRA (INT8) | 400GB | A100 / H100 (80GB) × 6 |
| | QLoRA (INT4) | 250GB | A100 / H100 (80GB) × 5 |
| 671B | Freeze (FP16) | 1.5TB | H100 (80GB) × 20 |
| | LoRA (FP16) | 1.2TB | H100 (80GB) × 16 |
| | QLoRA (INT8) | 800GB | H100 (80GB) × 12 |
| | QLoRA (INT4) | 500GB | H100 (80GB) × 8 |
VRAM required for basic inference, assuming a single request

| Model | 4k Tokens | 8k Tokens | 32k Tokens | 128k Tokens |
| --- | --- | --- | --- | --- |
| 7B | 17.6 GB | 19.8 GB | 33.0 GB | 85.8 GB |
| 13B | 32.12 GB | 35.64 GB | 56.76 GB | 141.24 GB |
| 30B | 72.05 GB | 78.14 GB | 114.47 GB | 259.74 GB |
| 66B | 155.58 GB | 165.98 GB | 228.23 GB | 478 GB |
| 70B | 165.55 GB | 177.07 GB | 244.11 GB | 523.25 GB |
| 175B | 405.77 GB | 426.53 GB | 551.03 GB | 1049.58 GB |
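These single-request numbers are consistent with the usual serving rule of thumb: FP16 weights (2 bytes per parameter) plus a KV cache that grows linearly with context length, with about 10% overhead on top. The sketch below assumes that rule; the per-token KV-cache sizes are assumptions for LLaMA-style shapes (2 tensors × n_layers × hidden_size × 2 bytes, rounded), and "4k" is treated as 4,000 tokens. Under those assumptions it reproduces the 7B and 13B rows above and closely matches the 70B row.

```python
# A minimal serving-memory sketch, assuming: FP16 weights (2 bytes/param) + a KV cache
# that scales with context length and request count, plus ~10% runtime overhead.

KV_MB_PER_TOKEN = {  # assumed FP16 K+V cache per token, in MB (LLaMA-style shapes)
    "7B":  0.5,   # 32 layers × 4096 hidden
    "13B": 0.8,   # 40 layers × 5120 hidden
    "70B": 2.62,  # 80 layers × 8192 hidden
}

OVERHEAD = 1.1  # ~10% extra for activations, CUDA context, fragmentation

def serving_vram_gb(model: str, params_billion: float, context_tokens: int,
                    concurrent_requests: int = 1) -> float:
    """Estimate serving VRAM in GB: weights are shared, the KV cache scales per request."""
    weights_gb = 2.0 * params_billion
    kv_gb_per_request = KV_MB_PER_TOKEN[model] * context_tokens / 1000
    return OVERHEAD * (weights_gb + kv_gb_per_request * concurrent_requests)

# Single request, matching the table above:
print(round(serving_vram_gb("7B", 7, 4000), 2))    # 17.6
print(round(serving_vram_gb("13B", 13, 8000), 2))  # 35.64
print(round(serving_vram_gb("70B", 70, 4000), 1))  # 165.5 (table: 165.55)
```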
VRAM required with 10 concurrent requests, by context length (tokens)

| Model | 4k Tokens | 8k Tokens | 32k Tokens | 128k Tokens |
| --- | --- | --- | --- | --- |
| 7B | 37.4 GB | 59.4 GB | 191.4 GB | 719.4 GB |
| 13B | 63.8 GB | 99.0 GB | 303.6 GB | 1,128.6 GB |
| 30B | 126.5 GB | 181.5 GB | 528.0 GB | 1,914.0 GB |
| 66B | 244.2 GB | 343.2 GB | 937.2 GB | 3,313.2 GB |
| 70B | 264.0 GB | 374.0 GB | 1,034.0 GB | 3,674.0 GB |
| 175B | 583.0 GB | 781.0 GB | 1,969.0 GB | 6,721.0 GB |
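The 10-request column follows the same pattern: the weights are loaded once, and only the KV-cache term is multiplied by the number of concurrent requests. Reusing `serving_vram_gb()` from the sketch above:

```python
# Continuing the sketch above (serving_vram_gb defined there): only the KV cache scales
# with concurrency, so ten concurrent 4k requests on a 7B model need roughly
# 1.1 × (14 GB weights + 10 × 2 GB KV cache) ≈ 37.4 GB.
print(round(serving_vram_gb("7B", 7, 4000, concurrent_requests=10), 1))    # 37.4
print(round(serving_vram_gb("13B", 13, 8000, concurrent_requests=10), 1))  # 99.0
```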
Reference:
NVIDIA official benchmarks: https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference