py-feat detector benchmark — 2026-05-03 21:36:43
Run metadata
Date: 2026-05-03 21:36:43
py-feat version: 0.7.0
Git commit: 09980f9
Host: liquidswords2 (x86_64, 128 CPUs)
Python: 3.12.13
PyTorch: 2.5.1+cu124
GPU: CUDA 12.4, NVIDIA GeForce RTX 3090
OMP_NUM_THREADS: 1
Devices swept: ['cpu', 'cuda']
Batch sizes: [1, 4, 16]
DataLoader workers: [0]
Each timed call is preceded by one untimed warmup; the timed-call wall time is reported.
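
The warmup-then-time procedure and the derived columns (ms/frame or ms/img, and fps) can be sketched as follows. This is a minimal illustration, not the benchmark script itself: `time_with_warmup`, `per_item_metrics`, `sweep`, and the `run(device, batch_size)` callable are hypothetical names, and `run` stands in for whatever detector call is being measured.

```python
import time
from itertools import product


def time_with_warmup(fn, *args, **kwargs):
    """Call fn once untimed, then return the wall time (seconds) of a second call."""
    fn(*args, **kwargs)  # warmup: absorbs one-off costs such as lazy initialisation
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def per_item_metrics(seconds, n_items):
    """Convert a wall time into (milliseconds per item, items per second)."""
    return 1000.0 * seconds / n_items, n_items / seconds


def sweep(run, devices, batch_sizes, n_items):
    """Time run(device, batch_size) for every device/batch-size combination.

    `run` is a hypothetical placeholder for the detector call under test.
    """
    rows = []
    for device, batch in product(devices, batch_sizes):
        sec = time_with_warmup(run, device, batch)
        ms_per_item, items_per_sec = per_item_metrics(sec, n_items)
        rows.append(
            {
                "device": device,
                "batch": batch,
                "sec": sec,
                "ms_per_item": ms_per_item,
                "items_per_sec": items_per_sec,
            }
        )
    return rows
```

For instance, `per_item_metrics(56.52, 72)` gives roughly 785 ms/frame and 1.3 fps, in line with the first img2pose row of the video results. If `run` launches GPU work asynchronously, it would need to wait for that work to finish before returning for the wall time to be meaningful.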
Video: short (72 frames)
img2pose

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 56.52 | 785.1 | 1.3 |
| cpu | 4 | 43.01 | 597.3 | 1.7 |
| cpu | 16 | 41.55 | 577.1 | 1.7 |
| cuda | 1 | 7.06 | 98.1 | 10.2 |
| cuda | 4 | 4.56 | 63.3 | 15.8 |
| cuda | 16 | 4.05 | 56.3 | 17.8 |

retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 22.74 | 315.8 | 3.2 |
| cpu | 4 | 7.43 | 103.2 | 9.7 |
| cpu | 16 | 6.22 | 86.3 | 11.6 |
| cuda | 1 | 4.47 | 62.1 | 16.1 |
| cuda | 4 | 2.59 | 36.0 | 27.8 |
| cuda | 16 | 1.21 | 16.7 | 59.7 |

MPDetector retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 9.41 | 130.7 | 7.7 |
| cpu | 4 | 4.94 | 68.6 | 14.6 |
| cpu | 16 | 2.39 | 33.2 | 30.2 |
| cuda | 1 | 3.05 | 42.4 | 23.6 |
| cuda | 4 | 0.82 | 11.5 | 87.3 |
| cuda | 16 | 0.77 | 10.7 | 93.4 |

Images: 16 x multi_face.jpg = 80 faces
img2pose

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 14.58 | 911.2 | 80 |
| cpu | 4 | 11.50 | 719.0 | 80 |
| cpu | 16 | 13.27 | 829.1 | 80 |
| cuda | 1 | 2.90 | 181.0 | 80 |
| cuda | 4 | 1.88 | 117.2 | 80 |
| cuda | 16 | 1.74 | 108.5 | 80 |

retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 8.71 | 544.4 | 80 |
| cpu | 4 | 6.16 | 384.8 | 80 |
| cpu | 16 | 8.27 | 516.9 | 80 |
| cuda | 1 | 2.18 | 136.3 | 80 |
| cuda | 4 | 1.29 | 80.9 | 80 |
| cuda | 16 | 0.98 | 61.6 | 80 |

MPDetector retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 3.76 | 234.7 | 80 |
| cpu | 4 | 3.00 | 187.2 | 80 |
| cpu | 16 | 4.92 | 307.5 | 80 |
| cuda | 1 | 0.94 | 58.8 | 80 |
| cuda | 4 | 0.53 | 33.1 | 80 |
| cuda | 16 | 1.60 | 100.1 | 80 |