py-feat detector benchmark — 2026-05-03 22:19:40
Run metadata
Date: 2026-05-03 22:19:40
py-feat version: 0.7.0
Git commit: 864962c
Host: liquidswords2 (x86_64, 128 CPUs)
Python: 3.12.13
PyTorch: 2.5.1+cu124
GPU: CUDA 12.4, NVIDIA GeForce RTX 3090
OMP_NUM_THREADS: 1
Devices swept: ['cpu', 'cuda']
Batch sizes: [1, 4, 16]
DataLoader workers: [0]
Each timed call is preceded by one untimed warmup; the timed-call wall time is reported.
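
The warmup-then-time procedure described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual script: `time_with_warmup` and `fake_detect` are hypothetical names, and `fake_detect` is a stand-in workload rather than a py-feat call.

```python
import itertools
import time

# Sweep grid taken from the run metadata above.
DEVICES = ["cpu", "cuda"]
BATCH_SIZES = [1, 4, 16]

calls = []  # records every invocation so the warmup is visible


def fake_detect(device, batch_size):
    """Hypothetical stand-in for a detector run; it only logs the call."""
    calls.append((device, batch_size))


def time_with_warmup(fn, *args, **kwargs):
    """Call fn once untimed, then return the wall time of a second call."""
    fn(*args, **kwargs)                 # warmup: result and time discarded
    start = time.perf_counter()
    fn(*args, **kwargs)                 # timed call
    return time.perf_counter() - start


# One timed measurement per (device, batch size) combination.
results = {
    (device, batch): time_with_warmup(fake_detect, device, batch)
    for device, batch in itertools.product(DEVICES, BATCH_SIZES)
}

for (device, batch), seconds in results.items():
    print(f"{device:>4} batch={batch:<2} {seconds:.6f} s")
```

With a real GPU workload, CUDA kernels run asynchronously, so a harness like this would normally call `torch.cuda.synchronize()` before reading the clock. Whether the benchmark does so is not stated here.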
Video: short (72 frames)

img2pose

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 57.62 | 800.2 | 1.2 |
| cpu | 4 | 43.01 | 597.4 | 1.7 |
| cpu | 16 | 45.27 | 628.8 | 1.6 |
| cuda | 1 | 7.77 | 108.0 | 9.3 |
| cuda | 4 | 4.36 | 60.5 | 16.5 |
| cuda | 16 | 3.45 | 47.9 | 20.9 |

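
The `ms/frame` and `fps` columns appear to follow directly from the wall time and the 72-frame clip length. The sketch below reproduces that arithmetic for the `cuda`, batch 16 row; `per_frame_stats` is a hypothetical helper, not part of py-feat.

```python
def per_frame_stats(seconds, n_frames):
    """Convert a total wall time into (ms per frame, frames per second)."""
    ms_per_frame = seconds / n_frames * 1000.0
    fps = n_frames / seconds
    return ms_per_frame, fps


# img2pose, cuda, batch 16: 3.45 s for the 72-frame video.
ms, fps = per_frame_stats(3.45, 72)
print(f"{ms:.1f} ms/frame, {fps:.1f} fps")  # prints "47.9 ms/frame, 20.9 fps"
```

Recomputing other rows from the rounded `sec` column can differ from the table in the last digit, presumably because the reported values were derived from unrounded timings.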

retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 22.58 | 313.6 | 3.2 |
| cpu | 4 | 8.27 | 114.8 | 8.7 |
| cpu | 16 | 7.05 | 97.9 | 10.2 |
| cuda | 1 | 5.85 | 81.2 | 12.3 |
| cuda | 4 | 1.50 | 20.8 | 48.1 |
| cuda | 16 | 0.76 | 10.6 | 94.5 |


MPDetector retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 10.04 | 139.4 | 7.2 |
| cpu | 4 | 4.74 | 65.9 | 15.2 |
| cpu | 16 | 2.60 | 36.0 | 27.7 |
| cuda | 1 | 2.81 | 39.0 | 25.6 |
| cuda | 4 | 1.71 | 23.7 | 42.2 |
| cuda | 16 | 0.83 | 11.5 | 86.7 |


Images: 16 x multi_face.jpg = 80 faces

img2pose

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 14.76 | 922.4 | 80 |
| cpu | 4 | 11.35 | 709.6 | 80 |
| cpu | 16 | 13.39 | 836.8 | 80 |
| cuda | 1 | 2.51 | 156.7 | 80 |
| cuda | 4 | 1.39 | 87.1 | 80 |
| cuda | 16 | 1.72 | 107.6 | 80 |


retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 8.90 | 556.5 | 80 |
| cpu | 4 | 6.11 | 381.7 | 80 |
| cpu | 16 | 9.63 | 602.0 | 80 |
| cuda | 1 | 1.92 | 120.1 | 80 |
| cuda | 4 | 0.71 | 44.4 | 80 |
| cuda | 16 | 0.91 | 56.9 | 80 |


MPDetector retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 4.43 | 276.9 | 80 |
| cpu | 4 | 3.14 | 196.5 | 80 |
| cpu | 16 | 4.93 | 307.9 | 80 |
| cuda | 1 | 0.57 | 35.9 | 80 |
| cuda | 4 | 0.54 | 33.9 | 80 |
| cuda | 16 | 1.74 | 109.0 | 80 |