# py-feat detector benchmark — 2026-05-03 21:36:43

## Run metadata

- **Date:** 2026-05-03 21:36:43
- **py-feat version:** 0.7.0
- **Git commit:** 09980f9
- **Host:** liquidswords2 (x86_64, 128 CPUs)
- **Python:** 3.12.13
- **PyTorch:** 2.5.1+cu124
- **GPU:** CUDA 12.4, NVIDIA GeForce RTX 3090
- **OMP_NUM_THREADS:** `1`
- **Devices swept:** ['cpu', 'cuda']
- **Batch sizes:** [1, 4, 16]
- **DataLoader workers:** [0]

Each configuration is run twice: one untimed warmup call, then one timed call. The tables report the wall-clock time of the timed call only.
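
The warmup-then-measure pattern described above can be sketched as follows. This is a minimal illustration, not the benchmark script itself: the helper name `timed_after_warmup` and the optional `sync` hook are hypothetical. The report does not say whether the benchmark synchronizes the GPU before reading the clock, so that step is an assumption.

```python
import time
from typing import Callable, Optional


def timed_after_warmup(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Call fn once untimed, then once timed; return wall-clock seconds.

    `sync` is a hypothetical hook for asynchronous devices. On a CUDA run
    one might pass torch.cuda.synchronize so that queued GPU work is
    finished before the clock is read (an assumption, not confirmed by
    the report).
    """
    fn()  # warmup: absorbs one-time costs such as lazy initialization
    if sync is not None:
        sync()

    start = time.perf_counter()
    fn()  # the only call whose duration is reported
    if sync is not None:
        sync()
    return time.perf_counter() - start
```

With a single timed call per configuration there is no variance estimate, so small differences between rows are best read as indicative rather than significant.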

## Video: short (72 frames)
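
In the tables below, `sec` is the wall time of the timed call, and the other two columns follow from it and the frame count of 72. The sketch below shows that arithmetic; the function names are hypothetical and chosen only for illustration. The same per-item formula applies to the `ms/img` column in the image tables, with 16 images in place of 72 frames.

```python
def ms_per_item(total_sec: float, n_items: int) -> float:
    """Average milliseconds spent per frame (or per image)."""
    return total_sec / n_items * 1000.0


def throughput_fps(total_sec: float, n_frames: int) -> float:
    """Frames processed per second of wall time."""
    return n_frames / total_sec


# retinaface, cpu, batch 1: 22.74 s for 72 frames
print(round(ms_per_item(22.74, 72), 1))     # 315.8
print(round(throughput_fps(22.74, 72), 1))  # 3.2
```

Recomputing from the rounded `sec` values can differ from the printed columns by 0.1 in the last digit, presumably because the report derives them from the unrounded timings.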

### img2pose

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 56.52 | 785.1 | 1.3 |
| cpu | 4 | 43.01 | 597.3 | 1.7 |
| cpu | 16 | 41.55 | 577.1 | 1.7 |
| cuda | 1 | 7.06 | 98.1 | 10.2 |
| cuda | 4 | 4.56 | 63.3 | 15.8 |
| cuda | 16 | 4.05 | 56.3 | 17.8 |


### retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 22.74 | 315.8 | 3.2 |
| cpu | 4 | 7.43 | 103.2 | 9.7 |
| cpu | 16 | 6.22 | 86.3 | 11.6 |
| cuda | 1 | 4.47 | 62.1 | 16.1 |
| cuda | 4 | 2.59 | 36.0 | 27.8 |
| cuda | 16 | 1.21 | 16.7 | 59.7 |


### MPDetector retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 9.41 | 130.7 | 7.7 |
| cpu | 4 | 4.94 | 68.6 | 14.6 |
| cpu | 16 | 2.39 | 33.2 | 30.2 |
| cuda | 1 | 3.05 | 42.4 | 23.6 |
| cuda | 4 | 0.82 | 11.5 | 87.3 |
| cuda | 16 | 0.77 | 10.7 | 93.4 |


## Images: 16 x multi_face.jpg (5 faces each, 80 faces total)

### img2pose

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 14.58 | 911.2 | 80 |
| cpu | 4 | 11.50 | 719.0 | 80 |
| cpu | 16 | 13.27 | 829.1 | 80 |
| cuda | 1 | 2.90 | 181.0 | 80 |
| cuda | 4 | 1.88 | 117.2 | 80 |
| cuda | 16 | 1.74 | 108.5 | 80 |


### retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 8.71 | 544.4 | 80 |
| cpu | 4 | 6.16 | 384.8 | 80 |
| cpu | 16 | 8.27 | 516.9 | 80 |
| cuda | 1 | 2.18 | 136.3 | 80 |
| cuda | 4 | 1.29 | 80.9 | 80 |
| cuda | 16 | 0.98 | 61.6 | 80 |


### MPDetector retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 3.76 | 234.7 | 80 |
| cpu | 4 | 3.00 | 187.2 | 80 |
| cpu | 16 | 4.92 | 307.5 | 80 |
| cuda | 1 | 0.94 | 58.8 | 80 |
| cuda | 4 | 0.53 | 33.1 | 80 |
| cuda | 16 | 1.60 | 100.1 | 80 |

