# py-feat detector benchmark — 2026-05-03 22:19:40

## Run metadata

- **Date:** 2026-05-03 22:19:40
- **py-feat version:** 0.7.0
- **Git commit:** 864962c
- **Host:** liquidswords2 (x86_64, 128 CPUs)
- **Python:** 3.12.13
- **PyTorch:** 2.5.1+cu124
- **GPU:** CUDA 12.4, NVIDIA GeForce RTX 3090
- **OMP_NUM_THREADS:** `1`
- **Devices swept:** ['cpu', 'cuda']
- **Batch sizes:** [1, 4, 16]
- **DataLoader workers:** [0]

Each timed call is preceded by one untimed warmup call; only the wall-clock time of the timed call is reported (the `sec` column).
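
The warmup-then-time procedure described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the helper name `time_with_warmup` and the optional `sync` hook are hypothetical. For CUDA runs, passing something like `torch.cuda.synchronize` as `sync` would keep asynchronous kernels from escaping the timed region, but the report does not say whether the real harness does this.

```python
import time
from typing import Any, Callable, Optional


def time_with_warmup(
    fn: Callable[[], Any],
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Call fn once untimed, then once timed; return the timed wall-clock seconds.

    `sync` is a hypothetical hook (e.g. torch.cuda.synchronize) that is called
    before starting and before stopping the clock, so that pending
    asynchronous work is not attributed to the wrong call.
    """
    fn()  # untimed warmup: absorbs lazy initialisation, caching, etc.
    if sync is not None:
        sync()

    start = time.perf_counter()
    fn()  # the timed call
    if sync is not None:
        sync()
    return time.perf_counter() - start


# Example with a stand-in workload (not a py-feat call).
elapsed = time_with_warmup(lambda: sum(range(100_000)))
print(f"timed call took {elapsed:.4f} s")
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock intended for measuring elapsed intervals.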

## Video: short (72 frames)

### img2pose

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 57.62 | 800.2 | 1.2 |
| cpu | 4 | 43.01 | 597.4 | 1.7 |
| cpu | 16 | 45.27 | 628.8 | 1.6 |
| cuda | 1 | 7.77 | 108.0 | 9.3 |
| cuda | 4 | 4.36 | 60.5 | 16.5 |
| cuda | 16 | 3.45 | 47.9 | 20.9 |
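
The `ms/frame` and `fps` columns appear to be derived from `sec` and the frame count (72 for this clip). The sketch below shows that arithmetic, using the hypothetical helper name `per_item_metrics`. Because `sec` is rounded to two decimals in the table, recomputed values can differ from the reported ones in the last digit.

```python
def per_item_metrics(total_sec: float, n_items: int) -> tuple[float, float]:
    """Return (milliseconds per item, items per second) for one timed run."""
    ms_per_item = total_sec / n_items * 1000.0
    items_per_sec = n_items / total_sec
    return ms_per_item, items_per_sec


# First img2pose row: cpu, batch 1, 57.62 s for 72 frames.
ms, fps = per_item_metrics(57.62, 72)
print(f"{ms:.1f} ms/frame, {fps:.1f} fps")
```

The same function covers the image tables further down by passing the number of images (16) as `n_items`, which yields the `ms/img` column.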


### retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 22.58 | 313.6 | 3.2 |
| cpu | 4 | 8.27 | 114.8 | 8.7 |
| cpu | 16 | 7.05 | 97.9 | 10.2 |
| cuda | 1 | 5.85 | 81.2 | 12.3 |
| cuda | 4 | 1.50 | 20.8 | 48.1 |
| cuda | 16 | 0.76 | 10.6 | 94.5 |


### MPDetector retinaface

| device | batch | sec | ms/frame | fps |
|---|---|---|---|---|
| cpu | 1 | 10.04 | 139.4 | 7.2 |
| cpu | 4 | 4.74 | 65.9 | 15.2 |
| cpu | 16 | 2.60 | 36.0 | 27.7 |
| cuda | 1 | 2.81 | 39.0 | 25.6 |
| cuda | 4 | 1.71 | 23.7 | 42.2 |
| cuda | 16 | 0.83 | 11.5 | 86.7 |


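## Images: 16 x multi_face.jpg = 80 faces

Each run processes the same image, `multi_face.jpg`, 16 times, for 80 faces in total. `ms/img` is `sec` divided by the 16 images, and `rows` appears to be the number of output rows, which matches the 80 expected faces in every configuration.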

### img2pose

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 14.76 | 922.4 | 80 |
| cpu | 4 | 11.35 | 709.6 | 80 |
| cpu | 16 | 13.39 | 836.8 | 80 |
| cuda | 1 | 2.51 | 156.7 | 80 |
| cuda | 4 | 1.39 | 87.1 | 80 |
| cuda | 16 | 1.72 | 107.6 | 80 |


### retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 8.90 | 556.5 | 80 |
| cpu | 4 | 6.11 | 381.7 | 80 |
| cpu | 16 | 9.63 | 602.0 | 80 |
| cuda | 1 | 1.92 | 120.1 | 80 |
| cuda | 4 | 0.71 | 44.4 | 80 |
| cuda | 16 | 0.91 | 56.9 | 80 |


### MPDetector retinaface

| device | batch | sec | ms/img | rows |
|---|---|---|---|---|
| cpu | 1 | 4.43 | 276.9 | 80 |
| cpu | 4 | 3.14 | 196.5 | 80 |
| cpu | 16 | 4.93 | 307.9 | 80 |
| cuda | 1 | 0.57 | 35.9 | 80 |
| cuda | 4 | 0.54 | 33.9 | 80 |
| cuda | 16 | 1.74 | 109.0 | 80 |

