py-feat detector benchmark — 2026-05-03 23:09:29

Run metadata#

Date: 2026-05-03 23:09:29
py-feat version: 0.7.0
Git commit: d71c0d7
Host: vpn-two-factor-general-228-129-185.dartmouth.edu (arm64, 18 CPUs)
Python: 3.13.12
PyTorch: 2.11.0
GPU: MPS available
OMP_NUM_THREADS: 1
Devices swept: ['mps']
Batch sizes: [1, 4, 16]
DataLoader workers: [0]

Each timed call is preceded by one untimed warmup; the timed-call wall time is reported.
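
The warmup-then-time protocol described above can be sketched as follows. This is a minimal illustration rather than the benchmark script itself: `time_call` and the stand-in workload `fake_detect` are hypothetical names, and a real run would wrap the detector call instead.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once untimed (warmup), then once timed; return (result, seconds)."""
    # Warmup: absorbs one-time costs such as lazy initialization and caching.
    fn(*args, **kwargs)
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_detect(n_frames):
    # Hypothetical stand-in workload; replace with the real detector call.
    return [i * i for i in range(n_frames)]


result, sec = time_call(fake_detect, 72)
print(f"{len(result)} frames in {sec:.6f} s")
```

Because MPS executes work asynchronously, a real harness would also need the timed function to block until device work has finished, for example by returning CPU-side results or by calling `torch.mps.synchronize()` before reading the clock. That detail is an assumption about good practice, not something this report states.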

Video: short (72 frames)#

img2pose#

device	batch	sec	ms/frame	fps
mps	1	8.17	113.5	8.8
mps	4	5.79	80.5	12.4
mps	16	5.21	72.3	13.8
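
The ms/frame and fps columns follow directly from the wall time and the frame count. A small sketch of that arithmetic, where `derived_columns` is a hypothetical helper and not part of py-feat:

```python
def derived_columns(sec, n_frames):
    """Return (ms_per_frame, fps) for one timed call over n_frames frames."""
    ms_per_frame = sec / n_frames * 1000.0
    fps = n_frames / sec
    return ms_per_frame, fps


# img2pose, short video (72 frames), batch=1: 8.17 s wall time.
ms, fps = derived_columns(8.17, 72)
print(f"{ms:.1f} ms/frame, {fps:.1f} fps")  # 113.5 ms/frame, 8.8 fps
```

Recomputing other rows from the rounded sec column can differ from the table in the last digit, presumably because the table was generated from unrounded timings.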

retinaface#

device	batch	sec	ms/frame	fps
mps	1	3.88	54.0	18.5
mps	4	1.58	21.9	45.6
mps	16	0.98	13.7	73.2

MPDetector retinaface#

device	batch	sec	ms/frame	fps
mps	1	10.21	141.8	7.1
mps	4	3.24	45.0	22.2
mps	16	1.63	22.6	44.2

Video: long (472 frames)#

img2pose#

device	batch	sec	ms/frame	fps
mps	1	82.45	174.7	5.7
mps	4	57.08	120.9	8.3
mps	16	53.62	113.6	8.8

retinaface#

device	batch	sec	ms/frame	fps
mps	1	29.33	62.1	16.1
mps	4	12.10	25.6	39.0
mps	16	8.86	18.8	53.3

MPDetector retinaface#

device	batch	sec	ms/frame	fps
mps	1	77.89	165.0	6.1
mps	4	23.16	49.1	20.4
mps	16	11.08	23.5	42.6

Images: 16 x multi_face.jpg = 80 faces#

img2pose#

device	batch	sec	ms/img	rows
mps	1	3.71	232.2	80
mps	4	2.28	142.6	80
mps	16	2.69	168.2	80

retinaface#

device	batch	sec	ms/img	rows
mps	1	1.43	89.5	80
mps	4	0.89	55.8	80
mps	16	0.90	56.4	80

MPDetector retinaface#

device	batch	sec	ms/img	rows
mps	1	3.97	248.3	80
mps	4	1.74	108.8	80
mps	16	1.22	76.5	80

Notes (hand-curated)#

Scope: full-pipeline bench on M5 MBP MPS. All three configs run with au_model='xgb' (or mp_blendshapes for MPDetector), emotion_model='resmasknet', and identity_model='arcface' — matching the default Detector() config most users run in production. Replaces the partial svm-AU bench at 2026-05-03-864962c-mps.md.

Headline numbers (best ms/frame, MPS):

Config	Long video (472 frames)	Image batch (80 faces)
`Detector(face_model='img2pose', au_model='xgb')`	113.6 ms/frame (8.8 fps), batch=16	142.6 ms/img, batch=4
`Detector(face_model='retinaface', au_model='xgb')`	18.8 ms/frame (53.3 fps), batch=16	55.8 ms/img, batch=4
`MPDetector(au_model='mp_blendshapes')`	23.5 ms/frame (42.6 fps), batch=16	76.5 ms/img, batch=16

All three include emotion (resmasknet) + identity (arcface) on top of face + landmark + AU.

Recommendation for users: Detector(face_model='retinaface') is the fastest full-pipeline config on MPS, reaching 53.3 fps on the 472-frame video at batch=16. img2pose pays for its head-pose regression in compute: it is about 6× slower at batch=16 (113.6 vs 18.8 ms/frame). MPDetector falls in between; it provides the 478-point mediapipe mesh at roughly 25% more wall time than retinaface (23.5 vs 18.8 ms/frame).
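
The relative costs behind this recommendation can be checked from the headline table. The snippet below only re-does that arithmetic on the best long-video ms/frame figures; the dictionary keys are shorthand labels chosen for illustration.

```python
# Best long-video ms/frame per config (batch=16), copied from the headline table.
best_ms_per_frame = {
    "img2pose": 113.6,
    "retinaface": 18.8,
    "MPDetector": 23.5,
}

baseline = best_ms_per_frame["retinaface"]
# Cost of each config relative to the retinaface Detector.
ratios = {name: ms / baseline for name, ms in best_ms_per_frame.items()}

for name, ratio in ratios.items():
    print(f"{name:<11} {best_ms_per_frame[name]:6.1f} ms/frame  {ratio:.2f}x retinaface")
```

On these figures img2pose costs about 6.04 times as much per frame as retinaface, and MPDetector about 1.25 times.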

HOG batching speedup (PR #292, isolated bench):

The detector-level numbers above include face detection + landmark + emotion + identity, which dilutes the HOG speedup. Isolated extract_hog_features_batched vs the legacy extract_hog_features per-face loop on MPS:

n_faces	Legacy	Batched (PR #292)	Speedup
5	15.2 ms	7.8 ms	1.96x
20	68.1 ms	16.3 ms	4.19x
50	145.9 ms	32.0 ms	4.56x
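
Dividing each timing by the face count makes the source of the speedup easier to see. The sketch below uses only the numbers in the table above; the variable names are illustrative.

```python
# (n_faces, legacy_ms, batched_ms) copied from the isolated HOG table.
rows = [(5, 15.2, 7.8), (20, 68.1, 16.3), (50, 145.9, 32.0)]

per_face = []
for n_faces, legacy_ms, batched_ms in rows:
    legacy_per_face = legacy_ms / n_faces
    batched_per_face = batched_ms / n_faces
    per_face.append((n_faces, legacy_per_face, batched_per_face))
    print(
        f"n={n_faces:>2}  legacy {legacy_per_face:.2f} ms/face  "
        f"batched {batched_per_face:.2f} ms/face"
    )
```

The legacy loop stays at roughly 3 ms per face regardless of face count, while the batched path drops from about 1.56 ms per face at 5 faces to about 0.64 ms per face at 50. That pattern is consistent with batching amortizing a fixed per-call overhead, although the report itself does not break the cost down further.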

Comparison to prior baselines:

2026-05-03-864962c.md (Linux + RTX 3090 + CUDA, hand-curated note) was run with au_model='svm', as in the bench script’s pre-PR-NNN configs. Cell-by-cell comparisons against the present configs drift by ~5-10% (svm vs xgb), so cross-platform speed gaps are illustrative, not exact. Re-running that bench with the new full-pipeline configs would close the comparison gap.
2026-05-03-437b651.md is hand-curated from a partial v0.7 svm sweep; the workers-axis cells are still informative for num_workers > 0 regression tracking.
Pre-v0.7 community benchmarks live in py-feat issue #184 (Google Sheet, per-stage timings); they are not directly cell-comparable to the detect() wall-time format used here.
