Accuracy benchmarks

This page tracks the per-release accuracy of py-feat detectors against held-out labeled datasets. Per-dataset accuracy is produced by the AU/dataset bench scripts under scripts/ (e.g. scripts/bench_xgb_local.py and scripts/bench_xgb_feature_mode.py for DISFA / DISFA+). The previous unified harness, bench_regression.py --markdown, has been retired. Speed (throughput) benchmarks live in Speed.md.

Latest

See 2026-05-15-67cd7d9-accuracy.md.

Methodology

DISFA: P3 fold, ArcFace-aligned crops. AU intensity is binarized at >=2 for F1; ICC(3,1) is computed between continuous intensity and py-feat probability.
AffectNet: validation set, with classes 0..6 mapped to the 7 py-feat emotion columns. Reports top-1 emotion accuracy and macro F1.
CALFW / CPLFW: 6000 pairs, evaluated with the LFW 10-fold CV protocol. InsightFace 5-landmark template alignment is applied before ArcFace embedding.
TinyFace: closed-set and open-set rank-K identification, including the Gallery_Distractor set (153k images) unless it is disabled.
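
The two DISFA metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used by the bench scripts: the function names `binarized_f1` and `icc_3_1` are hypothetical, and the 0.5 cutoff applied to the py-feat probability is an assumption (only the >=2 intensity threshold is stated above).

```python
import numpy as np


def binarized_f1(intensity, prob, intensity_thresh=2, prob_thresh=0.5):
    """F1 for one AU: labels are intensity >= intensity_thresh.

    prob_thresh=0.5 is an assumed cutoff for the predicted probability;
    it is not specified in the methodology above.
    """
    y_true = np.asarray(intensity) >= intensity_thresh
    y_pred = np.asarray(prob) >= prob_thresh
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when there are no positives at all.
    return float(2 * tp / denom) if denom > 0 else 0.0


def icc_3_1(a, b):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    a and b are two 1-D series over the same frames (for example,
    labeled intensity and predicted probability).
    """
    x = np.column_stack([np.asarray(a, float), np.asarray(b, float)])
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    # Between-target (row) mean square.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    # Residual mean square after removing row and column effects.
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return float((ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err))


# Toy data for illustration only (not benchmark results).
intensity = [0, 1, 2, 3, 5, 0]
prob = [0.1, 0.6, 0.7, 0.2, 0.9, 0.05]
f1 = binarized_f1(intensity, prob)
icc = icc_3_1(intensity, prob)
```

Because ICC(3,1) measures consistency, a constant offset between the two series does not lower it, but a difference in scale does. That is worth keeping in mind when comparing a 0 to 5 intensity against a 0 to 1 probability.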

History

date	run
2026-05-15-67cd7d9	2026-05-15-67cd7d9-accuracy.md