py-feat detector benchmark: 2026-05-03 22:19:40

Run metadata

  • Date: 2026-05-03 22:19:40

  • py-feat version: 0.7.0

  • Git commit: 864962c

  • Host: liquidswords2 (x86_64, 128 CPUs)

  • Python: 3.12.13

  • PyTorch: 2.5.1+cu124

  • GPU: CUDA 12.4, NVIDIA GeForce RTX 3090

  • OMP_NUM_THREADS: 1

  • Devices swept: ['cpu', 'cuda']

  • Batch sizes: [1, 4, 16]

  • DataLoader workers: [0]
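
The swept settings above define a small grid: 2 devices × 3 batch sizes × 1 worker setting, or 6 configurations per detector, which matches the six rows in each results table. A minimal sketch of enumerating that grid follows; the function name `sweep_grid` and the dictionary keys are illustrative and are not taken from the benchmark script.

```python
from itertools import product

# Values copied from the run metadata above.
DEVICES = ["cpu", "cuda"]
BATCH_SIZES = [1, 4, 16]
NUM_WORKERS = [0]


def sweep_grid(devices, batch_sizes, num_workers):
    """Return every (device, batch size, workers) combination, device-major."""
    return [
        {"device": d, "batch_size": b, "num_workers": w}
        for d, b, w in product(devices, batch_sizes, num_workers)
    ]


grid = sweep_grid(DEVICES, BATCH_SIZES, NUM_WORKERS)
for cfg in grid:
    print(cfg)
```

Iterating device-major reproduces the row order used in the tables: all CPU rows first, then all CUDA rows.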

Each timed call is preceded by one untimed warmup call; the figures below report the wall-clock time of the timed call only.
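
The warmup-then-measure pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual harness: `time_with_warmup` and `fake_detect` are hypothetical names, and the workload is a stand-in for a detector call.

```python
import time


def time_with_warmup(fn, *args, **kwargs):
    """Call fn once untimed, then once timed; return (result, seconds)."""
    fn(*args, **kwargs)  # warmup: absorbs one-off costs such as lazy initialisation
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


calls = []


def fake_detect(n):
    """Hypothetical stand-in workload; records each invocation."""
    calls.append(n)
    return sum(range(n))


result, seconds = time_with_warmup(fake_detect, 1000)
print(f"{len(calls)} calls, timed call took {seconds:.6f} s")
```

On CUDA, kernels launch asynchronously, so a harness of this shape would normally call `torch.cuda.synchronize()` before each clock read. The report does not say whether this benchmark does so.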

Video: short (72 frames)

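img2pose

| device | batch | sec | ms/frame | fps |
| --- | --- | --- | --- | --- |
| cpu | 1 | 57.62 | 800.2 | 1.2 |
| cpu | 4 | 43.01 | 597.4 | 1.7 |
| cpu | 16 | 45.27 | 628.8 | 1.6 |
| cuda | 1 | 7.77 | 108.0 | 9.3 |
| cuda | 4 | 4.36 | 60.5 | 16.5 |
| cuda | 16 | 3.45 | 47.9 | 20.9 |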
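
The `ms/frame` and `fps` columns appear to be derived from `sec` and the 72-frame clip length. The sketch below shows that arithmetic, assuming (the report does not state it) that ms/frame = sec / frames × 1000 and fps = frames / sec. `per_frame_metrics` is a hypothetical helper, and the input is the img2pose cuda, batch 16 row.

```python
def per_frame_metrics(seconds, n_frames):
    """Convert total wall time into (milliseconds per frame, frames per second)."""
    ms_per_frame = seconds / n_frames * 1000.0
    fps = n_frames / seconds
    return ms_per_frame, fps


# img2pose, cuda, batch 16: 3.45 s for the 72-frame clip
ms, fps = per_frame_metrics(3.45, 72)
print(f"{ms:.1f} ms/frame, {fps:.1f} fps")  # prints: 47.9 ms/frame, 20.9 fps
```

Other rows can differ in the last digit when recomputed this way (57.62 s gives 800.3 ms/frame, while the table lists 800.2), presumably because the table was computed from unrounded timings.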

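retinaface

| device | batch | sec | ms/frame | fps |
| --- | --- | --- | --- | --- |
| cpu | 1 | 22.58 | 313.6 | 3.2 |
| cpu | 4 | 8.27 | 114.8 | 8.7 |
| cpu | 16 | 7.05 | 97.9 | 10.2 |
| cuda | 1 | 5.85 | 81.2 | 12.3 |
| cuda | 4 | 1.50 | 20.8 | 48.1 |
| cuda | 16 | 0.76 | 10.6 | 94.5 |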

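MPDetector retinaface

| device | batch | sec | ms/frame | fps |
| --- | --- | --- | --- | --- |
| cpu | 1 | 10.04 | 139.4 | 7.2 |
| cpu | 4 | 4.74 | 65.9 | 15.2 |
| cpu | 16 | 2.60 | 36.0 | 27.7 |
| cuda | 1 | 2.81 | 39.0 | 25.6 |
| cuda | 4 | 1.71 | 23.7 | 42.2 |
| cuda | 16 | 0.83 | 11.5 | 86.7 |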
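
One way to read the video tables is as a CPU-to-CUDA speedup at each batch size. The sketch below computes that ratio from the `sec` column of the MPDetector retinaface table directly above; `cuda_speedup` is a hypothetical helper and the ratio is a derived quantity, not something the benchmark itself reports.

```python
# (device, batch) -> seconds, copied from the MPDetector retinaface video table above
SECONDS = {
    ("cpu", 1): 10.04, ("cpu", 4): 4.74, ("cpu", 16): 2.60,
    ("cuda", 1): 2.81, ("cuda", 4): 1.71, ("cuda", 16): 0.83,
}


def cuda_speedup(seconds):
    """Map each batch size to cpu_seconds / cuda_seconds."""
    batches = sorted({batch for (_, batch) in seconds})
    return {b: seconds[("cpu", b)] / seconds[("cuda", b)] for b in batches}


speedups = cuda_speedup(SECONDS)
for batch, ratio in speedups.items():
    print(f"batch {batch:>2}: {ratio:.2f}x")
```

Because the inputs are single-run timings rounded to two decimals, the ratios are indicative only.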

Images: 16 x multi_face.jpg = 80 faces

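img2pose

| device | batch | sec | ms/img | rows |
| --- | --- | --- | --- | --- |
| cpu | 1 | 14.76 | 922.4 | 80 |
| cpu | 4 | 11.35 | 709.6 | 80 |
| cpu | 16 | 13.39 | 836.8 | 80 |
| cuda | 1 | 2.51 | 156.7 | 80 |
| cuda | 4 | 1.39 | 87.1 | 80 |
| cuda | 16 | 1.72 | 107.6 | 80 |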

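retinaface

| device | batch | sec | ms/img | rows |
| --- | --- | --- | --- | --- |
| cpu | 1 | 8.90 | 556.5 | 80 |
| cpu | 4 | 6.11 | 381.7 | 80 |
| cpu | 16 | 9.63 | 602.0 | 80 |
| cuda | 1 | 1.92 | 120.1 | 80 |
| cuda | 4 | 0.71 | 44.4 | 80 |
| cuda | 16 | 0.91 | 56.9 | 80 |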

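MPDetector retinaface

| device | batch | sec | ms/img | rows |
| --- | --- | --- | --- | --- |
| cpu | 1 | 4.43 | 276.9 | 80 |
| cpu | 4 | 3.14 | 196.5 | 80 |
| cpu | 16 | 4.93 | 307.9 | 80 |
| cuda | 1 | 0.57 | 35.9 | 80 |
| cuda | 4 | 0.54 | 33.9 | 80 |
| cuda | 16 | 1.74 | 109.0 | 80 |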
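
In all three image tables, batch 4 has the lowest wall time on both devices. A short sketch of picking the fastest batch size per device from one table follows, using the MPDetector retinaface image timings directly above; `fastest_batch` is a hypothetical helper, not part of py-feat.

```python
# (device, batch) -> seconds, copied from the MPDetector retinaface image table above
SECONDS = {
    ("cpu", 1): 4.43, ("cpu", 4): 3.14, ("cpu", 16): 4.93,
    ("cuda", 1): 0.57, ("cuda", 4): 0.54, ("cuda", 16): 1.74,
}


def fastest_batch(seconds):
    """Return {device: batch size with the lowest wall time}."""
    best = {}
    for (device, batch), sec in seconds.items():
        if device not in best or sec < seconds[(device, best[device])]:
            best[device] = batch
    return best


best = fastest_batch(SECONDS)
print(best)  # prints: {'cpu': 4, 'cuda': 4}
```

With one timed call per configuration, small gaps such as 0.57 s versus 0.54 s on CUDA may fall within run-to-run noise.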