py-feat detector benchmark - 2026-05-03 21:36:43

Run metadata

  • Date: 2026-05-03 21:36:43

  • py-feat version: 0.7.0

  • Git commit: 09980f9

  • Host: liquidswords2 (x86_64, 128 CPUs)

  • Python: 3.12.13

  • PyTorch: 2.5.1+cu124

  • GPU: CUDA 12.4, NVIDIA GeForce RTX 3090

  • OMP_NUM_THREADS: 1

  • Devices swept: ['cpu', 'cuda']

  • Batch sizes: [1, 4, 16]

  • DataLoader workers: [0]
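
The swept settings above form a small grid. A minimal sketch of how such a grid could be enumerated follows; the `sweep_configs` helper and the dictionary keys are hypothetical and are not taken from py-feat or from the benchmark script:

```python
from itertools import product

# Values copied from the run metadata above.
DEVICES = ["cpu", "cuda"]
BATCH_SIZES = [1, 4, 16]
NUM_WORKERS = [0]


def sweep_configs(devices=DEVICES, batch_sizes=BATCH_SIZES, num_workers=NUM_WORKERS):
    """Return every (device, batch size, worker count) combination as a dict."""
    return [
        {"device": d, "batch_size": b, "num_workers": w}
        for d, b, w in product(devices, batch_sizes, num_workers)
    ]


configs = sweep_configs()
print(len(configs))  # 6
```

Two devices, three batch sizes and one worker setting give six configurations, which matches the six device/batch rows reported for each detector.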

Each timed call is preceded by one untimed warmup call; the reported figure is the wall-clock time of the timed call only.
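
The warmup-then-time pattern can be sketched as below. This is an illustration under assumptions, not the benchmark's actual harness: `time_with_warmup` is a hypothetical helper, and `fn` stands in for whatever detector call is being measured.

```python
import time


def time_with_warmup(fn, *args, **kwargs):
    """Call fn once untimed, then once more; return (result, elapsed seconds)."""
    fn(*args, **kwargs)  # warmup: result and duration are discarded
    start = time.perf_counter()
    result = fn(*args, **kwargs)  # timed call
    elapsed = time.perf_counter() - start
    return result, elapsed
```

On CUDA, kernels are launched asynchronously, so a harness of this kind would normally call `torch.cuda.synchronize()` before reading the clock. Whether this benchmark does so is not stated in the report.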

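Video: short (72 frames)

img2pose

| device | batch | sec   | ms/frame | fps  |
|--------|-------|-------|----------|------|
| cpu    | 1     | 56.52 | 785.1    | 1.3  |
| cpu    | 4     | 43.01 | 597.3    | 1.7  |
| cpu    | 16    | 41.55 | 577.1    | 1.7  |
| cuda   | 1     | 7.06  | 98.1     | 10.2 |
| cuda   | 4     | 4.56  | 63.3     | 15.8 |
| cuda   | 16    | 4.05  | 56.3     | 17.8 |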
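
The ms/frame and fps columns are consistent with dividing the total wall time by the 72 frames of the clip. A minimal sketch of that arithmetic, using a hypothetical `frame_metrics` helper (not part of py-feat):

```python
def frame_metrics(total_sec, n_frames):
    """Derive per-frame latency (ms) and throughput (fps) from total wall time."""
    ms_per_frame = 1000.0 * total_sec / n_frames
    fps = n_frames / total_sec
    return ms_per_frame, fps


# img2pose on cuda, batch 1: 7.06 s for the 72-frame clip
ms, fps = frame_metrics(7.06, 72)
print(f"{ms:.1f} ms/frame, {fps:.1f} fps")  # 98.1 ms/frame, 10.2 fps
```

Because the sec column is rounded to two decimals, values recomputed this way can differ from the reported ones in the last digit for some rows.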

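retinaface

| device | batch | sec   | ms/frame | fps  |
|--------|-------|-------|----------|------|
| cpu    | 1     | 22.74 | 315.8    | 3.2  |
| cpu    | 4     | 7.43  | 103.2    | 9.7  |
| cpu    | 16    | 6.22  | 86.3     | 11.6 |
| cuda   | 1     | 4.47  | 62.1     | 16.1 |
| cuda   | 4     | 2.59  | 36.0     | 27.8 |
| cuda   | 16    | 1.21  | 16.7     | 59.7 |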

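MPDetector retinaface

| device | batch | sec  | ms/frame | fps  |
|--------|-------|------|----------|------|
| cpu    | 1     | 9.41 | 130.7    | 7.7  |
| cpu    | 4     | 4.94 | 68.6     | 14.6 |
| cpu    | 16    | 2.39 | 33.2     | 30.2 |
| cuda   | 1     | 3.05 | 42.4     | 23.6 |
| cuda   | 4     | 0.82 | 11.5     | 87.3 |
| cuda   | 16    | 0.77 | 10.7     | 93.4 |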

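Images: 16 x multi_face.jpg = 80 faces

img2pose

| device | batch | sec   | ms/img | rows |
|--------|-------|-------|--------|------|
| cpu    | 1     | 14.58 | 911.2  | 80   |
| cpu    | 4     | 11.50 | 719.0  | 80   |
| cpu    | 16    | 13.27 | 829.1  | 80   |
| cuda   | 1     | 2.90  | 181.0  | 80   |
| cuda   | 4     | 1.88  | 117.2  | 80   |
| cuda   | 16    | 1.74  | 108.5  | 80   |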
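
For the image runs, ms/img appears to be the total wall time divided by the 16 input images. The reported values do not match a recomputation from the rounded sec column exactly, but they agree within the rounding error. A small sketch of that check, with a hypothetical `consistent` helper and a tolerance chosen here as an assumption (half a step of the two-decimal sec column plus half a step of the one-decimal ms/img column):

```python
N_IMAGES = 16

# (sec, reported ms/img) pairs copied from the img2pose image table above.
IMG2POSE_ROWS = [
    (14.58, 911.2), (11.50, 719.0), (13.27, 829.1),
    (2.90, 181.0), (1.88, 117.2), (1.74, 108.5),
]


def consistent(total_sec, reported_ms_per_img, n_images=N_IMAGES):
    """True if reported ms/img equals total_sec / n_images within rounding error."""
    # sec is rounded to 2 decimals (+/- 0.005 s); ms/img to 1 decimal (+/- 0.05 ms).
    tol_ms = 1000.0 * 0.005 / n_images + 0.05
    recomputed = 1000.0 * total_sec / n_images
    return abs(recomputed - reported_ms_per_img) <= tol_ms


print(all(consistent(sec, ms) for sec, ms in IMG2POSE_ROWS))  # True
```

The rows column stays at 80 in every configuration, i.e. 5 detected faces per image across the 16 inputs.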

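retinaface

| device | batch | sec  | ms/img | rows |
|--------|-------|------|--------|------|
| cpu    | 1     | 8.71 | 544.4  | 80   |
| cpu    | 4     | 6.16 | 384.8  | 80   |
| cpu    | 16    | 8.27 | 516.9  | 80   |
| cuda   | 1     | 2.18 | 136.3  | 80   |
| cuda   | 4     | 1.29 | 80.9   | 80   |
| cuda   | 16    | 0.98 | 61.6   | 80   |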

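MPDetector retinaface

| device | batch | sec  | ms/img | rows |
|--------|-------|------|--------|------|
| cpu    | 1     | 3.76 | 234.7  | 80   |
| cpu    | 4     | 3.00 | 187.2  | 80   |
| cpu    | 16    | 4.92 | 307.5  | 80   |
| cuda   | 1     | 0.94 | 58.8   | 80   |
| cuda   | 4     | 0.53 | 33.1   | 80   |
| cuda   | 16    | 1.60 | 100.1  | 80   |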