face_pose
feat.utils.face_pose
Pure-PyTorch head-pose estimation from MediaPipe Face Mesh predictions.
The MediaPipe Face Mesh outputs 468 3D landmarks per face (478 when the 10 iris landmarks are included), in screen-relative coordinates (x/y in image space, z as relative depth). MediaPipe also publishes a canonical 3D face model in head-centric coordinates with the same vertex ordering. The head pose for a detected face is therefore the rigid similarity transform that aligns the observed mesh to the canonical mesh.
This is solved in closed form via the Umeyama (1991) algorithm using a single
SVD: no iteration, no Adam loop, no requires_grad workaround, and no
camera intrinsics. Because no autograd is needed, it works equally well inside
torch.inference_mode().
estimate_face_pose_from_mesh(observed_landmarks_3d, canonical=None, return_euler_angles=True)
Estimate head pose from face-mesh landmarks via rigid alignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| observed_landmarks_3d | | Tensor of shape [B, 468, 3] (or [B, 478, 3], in which case the last 10 iris landmarks are dropped). Coordinates must be in the canonical face-model convention (X right, Y UP, Z OUT of the face toward the camera). Raw MediaPipe Face Mesh outputs use image-pixel space (Y down, Z into screen) and must be flipped before being passed in - see | required |
| canonical | | Optional [468, 3] canonical face model. If None, loaded from py-feat resources. | None |
| return_euler_angles | | If True (default) return Euler angles (pitch, roll, yaw) instead of the full rotation matrix. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| (euler, t) | | Returned if return_euler_angles is True: euler has shape [B, 3]; t has shape [B, 3]. |
| (R, t) | | Returned otherwise: R has shape [B, 3, 3]; t has shape [B, 3]. |
Source code in feat/utils/face_pose.py
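The axis flip required for observed_landmarks_3d can be sketched as follows. This is a minimal illustration, not py-feat's own helper: the name to_canonical_convention is hypothetical, and the sketch assumes the conversion amounts to dropping the iris landmarks and negating Y and Z. The library's actual preprocessing may do more (for example recentring or rescaling).

```python
import torch


def to_canonical_convention(landmarks: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: map image-space landmarks (Y down, Z into screen)
    to the canonical convention (Y up, Z out of the face).

    Accepts [B, 468, 3] or [B, 478, 3]; returns [B, 468, 3].
    """
    if landmarks.shape[-2] == 478:
        # The last 10 entries are iris landmarks, which the canonical model lacks.
        landmarks = landmarks[..., :468, :]
    # Assumption: the flip is a pure sign change on the Y and Z axes.
    flip = torch.tensor([1.0, -1.0, -1.0], dtype=landmarks.dtype, device=landmarks.device)
    return landmarks * flip


# Stand-in for raw Face Mesh output: 2 faces, 478 landmarks each.
raw = torch.rand(2, 478, 3)
converted = to_canonical_convention(raw)

# With py-feat installed, the result could then be passed on, e.g.:
# euler, t = estimate_face_pose_from_mesh(converted)
```

Multiplying by a fixed sign vector keeps the operation batched and free of in-place writes, so the input tensor is left untouched.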
load_canonical_face_model(device=None)
Load MediaPipe's 468-vertex canonical face model.
Vertex coordinates are in head-centric space, with Y up, X to the subject's right, and Z out of the face. Same vertex ordering as MediaPipe Face Mesh's first 468 output landmarks (the 10 iris landmarks at indices 468-477 are not part of the canonical model and must be excluded before alignment).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | | Optional torch.device or string. Defaults to CPU. | None |
Returns:
| Type | Description |
|---|---|
| | torch.Tensor of shape [468, 3]. |
rotation_matrix_to_euler_angles(R)
Convert rotation matrix to (pitch, roll, yaw) Euler angles in radians.
Uses the convention where pitch rotates around X, roll around Z, yaw around Y, applied in that intrinsic order. Handles the gimbal-lock singularity (when the X-Z column of R is near zero).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| R | | Tensor of shape [..., 3, 3]. | required |
Returns:
| Type | Description |
|---|---|
| | Tensor of shape [..., 3] with columns (pitch, roll, yaw) in radians. |
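One possible reading of the stated convention is R = Rx(pitch) · Rz(roll) · Ry(yaw). The sketch below builds a matrix under that assumed composition and recovers the angles from it, away from gimbal lock. Both helper names are hypothetical, and the library's actual matrix order, sign conventions, and singularity handling may differ from what is shown here.

```python
import math

import torch


def euler_to_matrix(pitch: float, roll: float, yaw: float) -> torch.Tensor:
    """Assumed composition: R = Rx(pitch) @ Rz(roll) @ Ry(yaw)."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    cy, sy = math.cos(yaw), math.sin(yaw)
    Rx = torch.tensor([[1, 0, 0], [0, cp, -sp], [0, sp, cp]], dtype=torch.float64)
    Rz = torch.tensor([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]], dtype=torch.float64)
    Ry = torch.tensor([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]], dtype=torch.float64)
    return Rx @ Rz @ Ry


def matrix_to_euler(R: torch.Tensor) -> torch.Tensor:
    """Invert the composition above for R of shape [..., 3, 3].

    Under this composition R[0, 1] = -sin(roll), R[0, 0] and R[0, 2] carry
    cos(roll) * (cos, sin)(yaw), and R[1, 1] and R[2, 1] carry
    cos(roll) * (cos, sin)(pitch). No gimbal-lock handling (cos(roll) near 0).
    """
    roll = torch.asin((-R[..., 0, 1]).clamp(-1.0, 1.0))
    yaw = torch.atan2(R[..., 0, 2], R[..., 0, 0])
    pitch = torch.atan2(R[..., 2, 1], R[..., 1, 1])
    return torch.stack([pitch, roll, yaw], dim=-1)


# Round trip: angles -> matrix -> angles.
R_demo = euler_to_matrix(0.3, -0.2, 0.5)
angles = matrix_to_euler(R_demo)
```

The output keeps the (pitch, roll, yaw) ordering documented for the return value, and indexing with `...` lets the same function accept a batch of matrices.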
umeyama_alignment(src, dst, with_scale=True)
Closed-form similarity transform from src points to dst points.
Solves the Umeyama (1991) least-squares problem: find rotation R, translation t, and (optional) scale s such that
dst ≈ s * R @ src + t
in the least-squares sense. This is equivalent to OpenCV's
estimateAffine3D for rigid+scale transforms, but pure-torch and batched.
Works inside torch.inference_mode() (no autograd needed) and supports a
leading batch dimension so a whole batch of faces aligns in one call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | | [..., N, 3] source points (e.g., canonical face model). | required |
| dst | | [..., N, 3] target points (e.g., observed landmarks). | required |
| with_scale | | If True, recover an isotropic scale factor; otherwise s=1. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| R | | [..., 3, 3] rotation matrix (det = +1, no reflection). |
| t | | [..., 3] translation vector. |
| scale | | [...] non-negative scale factor (always returned; is 1.0 when with_scale=False; clamped to >= 0 when with_scale=True). For degenerate inputs (e.g. coincident src points), scale falls back to 0 and the recovered transform is meaningless; callers that need to detect this case can check for very small src variance themselves. |
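For readers who want to see the math behind dst ≈ s * R @ src + t, here is a minimal batched sketch of the textbook Umeyama solution in PyTorch. It is written from the published algorithm, not copied from feat/utils/face_pose.py: the name umeyama_sketch is hypothetical, and the scale clamping and degenerate-input fallback described above are omitted.

```python
import torch


def umeyama_sketch(src: torch.Tensor, dst: torch.Tensor, with_scale: bool = True):
    """Least-squares (R, t, scale) with dst ≈ scale * R @ src + t.

    src, dst: [..., N, 3]. Returns R [..., 3, 3], t [..., 3], scale [...].
    """
    n = src.shape[-2]
    mu_src = src.mean(dim=-2, keepdim=True)
    mu_dst = dst.mean(dim=-2, keepdim=True)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between centred target and source points: [..., 3, 3].
    cov = dst_c.transpose(-1, -2) @ src_c / n
    U, S, Vh = torch.linalg.svd(cov)

    # Flip the last singular direction if needed so that det(R) = +1.
    D = torch.ones_like(S)
    D[..., -1] = torch.sign(torch.linalg.det(U) * torch.linalg.det(Vh))
    R = (U * D.unsqueeze(-2)) @ Vh

    if with_scale:
        var_src = src_c.pow(2).sum(dim=(-1, -2)) / n
        scale = (S * D).sum(dim=-1) / var_src
    else:
        scale = torch.ones_like(S[..., 0])

    rotated_mean = (R @ mu_src.transpose(-1, -2)).squeeze(-1)
    t = mu_dst.squeeze(-2) - scale.unsqueeze(-1) * rotated_mean
    return R, t, scale


# Synthetic check: apply a known similarity transform, then recover it.
torch.manual_seed(0)
src_pts = torch.randn(4, 468, 3, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
if torch.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]  # make Q a proper rotation
true_t = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64)
dst_pts = 2.5 * src_pts @ Q.T + true_t

with torch.inference_mode():
    R_est, t_est, s_est = umeyama_sketch(src_pts, dst_pts)
```

The only decomposition is a single batched 3x3 SVD, which is why the approach needs no iteration or gradients and runs unchanged under torch.inference_mode().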