blendshape_to_au

feat.utils.blendshape_to_au

Map MediaPipe / ARKit blendshapes to FACS Action Unit intensities.

Provides a learned PLS regression (pls_predict_batch) trained on paired Detectorv1 (xgb AUs) and MPDetector (52 blendshapes) outputs from ~10K CelebV-HQ celebrity videos (~350K frames). The model follows the Cheong / Py-Feat-tutorial-06 style: linear features only, no pairwise interactions, and no clipping at training time. Under 3-fold GroupKFold (grouped by video_id), the out-of-sample variance-weighted R² is 0.236 ± 0.008.

Weights are downloaded from the HuggingFace Hub on first use:

https://huggingface.co/py-feat/bs_to_au → bs_to_au_pls_v2.npz

Also defines a dlib-68 → MediaPipe-478 landmark index mapping so the existing dlib-based AU muscle-polygon heatmap drawing can be reused on MPDetector output.

mp478_row_to_dlib68_view(row)

Build a dict with x_0..x_67 / y_0..y_67 keys by sampling the matching MediaPipe-478 landmarks.

Source code in feat/utils/blendshape_to_au.py
def mp478_row_to_dlib68_view(row) -> dict:
    """Build a dict with x_0..x_67 / y_0..y_67 keys by sampling
    the matching MediaPipe-478 landmarks."""
    view: dict = {}
    for dlib_idx, mp_idx in enumerate(DLIB68_FROM_MP478):
        view[f"x_{dlib_idx}"] = row.get(f"x_{mp_idx}", np.nan)
        view[f"y_{dlib_idx}"] = row.get(f"y_{mp_idx}", np.nan)
    for k in (
        "FaceRectX", "FaceRectY", "FaceRectWidth", "FaceRectHeight",
        "Pitch", "Roll", "Yaw",
    ):
        if (k in row.index) if hasattr(row, "index") else (k in row):
            view[k] = row[k]
    return view
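
The sampling logic above can be tried in isolation. The sketch below is a minimal illustration, not the library's code: `TOY_DLIB_FROM_MP` and `toy_row_to_dlib_view` are hypothetical names, and the three indices are placeholders, since the real `DLIB68_FROM_MP478` table (68 entries) is not shown on this page.

```python
import numpy as np

# Hypothetical stand-in for DLIB68_FROM_MP478. The real table has 68 entries;
# these three indices are placeholders chosen for illustration only.
TOY_DLIB_FROM_MP = [10, 152, 1]


def toy_row_to_dlib_view(row) -> dict:
    """Same sampling loop as mp478_row_to_dlib68_view, on the toy mapping."""
    view = {}
    for dlib_idx, mp_idx in enumerate(TOY_DLIB_FROM_MP):
        # A missing MediaPipe column becomes NaN instead of raising KeyError.
        view[f"x_{dlib_idx}"] = row.get(f"x_{mp_idx}", np.nan)
        view[f"y_{dlib_idx}"] = row.get(f"y_{mp_idx}", np.nan)
    return view


# A plain dict is enough here because the loop only calls .get().
row = {"x_10": 0.5, "y_10": 0.1, "x_152": 0.5, "y_152": 0.9}  # landmark 1 absent
view = toy_row_to_dlib_view(row)
```

Because absent landmarks map to NaN, downstream drawing code can decide for itself whether to skip or interpolate the missing points.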

pls_predict_batch(blendshape_array, clip=True)

(N, 52) MP blendshapes → (N, 20) AU intensities via Cheong-style PLS.

Trained on ~350K frames from 10K CelebV-HQ wild-celebrity videos, paired (xgb AU intensity, MP blendshape coefficient) per frame, pose-filtered to |yaw| ≤ 40° and |pitch| ≤ 30°. PLS-2 with n_components=20 (full rank), linear features only — pairwise BS interactions were tested and degraded out-of-sample R², so they are NOT used.

3-fold GroupKFold (by video_id) variance-weighted R² = 0.236 ± 0.008. Per-AU R² is strongest on AU06/12/43 (~0.50), weakest on AU11/15/28 (<0.10).

See https://huggingface.co/py-feat/bs_to_au for the model card.
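
For readers unfamiliar with the metric quoted above, the following is a minimal NumPy sketch of a variance-weighted R² over several output columns. It assumes the reported figure uses the usual definition (the one scikit-learn's `r2_score` applies with `multioutput="variance_weighted"`); the page itself does not state this. The function name and the synthetic data are illustrative only.

```python
import numpy as np


def variance_weighted_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average of per-column R², weighted by each column's variance.

    Algebraically equal to 1 - sum(SS_res) / sum(SS_tot) over all columns,
    so high-variance outputs dominate the score.
    """
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2_per_col = 1.0 - ss_res / ss_tot
    return float(np.average(r2_per_col, weights=ss_tot))


# Synthetic targets with three columns of different scale, plus noisy predictions.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 0.5])
y_pred = y_true + rng.normal(scale=0.5, size=y_true.shape)
score = variance_weighted_r2(y_true, y_pred)
```

Under this definition an AU with little variance in the training data contributes little to the aggregate, which is one reason the per-AU values can differ widely from the overall 0.236.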

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| blendshape_array | ndarray | Blendshape coefficients in MediaPipe FaceLandmarker order (matches MPDetector output). Either a 1-D (52,) vector for a single face or a 2-D (N, 52) batch. See _PLS_WEIGHTS["blendshape_columns"] after load for the exact column names. | required |
| clip | bool | If True (default), output is clipped to [0, 1] for display consistency with FACS intensity convention. | True |

Returns:

| Type | Description |
| --- | --- |
| ndarray | AU intensities in py-feat's standard order. Shape matches input batching: (20,) for a 1-D input, (N, 20) for a 2-D input. |

Source code in feat/utils/blendshape_to_au.py
def pls_predict_batch(
    blendshape_array: np.ndarray, clip: bool = True,
) -> np.ndarray:
    """(N, 52) MP blendshapes → (N, 20) AU intensities via Cheong-style PLS.

    Trained on ~350K frames from 10K CelebV-HQ wild-celebrity videos, paired
    (xgb AU intensity, MP blendshape coefficient) per frame, pose-filtered to
    |yaw| ≤ 40° and |pitch| ≤ 30°. PLS-2 with n_components=20 (full rank),
    linear features only — pairwise BS interactions were tested and degraded
    out-of-sample R², so they are NOT used.

    3-fold GroupKFold (by video_id) variance-weighted R² = 0.236 ± 0.008.
    Per-AU R² is strongest on AU06/12/43 (~0.50), weakest on AU11/15/28 (<0.10).

    See `https://huggingface.co/py-feat/bs_to_au` for the model card.

    Args:
        blendshape_array: blendshape coefficients in MediaPipe FaceLandmarker
            order (matches MPDetector output). Either a 1-D ``(52,)`` vector
            for a single face or a 2-D ``(N, 52)`` batch. See
            ``_PLS_WEIGHTS["blendshape_columns"]`` after load for the exact
            column names.
        clip: if True (default), output is clipped to [0, 1] for display
            consistency with FACS intensity convention.

    Returns:
        AU intensities in py-feat's standard order. Shape matches input batching:
        ``(20,)`` for a 1-D input, ``(N, 20)`` for a 2-D input.
    """
    w = _load_pls_weights()
    bs = np.asarray(blendshape_array, dtype=np.float32)
    n_features = w["coef"].shape[0]
    if bs.ndim == 1:
        if bs.shape[0] != n_features:
            raise ValueError(
                f"Expected 1-D input of length {n_features}, got {bs.shape[0]}."
            )
        bs = bs.reshape(1, -1)
        squeeze_out = True
    elif bs.ndim == 2:
        if bs.shape[1] != n_features:
            raise ValueError(
                f"Expected 2-D input with {n_features} columns, got shape {bs.shape}."
            )
        squeeze_out = False
    else:
        raise ValueError(
            f"blendshape_array must be 1-D ({n_features},) or 2-D (N, {n_features}); "
            f"got ndim={bs.ndim}, shape={bs.shape}."
        )
    out = bs @ w["coef"] + w["intercept"]
    if clip:
        out = np.clip(out, 0.0, 1.0)
    return out[0] if squeeze_out else out
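
The prediction step in the source above is a single affine map followed by optional clipping. The sketch below reproduces that step with random stand-in weights so it runs without downloading anything. `toy_predict`, `coef`, and `intercept` are hypothetical names for this example, the shape checks are omitted for brevity, and the outputs carry no meaning as AU intensities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights with the documented shapes: 52 blendshapes in, 20 AUs out.
# The real values come from bs_to_au_pls_v2.npz; these are random placeholders.
coef = rng.normal(scale=0.1, size=(52, 20)).astype(np.float32)
intercept = rng.normal(scale=0.1, size=20).astype(np.float32)


def toy_predict(bs, clip=True):
    """Affine map plus optional clipping, mirroring pls_predict_batch."""
    bs = np.asarray(bs, dtype=np.float32)
    single = bs.ndim == 1          # remember whether to squeeze the result
    if single:
        bs = bs.reshape(1, -1)
    out = bs @ coef + intercept    # (N, 52) @ (52, 20) + (20,) -> (N, 20)
    if clip:
        out = np.clip(out, 0.0, 1.0)
    return out[0] if single else out


batch = rng.uniform(0.0, 1.0, size=(4, 52))   # four fake faces
au_batch = toy_predict(batch)                 # shape (4, 20)
au_single = toy_predict(batch[0])             # shape (20,)
au_raw = toy_predict(batch, clip=False)       # may fall outside [0, 1]
```

Passing `clip=False` exposes the raw linear output, which can be useful when the values feed further statistics and should not be truncated at the [0, 1] bounds.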