# 3. Detecting facial expressions from videos

Written by Jin Hyun Cheong and Eshin Jolly

In this tutorial we’ll explore how to use the Detector class to process video files. You can also try it out interactively in Google Colab.

# Uncomment the line below and run this only if you're using Google Colab
# !pip install -q py-feat


## 3.1 Setting up the Detector

We’ll begin by creating a new Detector instance, just as in the previous tutorial:

from feat import Detector

detector = Detector(
    face_model="retinaface",
    landmark_model="mobilefacenet",
    au_model="svm",
    facepose_model="img2pose",
)

detector

feat.detector.Detector(face_model=retinaface, landmark_model=mobilefacenet, au_model=svm, emotion_model=resmasknet, facepose_model=img2pose)


## 3.2 Processing videos

Detecting facial expressions in videos is easy to do using the .detect_video() method. This sample video included in Py-Feat is by Wolfgang Langer from Pexels.

from feat.utils import get_test_data_path
import os

test_data_dir = get_test_data_path()
test_video_path = os.path.join(test_data_dir, "WolfgangLanger_Pexels.mp4")

# Show video
from IPython.display import Video
Video(test_video_path, embed=False)


As with .detect_image(), we simply pass the path to the video file to .detect_video(). Here we also set skip_frames=24, which tells the detector to process only every 24th frame for the sake of speed:

video_prediction = detector.detect_video(test_video_path, skip_frames=24)


| | frame | FaceRectX | FaceRectY | FaceRectWidth | FaceRectHeight | FaceScore | x_0 | x_1 | x_2 | x_3 | ... | Roll | Yaw | anger | disgust | fear | happiness | sadness | surprise | neutral | input |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 334.117889 | 54.037693 | 215.074738 | 293.471466 | 0.999814 | 331.183881 | 332.67499 | 337.93965 | 345.98739 | ... | -4.564205 | 12.390767 | 0.014724 | 0.00003 | 0.165501 | 0.001346 | 0.331741 | 0.063215 | 0.423442 | /Users/Esh/Documents/pypackages/py-feat/feat/t... |
| 24 | 24 | 352.668091 | 57.767002 | 209.387207 | 282.59494 | 0.999875 | 330.398783 | 332.1624 | 337.567593 | 345.755364 | ... | -3.138013 | 14.580695 | 0.014894 | 0.000021 | 0.239515 | 0.000816 | 0.340588 | 0.070715 | 0.33345 | /Users/Esh/Documents/pypackages/py-feat/feat/t... |
| 48 | 48 | 344.77536 | 38.98122 | 205.286224 | 288.010895 | 0.999842 | 345.17962 | 345.301341 | 349.045422 | 356.160386 | ... | 2.12017 | 3.003316 | 0.005543 | 0.000007 | 0.250209 | 0.000673 | 0.272148 | 0.057505 | 0.413915 | /Users/Esh/Documents/pypackages/py-feat/feat/t... |
| 72 | 72 | 322.318298 | 63.507832 | 205.645325 | 297.096222 | 0.999252 | 343.375832 | 342.31017 | 344.141114 | 346.864314 | ... | 1.996875 | 6.923225 | 0.012626 | 0.000024 | 0.197583 | 0.000953 | 0.302611 | 0.078977 | 0.407226 | /Users/Esh/Documents/pypackages/py-feat/feat/t... |
| 96 | 96 | 324.173523 | 88.114365 | 221.387939 | 296.47644 | 0.999382 | 315.561634 | 315.32702 | 318.230413 | 323.967994 | ... | -0.717578 | 6.199114 | 0.004423 | 0.000037 | 0.285199 | 0.00071 | 0.247177 | 0.017002 | 0.445452 | /Users/Esh/Documents/pypackages/py-feat/feat/t... |

5 rows × 173 columns

We can see that our 20-second video, recorded at 24 frames per second, produces 20 predictions because we set skip_frames=24:

video_prediction.shape

(20, 173)
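The arithmetic behind that shape can be checked by hand. The sketch below is plain Python and not part of Py-Feat. It assumes the detector starts at frame 0 and then steps forward by skip_frames, which is consistent with the 0, 24, 48, ... index in the output above, and it takes the total frame count as 20 s × 24 fps = 480. The helper name `sampled_frames` is hypothetical.

```python
def sampled_frames(n_frames, skip_frames=1):
    """Return the frame indices kept when only every `skip_frames`-th frame is processed.

    Assumption: sampling starts at frame 0.
    """
    return list(range(0, n_frames, skip_frames))


fps = 24                      # frame rate stated in the tutorial
duration_s = 20               # video length stated in the tutorial
n_frames = fps * duration_s   # 480 frames in total

frames = sampled_frames(n_frames, skip_frames=24)
print(len(frames))   # 20
print(frames[:5])    # [0, 24, 48, 72, 96]
```

Halving skip_frames to 12 would roughly double the number of rows, and with it the processing time.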

video_prediction.detector

'Feat'


## 3.3 Visualizing predictions

You can also plot the detection results from a video. The frames are not extracted from the video (that would produce thousands of images), so the visualization shows only the detected face without the underlying image.

The video runs at 24 fps, and the actress shows sadness at around 0:02 and happiness at around 0:14.

# Frame 48 = ~0:02
# Frame 408 = ~0:14
# Frame numbers are the index so we can use .loc
video_prediction.loc[[48, 408]].plot_detections(faceboxes=False, add_titles=False)

[<Figure size 1080x504 with 3 Axes>, <Figure size 1080x504 with 3 Axes>]


We can also leverage existing pandas plotting functions to show how emotions unfold over time. We can clearly see how her emotions change from sadness to happiness.

axes = video_prediction.emotions.plot()
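To read the same trend as labels rather than curves, one option is to take the highest-scoring emotion in each processed frame. The sketch below uses plain pandas on a small DataFrame built from the emotion scores of frames 0 and 24 in the output shown earlier. `dominant_emotion` is a hypothetical helper, and applying it to `video_prediction.emotions` assumes that attribute behaves like a regular pandas DataFrame.

```python
import pandas as pd


def dominant_emotion(emotions):
    """Return the name of the highest-scoring emotion column for each row."""
    return emotions.idxmax(axis=1)


# Emotion scores for frames 0 and 24, copied from the output shown earlier
emotions = pd.DataFrame(
    {
        "anger": [0.014724, 0.014894],
        "disgust": [0.00003, 0.000021],
        "fear": [0.165501, 0.239515],
        "happiness": [0.001346, 0.000816],
        "sadness": [0.331741, 0.340588],
        "surprise": [0.063215, 0.070715],
        "neutral": [0.423442, 0.33345],
    },
    index=[0, 24],
)

labels = dominant_emotion(emotions)
print(labels.to_dict())  # {0: 'neutral', 24: 'sadness'}
```

With real results, the equivalent call would be `dominant_emotion(video_prediction.emotions)`.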


If you want a prediction for EVERY frame of the video, just leave out the skip_frames argument. Processing speed will vary with the detector models you use and the length of the video:

# Run me to get a prediction for EVERY video frame.
# WARNING MAY TAKE A WHILE!
video_prediction = detector.detect_video(test_video_path)
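Before committing to a full run, a rough time estimate can be extrapolated from the subsampled run. The sketch below is plain Python and assumes the cost per frame is roughly constant, ignoring fixed overhead such as model loading and video decoding. `estimate_full_runtime` is a hypothetical helper, and the 12-second timing is a placeholder, not a measurement.

```python
def estimate_full_runtime(elapsed_s, frames_processed, total_frames):
    """Extrapolate total runtime from a timed run over a subset of frames."""
    if frames_processed <= 0:
        raise ValueError("frames_processed must be positive")
    seconds_per_frame = elapsed_s / frames_processed
    return seconds_per_frame * total_frames


# Placeholder timing: suppose the skip_frames=24 run (20 frames) took 12 s.
# Replace 12.0 with a value measured via time.perf_counter() on your machine.
estimate = estimate_full_runtime(elapsed_s=12.0, frames_processed=20, total_frames=480)
print(f"{estimate:.0f} s")  # 288 s
```

The total of 480 frames follows from the 20-second, 24 fps figures given earlier. The actual runtime depends on hardware and on the models chosen.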