Skip to content

4. Running a full analysis

In this tutorial we'll perform a real analysis on part of the open dataset from "A Data-Driven Characterisation Of Natural Facial Expressions When Giving Good And Bad News" by Watson & Johnston 2020.

In the original paper the authors had 3 speakers deliver good or bad news while filming their facial expressions. They found that could accurately "decode" each condition based on participants' facial expressions extracted either using a custom multi-chanel-gradient model or action units (AUs) extracted using Open Face.

In this tutorial we'll show how easily it is to not only reproduce their decoding analysis with py-feat, but just as easily perform additional analyses. Specifically we'll:

  1. Download 20 of the first subject's videos (the full dataset is available on OSF
  2. Extract facial features using the Detectorv2
  3. Aggregate and summarize detections per video using Fex
  4. Train and test a decoder to classify good vs bad news using extracted emotions, AUs, and poses
  5. Run a fMRI style "mass-univariate" comparison across all AUs between conditions
  6. Run a time-series analysis comparing videos based on the time-courses of extracted facial features

4.1 The data

We'll analyze 20 short clips from the Watson & Johnston (2020) "good news vs bad news" dataset — 10 of each. To keep this tutorial fast and offline, the per-frame Detectorv2 outputs are pre-computed and bundled with the docs under docs/data/news_sample/, so we load them directly instead of downloading videos and re-running detection. The full dataset is on OSF.

from glob import glob
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_context("talk")

# Pre-computed per-frame Detectorv2 outputs (20 AUs + 7 emotions + valence/
# arousal + head pose + gaze + the 478-vertex mesh landmarks + 52 blendshapes)
# for the 20 bundled clips, generated with skip_frames=2 and committed under
# docs/data/news_sample/ so this tutorial runs offline.
data_dir = Path(__file__).resolve().parent.parent / "data" / "news_sample"
videos = [Path(c).stem + ".mp4" for c in sorted(glob(str(data_dir / "0*.csv")))]

clip_attrs = pd.read_csv(data_dir / "clip_attrs.csv").assign(
    input=lambda d: d.clipN.apply(lambda x: str(x).zfill(3) + ".mp4"),
    condition=lambda d: d["class"].replace({"gn": "goodNews", "ists": "badNews"}),
)
clip_attrs = clip_attrs.query("input in @videos")

print(
    f"{len(videos)} clips: "
    f"{(clip_attrs.condition == 'goodNews').sum()} good news, "
    f"{(clip_attrs.condition == 'badNews').sum()} bad news"
)
20 clips: 10 good news, 10 bad news

4.2 How the detections were generated

The bundled CSVs were produced by running the Detectorv2 over each clip and saving its per-frame output. You don't need to run this — we load the saved CSVs in the next section — but here's the code that made them:

from feat import Detectorv2

detector = Detectorv2(device="cuda")  # or "mps" / "cpu"
for video in videos:
    fex = detector.detect(video, data_type="video", batch_size=8, skip_frames=2)
    fex.to_csv(video.replace(".mp4", ".csv"), index=False)

read_feat recognizes these as Detectorv2 output (from their valence/arousal and 478-vertex mesh columns) and restores the matching .aus / .emotions / .poses / .gazes accessors.

4.3. Aggregate detections using a Fex dataframe

Then we can use read_feat to load each CSV file and concatenate them together:

from feat.utils.io import read_feat
fex_1 = pd.concat(
    read_feat(str(data_dir / video.replace(".mp4", ".csv"))) for video in videos
)
print(f'Unique videos: {fex_1.inputs.nunique()}')
print(f'Total processed frames: {fex_1.shape[0]}')
print(f"Avg frames per video: {fex_1.groupby('input').size().mean()}")
Unique videos: 20
Total processed frames: 477
Avg frames per video: 23.85

Our Fex dataframe now contains all detections for all frames of each video

fex_1.shape
fex_1.head()
FaceRectXFaceRectYFaceRectWidthFaceRectHeightFaceScorex_0x_1x_2x_3x_4...mouthUpperUpLeftmouthUpperUpRightnoseSneerLeftnoseSneerRightFrameHeightFrameWidthinputframeapprox_timeIdentity
0231.36029213.90369662.01550662.015500.999989359.36720363.24620368.41818378.11566393.63165...0.0016480.0032816.482005e-079.424984e-071080.01080.0001.mp4000:00NaN
1235.58954216.35260655.57166655.571660.999989361.07007364.27110370.03296378.99585394.36078...0.0010300.0020456.482005e-078.866191e-071080.01080.0001.mp4200:00NaN
2240.72534214.76416652.69850652.698500.999988363.74370366.93073372.02990381.59094396.25116...0.0002960.0006455.699694e-077.823110e-071080.01080.0001.mp4400:00NaN
3237.07724208.92416670.49060670.490600.999989365.41333368.03244373.27063382.43750397.49738...0.0004180.0009125.364418e-079.424984e-071080.01080.0001.mp4600:00NaN
4232.31665211.43927677.80920677.809200.999990365.36320368.67280374.63013383.89703399.78320...0.0018690.0043336.072223e-071.139939e-061080.01080.0001.mp4800:00NaN

5 rows × 2183 columns

Summarize data with Fex.sessions

Fex dataframes have a special attribute called .sessions that act as a grouping factor to make it easier to compute summary statistics with any of the .extract_* methods. By default .sessions is None, but you can use the .update_sessions() to return a new Fex dataframe with .sessions set.

For example, if we update the sessions to be the name of each video, then .extract_mean() will group video-frames (rows) by video making it easy to compute a single summary statistic per file:

by_video = fex_1.update_sessions(fex_1['input'])
video_means = by_video.extract_mean()
# Compute the mean per video
video_means  # 20 rows for 20 videos
mean_FaceRectXmean_FaceRectYmean_FaceRectWidthmean_FaceRectHeightmean_FaceScoremean_x_0mean_x_1mean_x_2mean_x_3mean_x_4...mean_mouthStretchLeftmean_mouthStretchRightmean_mouthUpperUpLeftmean_mouthUpperUpRightmean_noseSneerLeftmean_noseSneerRightmean_FrameHeightmean_FrameWidthmean_framemean_Identity
001.mp4228.067197212.486615666.696606666.6965950.999989357.154249360.562415366.123327375.685371391.008434...0.0004090.0008520.0015420.0042024.327856e-077.288530e-071080.01080.019.0NaN
002.mp4224.596631208.872836667.766566667.7665510.999988353.992130355.834680360.239261368.648613382.836786...0.0011080.0021660.0005380.0010356.315697e-075.405662e-071080.01080.013.0NaN
003.mp4220.553640208.426700661.469404661.4694030.999989346.443812348.356508352.717816360.550875374.062050...0.0004160.0012720.0146070.0237731.014676e-067.956599e-071080.01080.023.0NaN
004.mp4206.602689214.005486662.983170662.9831690.999990333.571429336.693899342.125789351.242905365.876890...0.0007570.0017320.0306480.0561261.045713e-061.636941e-061080.01080.022.0NaN
005.mp4189.311367226.164585669.622540669.6225490.999990318.945233322.496384328.040360337.220817352.571631...0.0010410.0019850.0410660.0706756.294121e-072.383133e-061080.01080.022.0NaN
..................................................................
016.mp4188.534338233.985227641.555135641.5551430.999982304.692316310.089139318.195942330.912421349.278520...0.0004190.0024210.0001480.0001815.916525e-072.145027e-071080.01080.038.0NaN
017.mp4200.779778232.391727647.094319647.0943250.999986325.604824331.531583338.546530349.374781364.318932...0.0008060.0014800.0067110.0143381.050725e-064.987072e-071080.01080.028.0NaN
018.mp4225.480577238.637433643.314181643.3141530.999982344.096223347.055058352.852808363.068252379.061245...0.0002680.0023670.0012080.0016644.587626e-075.929765e-071080.01080.026.0NaN
019.mp4190.195110227.817021649.331265649.3312740.999985308.581546311.671701317.121919326.855916341.824698...0.0006310.0018280.0021420.0042358.648494e-073.698951e-071080.01080.031.0NaN
020.mp4174.576909218.228273646.920015646.9200380.999986294.347939300.894758309.029580321.320305338.416800...0.0006100.0014970.0124560.0211241.044549e-064.109531e-071080.01080.032.0NaN

20 rows × 2181 columns

Then we can grab the AU detections and call standard pandas methods like .loc and .plot:

# Grab the aus just for video 1
video001_aus = video_means.aus.loc['001.mp4']
_ax = video001_aus.plot(kind='bar', title='Video 001 AU detection')
# Plot them
_ax.set(ylabel='Average Probability')
sns.despine()
_ax.figure

Chaining operations

.update_sessions() always returns a copy of the Fex object, so that you can chain operations together including existing pandas methods like .plot(). Here's an example passing a dictionary to .update_sessions(), which maps between old and new session names:

# Which condition each video belonged to
video2condition = dict(zip(clip_attrs['input'], clip_attrs['condition']))
_ax = by_video.update_sessions(video2condition).extract_mean().aus.plot(kind='bar', legend=False, title='Mean AU detection by condition')
_ax.set(ylabel='Average Probability', title='AU detection by condition', xticklabels=['Good News', 'Bad News'])  # if loading pre-computed csv
plt.xticks(rotation=0)  # clip_attrs["input"].str.replace(".mp4", ".csv", regex=False),
# Update sessions to group by condition, compute means (per condition), and make a
# barplot of the mean AUs for each condition
sns.despine()
_ax.figure

We can also focus in on the AUs associated with happiness:

aus = ['AU06', 'AU12', 'AU25']  # from https://py-feat.org/pages/au_reference.html
summary = by_video.update_sessions(video2condition).extract_summary(mean=True, sem=True, std=False, min=False, max=False)
# Update the sessions to condition compute summary stats
bad_means = summary.loc['badNews', [f'mean_{au}' for au in aus]]
bad_sems = summary.loc['badNews', [f'sem_{au}' for au in aus]]
good_means = summary.loc['goodNews', [f'mean_{au}' for au in aus]]
good_sems = summary.loc['goodNews', [f'sem_{au}' for au in aus]]
# Organize them for plotting
fig, _ax = plt.subplots(figsize=(3, 4))
ind = np.arange(len(bad_means))
width = 0.35
rects1 = _ax.bar(ind - width / 2, bad_means, width, yerr=bad_sems, label='Bad News')
rects2 = _ax.bar(ind + width / 2, good_means, width, yerr=good_sems, label='Good News')
# Plot
_ax.set(ylabel='Average Probability', title='', xticks=ind, xticklabels=aus, ylim=(0, 1))
_ax.legend(loc='upper left', frameon=False, bbox_to_anchor=(0, 1.25))
plt.axhline(0.5, ls='--', color='k')
sns.despine()
plt.xticks(rotation=45)
plt.tight_layout()
fig

4.4 Comparing the condition difference across AUs using regression

One way we can compare what AUs in the plot show significant differences is by using the .regress() method along with numerical contrast codes. For example we can test the difference in activation of every AU when participants delivered good vs bad news.

This is analogous to the "mass-univariate" GLM approach in fMRI research, and allows us to identify what AUs are significantly more active in one condition vs another:

# Save the by_condition fex from above
by_condition = video_means.update_sessions(video2condition)
by_condition_codes = by_condition.update_sessions({'goodNews': 1.0, 'badNews': -1})
# We set numerical contrasts to compare mean good news > mean bad news
b, se, t, p, df, residuals = by_condition_codes.regress(X='sessions', y='aus', fit_intercept=True)
p_bonf = p / p.shape[1]
# Now we perform a regression (t-test) at every AU
_results = pd.concat([b.round(3).loc[['sessions']].rename(index={'sessions': 'betas'}), se.round(3).loc[['sessions']].rename(index={'sessions': 'ses'}), t.round(3).loc[['sessions']].rename(index={'sessions': 't-stats'}), df.round(3).loc[['sessions']].rename(index={'sessions': 'dof'}), p_bonf.round(3).loc[['sessions']].rename(index={'sessions': 'p-values'})])
_ax = _results.loc['betas'].plot(kind='bar', yerr=_results.loc['ses'], color=['steelblue' if elem else 'gray' for elem in _results.loc['p-values'] < 0.01], title='Good News > Bad News\n(blue: p < .01)')
xticks = _ax.get_xticklabels()
xticks = [elem.get_text().split('_')[-1] for elem in xticks]
# We can perform bonferroni correction for multiple comparisons:
_ax.set_xticklabels(xticks)
_ax.set_ylabel('Beta +/- SE')
sns.despine()
_ax.figure

4.5 Decoding condition from facial features

We can easily perform an analysis just like Watson et al, by training a LinearDiscriminantAnalysis (LDA) decoder to classify which condition a video came from based on average AU and headpose detections.

To do this we can use the .predict() which behaves just like .regress() but also requires a sklearn Estimator. We can use keyword arguments to perform 10-fold cross-validation to test the accuracy of each decoder:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
feature_list = ['emotions', 'aus', 'poses', 'emotions,poses', 'aus,poses']  # always a good idea to normalize your features!
_results = []
models = {}
# List of different models we'll train
for features in feature_list:
    model, accuracy = by_condition.predict(X=features, y='sessions', model=make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()), cv_kwargs={'cv': 10})
    models[features] = model
    _results.append(pd.DataFrame({'Accuracy': accuracy * 100, 'Features': [features] * len(accuracy)}))
    print(f'{features} model accuracy: {accuracy.mean() * 100:.3g}% +/- {accuracy.std() * 100:.3g}%')
_results = pd.concat(_results).assign(Features=lambda df: df.Features.map({'emotions': 'Emotions', 'poses': 'Pose', 'aus': 'AUs', 'emotions,poses': 'Emotions\n+ Pose', 'aus,poses': 'AUs+Pose'}))
f, _ax = plt.subplots(1, 1, figsize=(3.75, 4))  # .predict is just like .regress, but this time session is our y.
_ax = sns.barplot(x='Features', y='Accuracy', errorbar='sd', dodge=False, hue='Features', data=_results, ax=_ax, order=['Emotions', 'Emotions\n+ Pose', 'AUs+Pose', 'AUs', 'Pose'])
_legend = _ax.get_legend()
if _legend is not None:
    _legend.remove()
_ax.set_title('Good News vs Bad News\nClassifier Performance')
_ax.set(ylabel='Accuracy', xlabel='')
sns.despine()
plt.axhline(y=50, ls='--', color='k')
plt.xticks(rotation=90)
plt.tight_layout()  # Save the model
# Concat results into a single dataframe and tweak column names
# Plot it
# with sns.plotting_context("talk", font_scale=1.8):
f
emotions model accuracy: 100% +/- 0%
aus model accuracy: 100% +/- 0%
poses model accuracy: 85% +/- 22.9%
emotions,poses model accuracy: 100% +/- 0%
aus,poses model accuracy: 95% +/- 15%

Visualizing decoder weights

Using what we learned in the previous tutorial, we can visualize the coefficients for any models that used AU features. This allows us to "see" the underlying facial expression that the classifier learned!

from feat.plotting import plot_face_mesh

# The 'aus' pipeline's LDA coefficients are signed weights over the 20 AUs.
# Min-max scale them into a positive AU-intensity range, then render the
# predicted 478-vertex mesh so we can "see" the (good-news-leaning) expression
# the classifier learned — the Detectorv2 analog of the 68-pt plot_face.
_coef = models['aus'][1].coef_.squeeze()
_au = (_coef - _coef.min()) / (np.ptp(_coef) + 1e-12) * 2.0

_fig = plt.figure(figsize=(5, 5))
_ax = _fig.add_subplot(111, projection="3d")
plot_face_mesh(au=_au, ax=_ax, mode="tesselation")
_ax.set_title("Expression reconstructed from\nAU classifier weights")
_fig

You can also animate this expression to emphasize what's changing — pass the same scaled AU-weight vector as end (and a neutral np.zeros(20) as start) to animate_face_mesh_plotly for an interactive 3D morph. See tutorial 3 for animation examples.

4.6 Time-series analysis

Finally we might be interested in looking the similarity of the detected features over time. We can do that using the .isc() method which takes a column and metric to use. Here we compare detected happiness between all pairs of videos.

We use some helper functions to cluster, sort, and plot the correlation matrix. Warmer colors indicate a pair of videos elicited more similar detected Happiness over time. We see that some videos show high-correlation in-terms of their detected happiness over-time. This is likely why the classifier above was able to decode conditions so well.

# ISC returns a video x video pearson correlation matrix.
# Detectorv2 emotion columns are capitalized ('Happy', not 'happiness').
isc = fex_1.isc(col='Happy', method='pearson')
def cluster_corrs(df):
    """Helper to reorder rows and cols of correlation matrix based on clustering"""
    import scipy.cluster.hierarchy as sch
    pairwise_distances = sch.distance.pdist(df)
    linkage = sch.linkage(pairwise_distances, method='complete')
    cluster_distance_threshold = pairwise_distances.max() / 2
    idx_to_cluster_array = sch.fcluster(linkage, cluster_distance_threshold, criterion='distance')
    idx = np.argsort(idx_to_cluster_array)
    return df.iloc[idx, :].T.iloc[idx, :]

def add_cond_to_ticks(ax):
    """Helper to add condition info to each tick label"""
    xlabels, ylabels = ([], [])
    for xlabel, ylabel in zip(_ax.get_xticklabels(), _ax.get_yticklabels()):
        x_condition = video2condition[xlabel.get_text()]
        y_condition = video2condition[ylabel.get_text()]
        x_new = f"{x_condition[:-4]}_{xlabel.get_text().split('.csv')[0][1:]}"
        y_new = f"{y_condition[:-4]}_{ylabel.get_text().split('.csv')[0][1:]}"
        xlabels.append(x_new)
        ylabels.append(y_new)
    _ax.set_xticklabels(xlabels)
    _ax.set_yticklabels(ylabels)
    return _ax
_ax = sns.heatmap(cluster_corrs(isc), cmap='RdBu_r', vmin=-1, vmax=1, square=True)
_ax = add_cond_to_ticks(_ax)
_ax.set(xlabel='', ylabel='', title='Inter-video Happiness\ntimeseries correlation')
# Plot it
_ax.figure