4. Running a full analysis¶
In this tutorial we'll perform a real analysis on part of the open dataset from "A Data-Driven Characterisation Of Natural Facial Expressions When Giving Good And Bad News" by Watson & Johnston 2020.
In the original paper the authors had 3 speakers deliver good or bad news while filming their facial expressions. They found that could accurately "decode" each condition based on participants' facial expressions extracted either using a custom multi-chanel-gradient model or action units (AUs) extracted using Open Face.
In this tutorial we'll show how easily it is to not only reproduce their decoding analysis with py-feat, but just as easily perform additional analyses. Specifically we'll:
- Download 20 of the first subject's videos (the full dataset is available on OSF
- Extract facial features using the
Detectorv2 - Aggregate and summarize detections per video using
Fex - Train and test a decoder to classify good vs bad news using extracted emotions, AUs, and poses
- Run a fMRI style "mass-univariate" comparison across all AUs between conditions
- Run a time-series analysis comparing videos based on the time-courses of extracted facial features
4.1 The data¶
We'll analyze 20 short clips from the Watson & Johnston (2020) "good news vs bad news" dataset — 10 of each. To keep this tutorial fast and offline, the per-frame Detectorv2 outputs are pre-computed and bundled with the docs under docs/data/news_sample/, so we load them directly instead of downloading videos and re-running detection. The full dataset is on OSF.
from glob import glob
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_context("talk")
# Pre-computed per-frame Detectorv2 outputs (20 AUs + 7 emotions + valence/
# arousal + head pose + gaze + the 478-vertex mesh landmarks + 52 blendshapes)
# for the 20 bundled clips, generated with skip_frames=2 and committed under
# docs/data/news_sample/ so this tutorial runs offline.
data_dir = Path(__file__).resolve().parent.parent / "data" / "news_sample"
videos = [Path(c).stem + ".mp4" for c in sorted(glob(str(data_dir / "0*.csv")))]
clip_attrs = pd.read_csv(data_dir / "clip_attrs.csv").assign(
input=lambda d: d.clipN.apply(lambda x: str(x).zfill(3) + ".mp4"),
condition=lambda d: d["class"].replace({"gn": "goodNews", "ists": "badNews"}),
)
clip_attrs = clip_attrs.query("input in @videos")
print(
f"{len(videos)} clips: "
f"{(clip_attrs.condition == 'goodNews').sum()} good news, "
f"{(clip_attrs.condition == 'badNews').sum()} bad news"
)
20 clips: 10 good news, 10 bad news
4.2 How the detections were generated¶
The bundled CSVs were produced by running the Detectorv2 over each clip and saving its per-frame output. You don't need to run this — we load the saved CSVs in the next section — but here's the code that made them:
from feat import Detectorv2
detector = Detectorv2(device="cuda") # or "mps" / "cpu"
for video in videos:
fex = detector.detect(video, data_type="video", batch_size=8, skip_frames=2)
fex.to_csv(video.replace(".mp4", ".csv"), index=False)
read_feat recognizes these as Detectorv2 output (from their valence/arousal
and 478-vertex mesh columns) and restores the matching .aus / .emotions /
.poses / .gazes accessors.
4.3. Aggregate detections using a Fex dataframe¶
Then we can use read_feat to load each CSV file and concatenate them together:
from feat.utils.io import read_feat
fex_1 = pd.concat(
read_feat(str(data_dir / video.replace(".mp4", ".csv"))) for video in videos
)
print(f'Unique videos: {fex_1.inputs.nunique()}')
print(f'Total processed frames: {fex_1.shape[0]}')
print(f"Avg frames per video: {fex_1.groupby('input').size().mean()}")
Unique videos: 20 Total processed frames: 477 Avg frames per video: 23.85
Our Fex dataframe now contains all detections for all frames of each video
| FaceRectX | FaceRectY | FaceRectWidth | FaceRectHeight | FaceScore | x_0 | x_1 | x_2 | x_3 | x_4 | ... | mouthUpperUpLeft | mouthUpperUpRight | noseSneerLeft | noseSneerRight | FrameHeight | FrameWidth | input | frame | approx_time | Identity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 231.36029 | 213.90369 | 662.01550 | 662.01550 | 0.999989 | 359.36720 | 363.24620 | 368.41818 | 378.11566 | 393.63165 | ... | 0.001648 | 0.003281 | 6.482005e-07 | 9.424984e-07 | 1080.0 | 1080.0 | 001.mp4 | 0 | 00:00 | NaN |
| 1 | 235.58954 | 216.35260 | 655.57166 | 655.57166 | 0.999989 | 361.07007 | 364.27110 | 370.03296 | 378.99585 | 394.36078 | ... | 0.001030 | 0.002045 | 6.482005e-07 | 8.866191e-07 | 1080.0 | 1080.0 | 001.mp4 | 2 | 00:00 | NaN |
| 2 | 240.72534 | 214.76416 | 652.69850 | 652.69850 | 0.999988 | 363.74370 | 366.93073 | 372.02990 | 381.59094 | 396.25116 | ... | 0.000296 | 0.000645 | 5.699694e-07 | 7.823110e-07 | 1080.0 | 1080.0 | 001.mp4 | 4 | 00:00 | NaN |
| 3 | 237.07724 | 208.92416 | 670.49060 | 670.49060 | 0.999989 | 365.41333 | 368.03244 | 373.27063 | 382.43750 | 397.49738 | ... | 0.000418 | 0.000912 | 5.364418e-07 | 9.424984e-07 | 1080.0 | 1080.0 | 001.mp4 | 6 | 00:00 | NaN |
| 4 | 232.31665 | 211.43927 | 677.80920 | 677.80920 | 0.999990 | 365.36320 | 368.67280 | 374.63013 | 383.89703 | 399.78320 | ... | 0.001869 | 0.004333 | 6.072223e-07 | 1.139939e-06 | 1080.0 | 1080.0 | 001.mp4 | 8 | 00:00 | NaN |
5 rows × 2183 columns
Summarize data with Fex.sessions¶
Fex dataframes have a special attribute called .sessions that act as a grouping factor to make it easier to compute summary statistics with any of the .extract_* methods. By default .sessions is None, but you can use the .update_sessions() to return a new Fex dataframe with .sessions set.
For example, if we update the sessions to be the name of each video, then .extract_mean() will group video-frames (rows) by video making it easy to compute a single summary statistic per file:
by_video = fex_1.update_sessions(fex_1['input'])
video_means = by_video.extract_mean()
# Compute the mean per video
video_means # 20 rows for 20 videos
| mean_FaceRectX | mean_FaceRectY | mean_FaceRectWidth | mean_FaceRectHeight | mean_FaceScore | mean_x_0 | mean_x_1 | mean_x_2 | mean_x_3 | mean_x_4 | ... | mean_mouthStretchLeft | mean_mouthStretchRight | mean_mouthUpperUpLeft | mean_mouthUpperUpRight | mean_noseSneerLeft | mean_noseSneerRight | mean_FrameHeight | mean_FrameWidth | mean_frame | mean_Identity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 001.mp4 | 228.067197 | 212.486615 | 666.696606 | 666.696595 | 0.999989 | 357.154249 | 360.562415 | 366.123327 | 375.685371 | 391.008434 | ... | 0.000409 | 0.000852 | 0.001542 | 0.004202 | 4.327856e-07 | 7.288530e-07 | 1080.0 | 1080.0 | 19.0 | NaN |
| 002.mp4 | 224.596631 | 208.872836 | 667.766566 | 667.766551 | 0.999988 | 353.992130 | 355.834680 | 360.239261 | 368.648613 | 382.836786 | ... | 0.001108 | 0.002166 | 0.000538 | 0.001035 | 6.315697e-07 | 5.405662e-07 | 1080.0 | 1080.0 | 13.0 | NaN |
| 003.mp4 | 220.553640 | 208.426700 | 661.469404 | 661.469403 | 0.999989 | 346.443812 | 348.356508 | 352.717816 | 360.550875 | 374.062050 | ... | 0.000416 | 0.001272 | 0.014607 | 0.023773 | 1.014676e-06 | 7.956599e-07 | 1080.0 | 1080.0 | 23.0 | NaN |
| 004.mp4 | 206.602689 | 214.005486 | 662.983170 | 662.983169 | 0.999990 | 333.571429 | 336.693899 | 342.125789 | 351.242905 | 365.876890 | ... | 0.000757 | 0.001732 | 0.030648 | 0.056126 | 1.045713e-06 | 1.636941e-06 | 1080.0 | 1080.0 | 22.0 | NaN |
| 005.mp4 | 189.311367 | 226.164585 | 669.622540 | 669.622549 | 0.999990 | 318.945233 | 322.496384 | 328.040360 | 337.220817 | 352.571631 | ... | 0.001041 | 0.001985 | 0.041066 | 0.070675 | 6.294121e-07 | 2.383133e-06 | 1080.0 | 1080.0 | 22.0 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 016.mp4 | 188.534338 | 233.985227 | 641.555135 | 641.555143 | 0.999982 | 304.692316 | 310.089139 | 318.195942 | 330.912421 | 349.278520 | ... | 0.000419 | 0.002421 | 0.000148 | 0.000181 | 5.916525e-07 | 2.145027e-07 | 1080.0 | 1080.0 | 38.0 | NaN |
| 017.mp4 | 200.779778 | 232.391727 | 647.094319 | 647.094325 | 0.999986 | 325.604824 | 331.531583 | 338.546530 | 349.374781 | 364.318932 | ... | 0.000806 | 0.001480 | 0.006711 | 0.014338 | 1.050725e-06 | 4.987072e-07 | 1080.0 | 1080.0 | 28.0 | NaN |
| 018.mp4 | 225.480577 | 238.637433 | 643.314181 | 643.314153 | 0.999982 | 344.096223 | 347.055058 | 352.852808 | 363.068252 | 379.061245 | ... | 0.000268 | 0.002367 | 0.001208 | 0.001664 | 4.587626e-07 | 5.929765e-07 | 1080.0 | 1080.0 | 26.0 | NaN |
| 019.mp4 | 190.195110 | 227.817021 | 649.331265 | 649.331274 | 0.999985 | 308.581546 | 311.671701 | 317.121919 | 326.855916 | 341.824698 | ... | 0.000631 | 0.001828 | 0.002142 | 0.004235 | 8.648494e-07 | 3.698951e-07 | 1080.0 | 1080.0 | 31.0 | NaN |
| 020.mp4 | 174.576909 | 218.228273 | 646.920015 | 646.920038 | 0.999986 | 294.347939 | 300.894758 | 309.029580 | 321.320305 | 338.416800 | ... | 0.000610 | 0.001497 | 0.012456 | 0.021124 | 1.044549e-06 | 4.109531e-07 | 1080.0 | 1080.0 | 32.0 | NaN |
20 rows × 2181 columns
Then we can grab the AU detections and call standard pandas methods like .loc and .plot:
# Grab the aus just for video 1
video001_aus = video_means.aus.loc['001.mp4']
_ax = video001_aus.plot(kind='bar', title='Video 001 AU detection')
# Plot them
_ax.set(ylabel='Average Probability')
sns.despine()
_ax.figure
Chaining operations¶
.update_sessions() always returns a copy of the Fex object, so that you can chain operations together including existing pandas methods like .plot(). Here's an example passing a dictionary to .update_sessions(), which maps between old and new session names:
# Which condition each video belonged to
video2condition = dict(zip(clip_attrs['input'], clip_attrs['condition']))
_ax = by_video.update_sessions(video2condition).extract_mean().aus.plot(kind='bar', legend=False, title='Mean AU detection by condition')
_ax.set(ylabel='Average Probability', title='AU detection by condition', xticklabels=['Good News', 'Bad News']) # if loading pre-computed csv
plt.xticks(rotation=0) # clip_attrs["input"].str.replace(".mp4", ".csv", regex=False),
# Update sessions to group by condition, compute means (per condition), and make a
# barplot of the mean AUs for each condition
sns.despine()
_ax.figure
We can also focus in on the AUs associated with happiness:
aus = ['AU06', 'AU12', 'AU25'] # from https://py-feat.org/pages/au_reference.html
summary = by_video.update_sessions(video2condition).extract_summary(mean=True, sem=True, std=False, min=False, max=False)
# Update the sessions to condition compute summary stats
bad_means = summary.loc['badNews', [f'mean_{au}' for au in aus]]
bad_sems = summary.loc['badNews', [f'sem_{au}' for au in aus]]
good_means = summary.loc['goodNews', [f'mean_{au}' for au in aus]]
good_sems = summary.loc['goodNews', [f'sem_{au}' for au in aus]]
# Organize them for plotting
fig, _ax = plt.subplots(figsize=(3, 4))
ind = np.arange(len(bad_means))
width = 0.35
rects1 = _ax.bar(ind - width / 2, bad_means, width, yerr=bad_sems, label='Bad News')
rects2 = _ax.bar(ind + width / 2, good_means, width, yerr=good_sems, label='Good News')
# Plot
_ax.set(ylabel='Average Probability', title='', xticks=ind, xticklabels=aus, ylim=(0, 1))
_ax.legend(loc='upper left', frameon=False, bbox_to_anchor=(0, 1.25))
plt.axhline(0.5, ls='--', color='k')
sns.despine()
plt.xticks(rotation=45)
plt.tight_layout()
fig
4.4 Comparing the condition difference across AUs using regression¶
One way we can compare what AUs in the plot show significant differences is by using the .regress() method along with numerical contrast codes. For example we can test the difference in activation of every AU when participants delivered good vs bad news.
This is analogous to the "mass-univariate" GLM approach in fMRI research, and allows us to identify what AUs are significantly more active in one condition vs another:
# Save the by_condition fex from above
by_condition = video_means.update_sessions(video2condition)
by_condition_codes = by_condition.update_sessions({'goodNews': 1.0, 'badNews': -1})
# We set numerical contrasts to compare mean good news > mean bad news
b, se, t, p, df, residuals = by_condition_codes.regress(X='sessions', y='aus', fit_intercept=True)
p_bonf = p / p.shape[1]
# Now we perform a regression (t-test) at every AU
_results = pd.concat([b.round(3).loc[['sessions']].rename(index={'sessions': 'betas'}), se.round(3).loc[['sessions']].rename(index={'sessions': 'ses'}), t.round(3).loc[['sessions']].rename(index={'sessions': 't-stats'}), df.round(3).loc[['sessions']].rename(index={'sessions': 'dof'}), p_bonf.round(3).loc[['sessions']].rename(index={'sessions': 'p-values'})])
_ax = _results.loc['betas'].plot(kind='bar', yerr=_results.loc['ses'], color=['steelblue' if elem else 'gray' for elem in _results.loc['p-values'] < 0.01], title='Good News > Bad News\n(blue: p < .01)')
xticks = _ax.get_xticklabels()
xticks = [elem.get_text().split('_')[-1] for elem in xticks]
# We can perform bonferroni correction for multiple comparisons:
_ax.set_xticklabels(xticks)
_ax.set_ylabel('Beta +/- SE')
sns.despine()
_ax.figure
4.5 Decoding condition from facial features¶
We can easily perform an analysis just like Watson et al, by training a LinearDiscriminantAnalysis (LDA) decoder to classify which condition a video came from based on average AU and headpose detections.
To do this we can use the .predict() which behaves just like .regress() but also requires a sklearn Estimator. We can use keyword arguments to perform 10-fold cross-validation to test the accuracy of each decoder:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
feature_list = ['emotions', 'aus', 'poses', 'emotions,poses', 'aus,poses'] # always a good idea to normalize your features!
_results = []
models = {}
# List of different models we'll train
for features in feature_list:
model, accuracy = by_condition.predict(X=features, y='sessions', model=make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()), cv_kwargs={'cv': 10})
models[features] = model
_results.append(pd.DataFrame({'Accuracy': accuracy * 100, 'Features': [features] * len(accuracy)}))
print(f'{features} model accuracy: {accuracy.mean() * 100:.3g}% +/- {accuracy.std() * 100:.3g}%')
_results = pd.concat(_results).assign(Features=lambda df: df.Features.map({'emotions': 'Emotions', 'poses': 'Pose', 'aus': 'AUs', 'emotions,poses': 'Emotions\n+ Pose', 'aus,poses': 'AUs+Pose'}))
f, _ax = plt.subplots(1, 1, figsize=(3.75, 4)) # .predict is just like .regress, but this time session is our y.
_ax = sns.barplot(x='Features', y='Accuracy', errorbar='sd', dodge=False, hue='Features', data=_results, ax=_ax, order=['Emotions', 'Emotions\n+ Pose', 'AUs+Pose', 'AUs', 'Pose'])
_legend = _ax.get_legend()
if _legend is not None:
_legend.remove()
_ax.set_title('Good News vs Bad News\nClassifier Performance')
_ax.set(ylabel='Accuracy', xlabel='')
sns.despine()
plt.axhline(y=50, ls='--', color='k')
plt.xticks(rotation=90)
plt.tight_layout() # Save the model
# Concat results into a single dataframe and tweak column names
# Plot it
# with sns.plotting_context("talk", font_scale=1.8):
f
emotions model accuracy: 100% +/- 0% aus model accuracy: 100% +/- 0% poses model accuracy: 85% +/- 22.9% emotions,poses model accuracy: 100% +/- 0% aus,poses model accuracy: 95% +/- 15%
Visualizing decoder weights¶
Using what we learned in the previous tutorial, we can visualize the coefficients for any models that used AU features. This allows us to "see" the underlying facial expression that the classifier learned!
from feat.plotting import plot_face_mesh
# The 'aus' pipeline's LDA coefficients are signed weights over the 20 AUs.
# Min-max scale them into a positive AU-intensity range, then render the
# predicted 478-vertex mesh so we can "see" the (good-news-leaning) expression
# the classifier learned — the Detectorv2 analog of the 68-pt plot_face.
_coef = models['aus'][1].coef_.squeeze()
_au = (_coef - _coef.min()) / (np.ptp(_coef) + 1e-12) * 2.0
_fig = plt.figure(figsize=(5, 5))
_ax = _fig.add_subplot(111, projection="3d")
plot_face_mesh(au=_au, ax=_ax, mode="tesselation")
_ax.set_title("Expression reconstructed from\nAU classifier weights")
_fig
You can also animate this expression to emphasize what's changing — pass the
same scaled AU-weight vector as end (and a neutral np.zeros(20) as start)
to animate_face_mesh_plotly for an interactive 3D morph. See
tutorial 3 for animation examples.
4.6 Time-series analysis¶
Finally we might be interested in looking the similarity of the detected features over time. We can do that using the .isc() method which takes a column and metric to use. Here we compare detected happiness between all pairs of videos.
We use some helper functions to cluster, sort, and plot the correlation matrix. Warmer colors indicate a pair of videos elicited more similar detected Happiness over time. We see that some videos show high-correlation in-terms of their detected happiness over-time. This is likely why the classifier above was able to decode conditions so well.
# ISC returns a video x video pearson correlation matrix.
# Detectorv2 emotion columns are capitalized ('Happy', not 'happiness').
isc = fex_1.isc(col='Happy', method='pearson')
def cluster_corrs(df):
"""Helper to reorder rows and cols of correlation matrix based on clustering"""
import scipy.cluster.hierarchy as sch
pairwise_distances = sch.distance.pdist(df)
linkage = sch.linkage(pairwise_distances, method='complete')
cluster_distance_threshold = pairwise_distances.max() / 2
idx_to_cluster_array = sch.fcluster(linkage, cluster_distance_threshold, criterion='distance')
idx = np.argsort(idx_to_cluster_array)
return df.iloc[idx, :].T.iloc[idx, :]
def add_cond_to_ticks(ax):
"""Helper to add condition info to each tick label"""
xlabels, ylabels = ([], [])
for xlabel, ylabel in zip(_ax.get_xticklabels(), _ax.get_yticklabels()):
x_condition = video2condition[xlabel.get_text()]
y_condition = video2condition[ylabel.get_text()]
x_new = f"{x_condition[:-4]}_{xlabel.get_text().split('.csv')[0][1:]}"
y_new = f"{y_condition[:-4]}_{ylabel.get_text().split('.csv')[0][1:]}"
xlabels.append(x_new)
ylabels.append(y_new)
_ax.set_xticklabels(xlabels)
_ax.set_yticklabels(ylabels)
return _ax
_ax = sns.heatmap(cluster_corrs(isc), cmap='RdBu_r', vmin=-1, vmax=1, square=True)
_ax = add_cond_to_ticks(_ax)
_ax.set(xlabel='', ylabel='', title='Inter-video Happiness\ntimeseries correlation')
# Plot it
_ax.figure