Open in molab View on GitHub Download .py

4. Running a full analysis¶

In this tutorial we'll perform a real analysis on part of the open dataset from "A Data-Driven Characterisation Of Natural Facial Expressions When Giving Good And Bad News" by Watson & Johnston 2020.

In the original paper the authors had 3 speakers deliver good or bad news while filming their facial expressions. They found that could accurately "decode" each condition based on participants' facial expressions extracted either using a custom multi-chanel-gradient model or action units (AUs) extracted using Open Face.

In this tutorial we'll show how easily it is to not only reproduce their decoding analysis with py-feat, but just as easily perform additional analyses. Specifically we'll:

Download 20 of the first subject's videos (the full dataset is available on OSF
Extract facial features using the Detectorv2
Aggregate and summarize detections per video using Fex
Train and test a decoder to classify good vs bad news using extracted emotions, AUs, and poses
Run a fMRI style "mass-univariate" comparison across all AUs between conditions
Run a time-series analysis comparing videos based on the time-courses of extracted facial features

4.1 The data¶

We'll analyze 20 short clips from the Watson & Johnston (2020) "good news vs bad news" dataset — 10 of each. To keep this tutorial fast and offline, the per-frame Detectorv2 outputs are pre-computed and bundled with the docs under docs/data/news_sample/, so we load them directly instead of downloading videos and re-running detection. The full dataset is on OSF.

from glob import glob
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_context("talk")

# Pre-computed per-frame Detectorv2 outputs (20 AUs + 7 emotions + valence/
# arousal + head pose + gaze + the 478-vertex mesh landmarks + 52 blendshapes)
# for the 20 bundled clips, generated with skip_frames=2 and committed under
# docs/data/news_sample/ so this tutorial runs offline.
data_dir = Path(__file__).resolve().parent.parent / "data" / "news_sample"
videos = [Path(c).stem + ".mp4" for c in sorted(glob(str(data_dir / "0*.csv")))]

clip_attrs = pd.read_csv(data_dir / "clip_attrs.csv").assign(
    input=lambda d: d.clipN.apply(lambda x: str(x).zfill(3) + ".mp4"),
    condition=lambda d: d["class"].replace({"gn": "goodNews", "ists": "badNews"}),
)
clip_attrs = clip_attrs.query("input in @videos")

print(
    f"{len(videos)} clips: "
    f"{(clip_attrs.condition == 'goodNews').sum()} good news, "
    f"{(clip_attrs.condition == 'badNews').sum()} bad news"
)

20 clips: 10 good news, 10 bad news

4.2 How the detections were generated¶

The bundled CSVs were produced by running the Detectorv2 over each clip and saving its per-frame output. You don't need to run this — we load the saved CSVs in the next section — but here's the code that made them:

from feat import Detectorv2

detector = Detectorv2(device="cuda")  # or "mps" / "cpu"
for video in videos:
    fex = detector.detect(video, data_type="video", batch_size=8, skip_frames=2)
    fex.to_csv(video.replace(".mp4", ".csv"), index=False)

read_feat recognizes these as Detectorv2 output (from their valence/arousal and 478-vertex mesh columns) and restores the matching .aus / .emotions / .poses / .gazes accessors.

4.3. Aggregate detections using a `Fex` dataframe¶

Then we can use read_feat to load each CSV file and concatenate them together:

from feat.utils.io import read_feat
fex_1 = pd.concat(
    read_feat(str(data_dir / video.replace(".mp4", ".csv"))) for video in videos
)
print(f'Unique videos: {fex_1.inputs.nunique()}')
print(f'Total processed frames: {fex_1.shape[0]}')
print(f"Avg frames per video: {fex_1.groupby('input').size().mean()}")

Unique videos: 20
Total processed frames: 477
Avg frames per video: 23.85

Our Fex dataframe now contains all detections for all frames of each video

fex_1.shape
fex_1.head()

	FaceRectX	FaceRectY	FaceRectWidth	FaceRectHeight	FaceScore	x_0	x_1	x_2	x_3	x_4	...	mouthUpperUpLeft	mouthUpperUpRight	noseSneerLeft	noseSneerRight	FrameHeight	FrameWidth	input	frame	approx_time	Identity
0	231.36029	213.90369	662.01550	662.01550	0.999989	359.36720	363.24620	368.41818	378.11566	393.63165	...	0.001648	0.003281	6.482005e-07	9.424984e-07	1080.0	1080.0	001.mp4	0	00:00	NaN
1	235.58954	216.35260	655.57166	655.57166	0.999989	361.07007	364.27110	370.03296	378.99585	394.36078	...	0.001030	0.002045	6.482005e-07	8.866191e-07	1080.0	1080.0	001.mp4	2	00:00	NaN
2	240.72534	214.76416	652.69850	652.69850	0.999988	363.74370	366.93073	372.02990	381.59094	396.25116	...	0.000296	0.000645	5.699694e-07	7.823110e-07	1080.0	1080.0	001.mp4	4	00:00	NaN
3	237.07724	208.92416	670.49060	670.49060	0.999989	365.41333	368.03244	373.27063	382.43750	397.49738	...	0.000418	0.000912	5.364418e-07	9.424984e-07	1080.0	1080.0	001.mp4	6	00:00	NaN
4	232.31665	211.43927	677.80920	677.80920	0.999990	365.36320	368.67280	374.63013	383.89703	399.78320	...	0.001869	0.004333	6.072223e-07	1.139939e-06	1080.0	1080.0	001.mp4	8	00:00	NaN

5 rows × 2183 columns

Summarize data with `Fex.sessions`¶

Fex dataframes have a special attribute called .sessions that act as a grouping factor to make it easier to compute summary statistics with any of the .extract_* methods. By default .sessions is None, but you can use the .update_sessions() to return a new Fex dataframe with .sessions set.

For example, if we update the sessions to be the name of each video, then .extract_mean() will group video-frames (rows) by video making it easy to compute a single summary statistic per file:

by_video = fex_1.update_sessions(fex_1['input'])
video_means = by_video.extract_mean()
# Compute the mean per video
video_means  # 20 rows for 20 videos

	mean_FaceRectX	mean_FaceRectY	mean_FaceRectWidth	mean_FaceRectHeight	mean_FaceScore	mean_x_0	mean_x_1	mean_x_2	mean_x_3	mean_x_4	...	mean_mouthStretchLeft	mean_mouthStretchRight	mean_mouthUpperUpLeft	mean_mouthUpperUpRight	mean_noseSneerLeft	mean_noseSneerRight	mean_FrameHeight	mean_FrameWidth	mean_frame	mean_Identity
001.mp4	228.067197	212.486615	666.696606	666.696595	0.999989	357.154249	360.562415	366.123327	375.685371	391.008434	...	0.000409	0.000852	0.001542	0.004202	4.327856e-07	7.288530e-07	1080.0	1080.0	19.0	NaN
002.mp4	224.596631	208.872836	667.766566	667.766551	0.999988	353.992130	355.834680	360.239261	368.648613	382.836786	...	0.001108	0.002166	0.000538	0.001035	6.315697e-07	5.405662e-07	1080.0	1080.0	13.0	NaN
003.mp4	220.553640	208.426700	661.469404	661.469403	0.999989	346.443812	348.356508	352.717816	360.550875	374.062050	...	0.000416	0.001272	0.014607	0.023773	1.014676e-06	7.956599e-07	1080.0	1080.0	23.0	NaN
004.mp4	206.602689	214.005486	662.983170	662.983169	0.999990	333.571429	336.693899	342.125789	351.242905	365.876890	...	0.000757	0.001732	0.030648	0.056126	1.045713e-06	1.636941e-06	1080.0	1080.0	22.0	NaN
005.mp4	189.311367	226.164585	669.622540	669.622549	0.999990	318.945233	322.496384	328.040360	337.220817	352.571631	...	0.001041	0.001985	0.041066	0.070675	6.294121e-07	2.383133e-06	1080.0	1080.0	22.0	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
016.mp4	188.534338	233.985227	641.555135	641.555143	0.999982	304.692316	310.089139	318.195942	330.912421	349.278520	...	0.000419	0.002421	0.000148	0.000181	5.916525e-07	2.145027e-07	1080.0	1080.0	38.0	NaN
017.mp4	200.779778	232.391727	647.094319	647.094325	0.999986	325.604824	331.531583	338.546530	349.374781	364.318932	...	0.000806	0.001480	0.006711	0.014338	1.050725e-06	4.987072e-07	1080.0	1080.0	28.0	NaN
018.mp4	225.480577	238.637433	643.314181	643.314153	0.999982	344.096223	347.055058	352.852808	363.068252	379.061245	...	0.000268	0.002367	0.001208	0.001664	4.587626e-07	5.929765e-07	1080.0	1080.0	26.0	NaN
019.mp4	190.195110	227.817021	649.331265	649.331274	0.999985	308.581546	311.671701	317.121919	326.855916	341.824698	...	0.000631	0.001828	0.002142	0.004235	8.648494e-07	3.698951e-07	1080.0	1080.0	31.0	NaN
020.mp4	174.576909	218.228273	646.920015	646.920038	0.999986	294.347939	300.894758	309.029580	321.320305	338.416800	...	0.000610	0.001497	0.012456	0.021124	1.044549e-06	4.109531e-07	1080.0	1080.0	32.0	NaN

20 rows × 2181 columns

Then we can grab the AU detections and call standard pandas methods like .loc and .plot:

# Grab the aus just for video 1
video001_aus = video_means.aus.loc['001.mp4']
_ax = video001_aus.plot(kind='bar', title='Video 001 AU detection')
# Plot them
_ax.set(ylabel='Average Probability')
sns.despine()
_ax.figure

Chaining operations¶

.update_sessions() always returns a copy of the Fex object, so that you can chain operations together including existing pandas methods like .plot(). Here's an example passing a dictionary to .update_sessions(), which maps between old and new session names:

# Which condition each video belonged to
video2condition = dict(zip(clip_attrs['input'], clip_attrs['condition']))
_ax = by_video.update_sessions(video2condition).extract_mean().aus.plot(kind='bar', legend=False, title='Mean AU detection by condition')
_ax.set(ylabel='Average Probability', title='AU detection by condition', xticklabels=['Good News', 'Bad News'])  # if loading pre-computed csv
plt.xticks(rotation=0)  # clip_attrs["input"].str.replace(".mp4", ".csv", regex=False),
# Update sessions to group by condition, compute means (per condition), and make a
# barplot of the mean AUs for each condition
sns.despine()
_ax.figure

We can also focus in on the AUs associated with happiness:

aus = ['AU06', 'AU12', 'AU25']  # from https://py-feat.org/pages/au_reference.html
summary = by_video.update_sessions(video2condition).extract_summary(mean=True, sem=True, std=False, min=False, max=False)
# Update the sessions to condition compute summary stats
bad_means = summary.loc['badNews', [f'mean_{au}' for au in aus]]
bad_sems = summary.loc['badNews', [f'sem_{au}' for au in aus]]
good_means = summary.loc['goodNews', [f'mean_{au}' for au in aus]]
good_sems = summary.loc['goodNews', [f'sem_{au}' for au in aus]]
# Organize them for plotting
fig, _ax = plt.subplots(figsize=(3, 4))
ind = np.arange(len(bad_means))
width = 0.35
rects1 = _ax.bar(ind - width / 2, bad_means, width, yerr=bad_sems, label='Bad News')
rects2 = _ax.bar(ind + width / 2, good_means, width, yerr=good_sems, label='Good News')
# Plot
_ax.set(ylabel='Average Probability', title='', xticks=ind, xticklabels=aus, ylim=(0, 1))
_ax.legend(loc='upper left', frameon=False, bbox_to_anchor=(0, 1.25))
plt.axhline(0.5, ls='--', color='k')
sns.despine()
plt.xticks(rotation=45)
plt.tight_layout()
fig

4.4 Comparing the condition difference across AUs using regression¶

One way we can compare what AUs in the plot show significant differences is by using the .regress() method along with numerical contrast codes. For example we can test the difference in activation of every AU when participants delivered good vs bad news.

This is analogous to the "mass-univariate" GLM approach in fMRI research, and allows us to identify what AUs are significantly more active in one condition vs another:

# Save the by_condition fex from above
by_condition = video_means.update_sessions(video2condition)
by_condition_codes = by_condition.update_sessions({'goodNews': 1.0, 'badNews': -1})
# We set numerical contrasts to compare mean good news > mean bad news
b, se, t, p, df, residuals = by_condition_codes.regress(X='sessions', y='aus', fit_intercept=True)
p_bonf = p / p.shape[1]
# Now we perform a regression (t-test) at every AU
_results = pd.concat([b.round(3).loc[['sessions']].rename(index={'sessions': 'betas'}), se.round(3).loc[['sessions']].rename(index={'sessions': 'ses'}), t.round(3).loc[['sessions']].rename(index={'sessions': 't-stats'}), df.round(3).loc[['sessions']].rename(index={'sessions': 'dof'}), p_bonf.round(3).loc[['sessions']].rename(index={'sessions': 'p-values'})])
_ax = _results.loc['betas'].plot(kind='bar', yerr=_results.loc['ses'], color=['steelblue' if elem else 'gray' for elem in _results.loc['p-values'] < 0.01], title='Good News > Bad News\n(blue: p < .01)')
xticks = _ax.get_xticklabels()
xticks = [elem.get_text().split('_')[-1] for elem in xticks]
# We can perform bonferroni correction for multiple comparisons:
_ax.set_xticklabels(xticks)
_ax.set_ylabel('Beta +/- SE')
sns.despine()
_ax.figure

4.5 Decoding condition from facial features¶

We can easily perform an analysis just like Watson et al, by training a LinearDiscriminantAnalysis (LDA) decoder to classify which condition a video came from based on average AU and headpose detections.

To do this we can use the .predict() which behaves just like .regress() but also requires a sklearn Estimator. We can use keyword arguments to perform 10-fold cross-validation to test the accuracy of each decoder:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
feature_list = ['emotions', 'aus', 'poses', 'emotions,poses', 'aus,poses']  # always a good idea to normalize your features!
_results = []
models = {}
# List of different models we'll train
for features in feature_list:
    model, accuracy = by_condition.predict(X=features, y='sessions', model=make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()), cv_kwargs={'cv': 10})
    models[features] = model
    _results.append(pd.DataFrame({'Accuracy': accuracy * 100, 'Features': [features] * len(accuracy)}))
    print(f'{features} model accuracy: {accuracy.mean() * 100:.3g}% +/- {accuracy.std() * 100:.3g}%')
_results = pd.concat(_results).assign(Features=lambda df: df.Features.map({'emotions': 'Emotions', 'poses': 'Pose', 'aus': 'AUs', 'emotions,poses': 'Emotions\n+ Pose', 'aus,poses': 'AUs+Pose'}))
f, _ax = plt.subplots(1, 1, figsize=(3.75, 4))  # .predict is just like .regress, but this time session is our y.
_ax = sns.barplot(x='Features', y='Accuracy', errorbar='sd', dodge=False, hue='Features', data=_results, ax=_ax, order=['Emotions', 'Emotions\n+ Pose', 'AUs+Pose', 'AUs', 'Pose'])
_legend = _ax.get_legend()
if _legend is not None:
    _legend.remove()
_ax.set_title('Good News vs Bad News\nClassifier Performance')
_ax.set(ylabel='Accuracy', xlabel='')
sns.despine()
plt.axhline(y=50, ls='--', color='k')
plt.xticks(rotation=90)
plt.tight_layout()  # Save the model
# Concat results into a single dataframe and tweak column names
# Plot it
# with sns.plotting_context("talk", font_scale=1.8):
f

emotions model accuracy: 100% +/- 0%
aus model accuracy: 100% +/- 0%
poses model accuracy: 85% +/- 22.9%
emotions,poses model accuracy: 100% +/- 0%
aus,poses model accuracy: 95% +/- 15%

Visualizing decoder weights¶

Using what we learned in the previous tutorial, we can visualize the coefficients for any models that used AU features. This allows us to "see" the underlying facial expression that the classifier learned!

from feat.plotting import plot_face_mesh

# The 'aus' pipeline's LDA coefficients are signed weights over the 20 AUs.
# Min-max scale them into a positive AU-intensity range, then render the
# predicted 478-vertex mesh so we can "see" the (good-news-leaning) expression
# the classifier learned — the Detectorv2 analog of the 68-pt plot_face.
_coef = models['aus'][1].coef_.squeeze()
_au = (_coef - _coef.min()) / (np.ptp(_coef) + 1e-12) * 2.0

_fig = plt.figure(figsize=(5, 5))
_ax = _fig.add_subplot(111, projection="3d")
plot_face_mesh(au=_au, ax=_ax, mode="tesselation")
_ax.set_title("Expression reconstructed from\nAU classifier weights")
_fig

You can also animate this expression to emphasize what's changing — pass the same scaled AU-weight vector as end (and a neutral np.zeros(20) as start) to animate_face_mesh_plotly for an interactive 3D morph. See tutorial 3 for animation examples.

4.6 Time-series analysis¶

Finally we might be interested in looking the similarity of the detected features over time. We can do that using the .isc() method which takes a column and metric to use. Here we compare detected happiness between all pairs of videos.

We use some helper functions to cluster, sort, and plot the correlation matrix. Warmer colors indicate a pair of videos elicited more similar detected Happiness over time. We see that some videos show high-correlation in-terms of their detected happiness over-time. This is likely why the classifier above was able to decode conditions so well.

# ISC returns a video x video pearson correlation matrix.
# Detectorv2 emotion columns are capitalized ('Happy', not 'happiness').
isc = fex_1.isc(col='Happy', method='pearson')

def cluster_corrs(df):
    """Helper to reorder rows and cols of correlation matrix based on clustering"""
    import scipy.cluster.hierarchy as sch
    pairwise_distances = sch.distance.pdist(df)
    linkage = sch.linkage(pairwise_distances, method='complete')
    cluster_distance_threshold = pairwise_distances.max() / 2
    idx_to_cluster_array = sch.fcluster(linkage, cluster_distance_threshold, criterion='distance')
    idx = np.argsort(idx_to_cluster_array)
    return df.iloc[idx, :].T.iloc[idx, :]

def add_cond_to_ticks(ax):
    """Helper to add condition info to each tick label"""
    xlabels, ylabels = ([], [])
    for xlabel, ylabel in zip(_ax.get_xticklabels(), _ax.get_yticklabels()):
        x_condition = video2condition[xlabel.get_text()]
        y_condition = video2condition[ylabel.get_text()]
        x_new = f"{x_condition[:-4]}_{xlabel.get_text().split('.csv')[0][1:]}"
        y_new = f"{y_condition[:-4]}_{ylabel.get_text().split('.csv')[0][1:]}"
        xlabels.append(x_new)
        ylabels.append(y_new)
    _ax.set_xticklabels(xlabels)
    _ax.set_yticklabels(ylabels)
    return _ax
_ax = sns.heatmap(cluster_corrs(isc), cmap='RdBu_r', vmin=-1, vmax=1, square=True)
_ax = add_cond_to_ticks(_ax)
_ax.set(xlabel='', ylabel='', title='Inter-video Happiness\ntimeseries correlation')
# Plot it
_ax.figure