stats
feat.utils.stats
Feat utility and helper functions for performing statistics.
calc_hist_auc(vals, hist_range=None)
Calculate histogram area under the curve.
This function follows the bag of temporal features analysis described in Bartlett, M. S., Littlewort, G. C., Frank, M. G., & Lee, K. (2014). Automatic decoding of facial movements reveals deceptive pain expressions. Current Biology, 24(7), 738-743. The function receives convolved data, squares the values, finds zero crossings to calculate the area under the curve (AUC), and generates a histogram with 6 exponentially spaced bins for each input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vals | | | required |
Returns:
| Type | Description |
|---|---|
| | Series of histograms |
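The steps described above (squaring, splitting at zero crossings, per-segment area, and a 6-bin exponentially spaced histogram) can be sketched in NumPy as follows. This is an illustrative approximation, not the code in feat/utils/stats.py: the name `hist_auc_sketch`, the sign-preserving square, the rectangle-rule area, the default edges `np.logspace(0, 5, 7)`, and the separate histograms for positive and negative segments are all assumptions.

```python
import numpy as np


def hist_auc_sketch(vals, edges=None):
    """Histogram of per-segment areas between zero crossings (illustrative only)."""
    vals = np.asarray(vals, dtype=float)
    # Square the values but keep the sign, so zero crossings survive (assumption).
    signed_sq = np.sign(vals) * vals**2
    # Indices where the sign changes between consecutive samples.
    crossings = np.where(np.diff(np.sign(signed_sq)) != 0)[0] + 1
    segments = np.split(signed_sq, crossings)
    # Rectangle-rule area of each segment, assuming unit sample spacing.
    aucs = np.array([seg.sum() for seg in segments])
    if edges is None:
        # 7 log-spaced edges give 6 bins; the 1..1e5 range is an assumed default.
        edges = np.logspace(0, 5, 7)
    pos_hist, _ = np.histogram(aucs[aucs > 0], bins=edges)
    neg_hist, _ = np.histogram(-aucs[aucs < 0], bins=edges)
    # 6 counts for positive segments followed by 6 counts for negative ones.
    return np.concatenate([pos_hist, neg_hist])


counts = hist_auc_sketch([2, 3, -1, -4, 5])
```

Here the input splits into three segments with areas 13, -17, and 25, so two positive segments and one negative segment land in the second bin of their respective histograms.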
Source code in feat/utils/stats.py
clean_signal(signals, *, detrend=True, standardize=True, confounds=None, low_pass=None, high_pass=None, ensure_finite=False, sampling_freq=1.0, runs=None)
Clean a 2D time-series signal: detrend, filter, regress confounds, standardize.
Drop-in replacement for the parts of nilearn.signal.clean that
Fex.clean uses, so py-feat can avoid taking on nilearn (and its
transitive nibabel/joblib/sklearn deps) just for time-series cleanup.
Operations are applied in nilearn's order:
1. Detrend (linear, optional)
2. Butterworth low/high/bandpass filter (optional, uses filtfilt)
3. Regress out confounds (optional). Confounds are filtered with the same Butterworth before regression so the filter and confound-removal operators stay orthogonal (Lindquist et al. 2018).
4. Standardize (zero-mean unit-variance, optional)
5. Replace NaN/Inf with zero (optional)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | | | required |
| detrend | | subtract a linear trend from each column. | True |
| standardize | | rescale each column to zero-mean unit-variance (using sample std, … | True |
| confounds | | optional | None |
| low_pass | | low-pass cutoff in Hz (Butterworth, order 5). | None |
| high_pass | | high-pass cutoff in Hz. | None |
| ensure_finite | | replace NaN/Inf with zero in the output. | False |
| sampling_freq | | sampling rate in Hz (used for filter design). | 1.0 |
| runs | | optional 1-D label array. If given, each unique label is cleaned independently and the segments are concatenated back in original order. | None |
Returns:
| Type | Description |
|---|---|
| | |
Source code in feat/utils/stats.py
cluster_identities(face_embeddings, threshold=0.8, method='gallery', min_cluster_size=2, chunk_size=4096)
Cluster face identities from their embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| face_embeddings | | | required |
| threshold | | cosine-similarity cutoff for the same person (… | 0.8 |
| method | | | 'gallery' |
| min_cluster_size | | minimum cluster size for … | 2 |
| chunk_size | int | matmul block size for … | 4096 |
Returns:
| Type | Description |
|---|---|
| | list of length … |
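To make the role of a cosine-similarity cutoff concrete, here is a minimal sketch of a greedy, gallery-style assignment: each embedding joins the most similar stored exemplar if the similarity reaches the threshold, and otherwise starts a new identity. Whether the 'gallery' method works this way is an assumption; `greedy_gallery_labels` is a hypothetical name, and `min_cluster_size` and `chunk_size` are not modelled.

```python
import numpy as np


def greedy_gallery_labels(embeddings, threshold=0.8):
    """Assign identity labels by cosine similarity to stored exemplars."""
    emb = np.asarray(embeddings, dtype=float)
    # L2-normalize so that a dot product equals cosine similarity.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    gallery = []  # one exemplar embedding per identity seen so far
    labels = []
    for vec in emb:
        if gallery:
            sims = np.stack(gallery) @ vec
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels.append(best)
                continue
        # No exemplar is similar enough: start a new identity.
        gallery.append(vec)
        labels.append(len(gallery) - 1)
    return labels


faces = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]
labels = greedy_gallery_labels(faces)
```

Raising the threshold makes the grouping stricter; with a cutoff very close to 1, each of the four toy embeddings becomes its own identity.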
Source code in feat/utils/stats.py
downsample(data, sampling_freq, target, target_type='samples', method='mean')
Block-aggregate downsample a DataFrame's rows.
Drop-in replacement for nltools.stats.downsample. target is interpreted by target_type:
- 'samples' (default, matches nltools): target is the number of consecutive rows aggregated per output row.
- 'seconds': target is the duration of each output bin in seconds; bin size in samples = round(target * sampling_freq).
- 'hz': target is the desired output sampling rate in Hz; bin size in samples = round(sampling_freq / target).
Output row count is ceil(n_input / bin_size); the final bin can contain fewer than bin_size rows (matches nltools).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | pandas.DataFrame (or 2-D array-like) where rows are time samples. | required |
| sampling_freq | | original sampling frequency in Hz. | required |
| target | | see … | required |
| target_type | | | 'samples' |
| method | | | 'mean' |
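The bin-size rules and the ceil-based row count described above can be checked with a small sketch. It works on plain NumPy arrays rather than a DataFrame, supports only mean aggregation, and uses the hypothetical helper names `bin_size` and `block_mean`; it illustrates the arithmetic, not the library's code.

```python
import math

import numpy as np


def bin_size(sampling_freq, target, target_type="samples"):
    """Rows per output bin, following the three target_type rules."""
    if target_type == "samples":
        return int(target)
    if target_type == "seconds":
        return int(round(target * sampling_freq))
    if target_type == "hz":
        return int(round(sampling_freq / target))
    raise ValueError(f"unknown target_type: {target_type!r}")


def block_mean(data, size):
    """Average consecutive blocks of `size` rows; the last block may be short."""
    data = np.asarray(data, dtype=float)
    n_out = math.ceil(len(data) / size)
    return np.stack([data[i * size:(i + 1) * size].mean(axis=0)
                     for i in range(n_out)])


data = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows sampled at 10 Hz
by_samples = block_mean(data, bin_size(10, 3, "samples"))    # bins of 3 rows
by_seconds = block_mean(data, bin_size(10, 0.5, "seconds"))  # bins of 5 rows
by_hz = block_mean(data, bin_size(10, 2, "hz"))              # bins of 5 rows
```

With 10 input rows and a bin size of 3, the output has ceil(10 / 3) = 4 rows, and the last row is the single leftover input row.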
Source code in feat/utils/stats.py
regress(X, y, mode='ols', **kwargs)
Ordinary least squares multiple regression.
Drop-in replacement for nltools.stats.regress for the ols path; other modes (e.g., robust, ridge) are not implemented.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | | [n, p] design matrix (numpy array or array-like). | required |
| y | | [n] or [n, k] response. | required |
| mode | | only … | 'ols' |
Returns:
| Type | Description |
|---|---|
| | (beta, se, t_stats, p_vals, df, residuals). |
| | beta/se/t_stats/p_vals shape: [p, k] (or [p] if y is 1-D). |
| | df is a scalar (n - p). residuals shape: [n, k] (or [n]). |
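The shapes listed above follow from the standard OLS formulas, which the sketch below works through with NumPy. `ols_sketch` is a hypothetical name, and p-values are left out to keep the example NumPy-only; they would normally be obtained from a t distribution with df degrees of freedom (for example with `scipy.stats.t.sf`).

```python
import numpy as np


def ols_sketch(X, y):
    """OLS fit returning (beta, se, t_stats, df, residuals); no p-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    df = n - p
    # Residual variance, one value per response column (scalar if y is 1-D).
    sigma2 = (residuals**2).sum(axis=0) / df
    xtx_inv_diag = np.diag(np.linalg.pinv(X.T @ X))
    # Outer product yields shape [p] for 1-D y and [p, k] for 2-D y.
    se = np.sqrt(np.multiply.outer(xtx_inv_diag, sigma2))
    t_stats = beta / se
    return beta, se, t_stats, df, residuals


rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
X = np.column_stack([np.ones(50), x])  # intercept + slope
y = 2.0 + 3.0 * x + rng.standard_normal(50)
beta, se, t_stats, df, residuals = ols_sketch(X, y)
beta2, se2, *_ = ols_sketch(X, np.column_stack([y, -y]))
```

With a 1-D response the coefficient arrays have shape [p]; stacking two responses gives shape [p, k] with k = 2.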
Source code in feat/utils/stats.py
set_decomposition_algorithm(algorithm='pca', n_components=None, *args, **kwargs)
Return an unfit sklearn decomposition object by name.
Drop-in replacement for nltools.utils.set_decomposition_algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| algorithm | | one of 'pca', 'ica', 'nnmf', 'fa'. | 'pca' |
| n_components | | passed through to the sklearn class. | None |
Source code in feat/utils/stats.py
softmax(x)
Softmax function that converts log-likelihood evidence values to probabilities. Use with Evidence values from FACET.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | | value to softmax | required |
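As a rough illustration of turning an evidence score into a probability, the sketch below applies a base-10 logistic transform, p = 1 / (1 + 10^(-x)). It assumes the evidence value is a base-10 log-odds score, which the text above does not state, and `evidence_to_prob` is a hypothetical name. Note that this is a per-value transform, not the multi-class softmax that normalizes across a vector; the text above does not say which of the two the library function implements.

```python
def evidence_to_prob(evidence):
    """Map a base-10 log-odds score to a probability in (0, 1) (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** (-evidence))


# Evidence of 0 means even odds; +1 and -1 mean 10:1 odds for and against.
probs = [evidence_to_prob(v) for v in (-1.0, 0.0, 1.0)]
```

Under this assumption, an evidence value of 0 maps to 0.5, and values of +1 and -1 map to 10/11 and 1/11 respectively.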
upsample(data, sampling_freq, target, target_type='hz', **kwargs)
Upsample a DataFrame's rows by Fourier-domain resampling.
Drop-in replacement for nltools.stats.upsample.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | pandas.DataFrame (or 2-D array-like). | required |
| sampling_freq | | original sampling frequency in Hz. | required |
| target | | target frequency or duration; interpretation set by target_type. | required |
| target_type | | 'hz' (target is target sampling rate in Hz), 'samples' (target is the desired sample count), 'seconds' (target is the period of the upsampled signal in seconds). | 'hz' |
Returns:
| Type | Description |
|---|---|
| | Same type as input, with the new row count. |
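Fourier-domain resampling can be illustrated with NumPy's real FFT: transform, zero-pad the spectrum to the new length, and transform back. This is a minimal sketch under simplifying assumptions, not the library's code: `fourier_upsample` is a hypothetical name, it takes the output sample count directly instead of interpreting `target_type`, and it ignores the special handling of the Nyquist bin that a careful implementation would need for even-length inputs.

```python
import numpy as np


def fourier_upsample(x, n_out):
    """Upsample along axis 0 by zero-padding the real FFT spectrum."""
    x = np.asarray(x, dtype=float)
    n_in = x.shape[0]
    if n_out < n_in:
        raise ValueError("n_out must be at least the input length")
    spectrum = np.fft.rfft(x, axis=0)
    # irfft zero-pads the spectrum to n_out // 2 + 1 bins; rescale the amplitude.
    return np.fft.irfft(spectrum, n=n_out, axis=0) * (n_out / n_in)


t_in = np.arange(10) / 10.0   # one period sampled at 10 points
t_out = np.arange(20) / 20.0  # the same period at 20 points
coarse = np.sin(2 * np.pi * t_in)
fine = fourier_upsample(coarse, 20)
```

For a band-limited periodic input such as this sine wave, the upsampled values match the sine evaluated on the finer grid.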
Source code in feat/utils/stats.py
wavelet(freq, num_cyc=3, sampling_freq=30.0)
Create a complex Morlet wavelet.
Creates a complex Morlet wavelet by windowing a complex sinusoid with a Gaussian. All formulae are taken from Cohen (2014), Chapters 12 and 13.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| freq | float | desired frequency of the wavelet | required |
| num_cyc | float | number of wavelet cycles in the Gaussian taper. Fewer cycles give greater temporal precision; more cycles give greater frequency precision. | 3 |
| sampling_freq | float | sampling frequency of the original signal. | 30.0 |
Returns:
| Name | Type | Description |
|---|---|---|
| wav | ndarray | complex wavelet |
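The construction described above, a complex sinusoid multiplied by a Gaussian taper, can be sketched as follows. The Gaussian width `num_cyc / (2 * pi * freq)` is the standard Morlet formula; the window length of `num_cyc` cycles centred on zero and the name `morlet_sketch` are assumptions made for this illustration.

```python
import numpy as np


def morlet_sketch(freq, num_cyc=3, sampling_freq=30.0):
    """Complex Morlet wavelet: complex sinusoid times a Gaussian taper."""
    # Window spanning num_cyc cycles, centred on t = 0 (assumed window length).
    n = int(round(num_cyc / freq * sampling_freq))
    time = (np.arange(n) - n // 2) / sampling_freq
    carrier = np.exp(2j * np.pi * freq * time)  # complex sinusoid
    sd = num_cyc / (2 * np.pi * freq)           # Gaussian standard deviation
    taper = np.exp(-(time**2) / (2 * sd**2))
    return carrier * taper


wav = morlet_sketch(1.0, num_cyc=3, sampling_freq=30.0)
```

The wavelet peaks with magnitude 1 at the centre sample, where both the carrier and the taper equal 1, and decays towards the edges of the window.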