O R I G I N A L R E S E A R C H A R T I C L E

Electroencephalography robust statistical

linear modelling using a single weight per trial

Cyril Pernet1,2, Ignacio Suay Mas2, Guillaume Rousselet3, Ramon Martinez4, Rand Wilcox5 & Arnaud Delorme4,6,7

1 Neurobiology Research Unit, Copenhagen University Hospital, Rigshospitalet, Denmark, 2 Centre for Clinical Brain Sciences, University of Edinburgh, United Kingdom,

3 Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom, 4 Swartz Center for Computational Neurosciences,

University of California, San Diego, 5 Department of Psychology, University of Southern California, United States of America, 6 Centre de Recherche Cerveau et Cognition,

Université Toulouse III – Paul Sabatier, Toulouse, France, 7 Centre National de la Recherche Scientiﬁ que, Centre de Recherche Cerveau et Cognition, Toulouse, France

Pernet etal. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 IGO License, which permits the copy

and redistribution of the material in any medium or format provided the original work and author are properly credited. In any reproduction of this article there should not be any

suggestion that APERTURE NEURO or this article endorse any speci c organization or products. The use of the APERTURE NEURO logo is not permitted. This notice should be

preserved along with the article’s original URL. Open access logo and text by PLoS, under the Creative Commons Attribution-Share Alike 4.0 Unported license.

: 2022, Volume 2 - 1 - CC-BY: © Pernet etal.

: 2022, Volume 2 - 1 -

ABSTRACT

Being able to remove or weigh down the in uence of outlier data is desirable for any statistical model. While magnetic and electro-

encephalographic (MEEG) data are often averaged across trials per condition, it is becoming common practice to use information

from all trials to build statistical linear models. Individual trials can, however, have considerable weight and thus bias inferential

results (effect sizes as well as thresholded t/F/p maps). Here, rather than looking for univariate outliers, de ned independently at

each measurement point, we apply the principal component projection (PCP) method at each channel, deriving a single weight

per trial at each channel independently. Using both synthetic data and open electroencephalographic (EEG) data, we show (1) that

PCP is ef cient at detecting a large variety of outlying trials; (2) how PCP-based weights can be implemented in the context of the

general linear model (GLM) with accurate control of type 1 family-wise error rate; and (3) that our PCP-based weighted least squares

(WLS) approach increases the statistical power of group analyses as well as a much slower iterative reweighted least squares (IRLS)

approach, although the weighting scheme is markedly different. Together, our results show that WLS based on PCP weights derived

from whole trial pro les is an ef cient method to weigh down the in uence of outlier EEG data in linear models.

Keywords: Electroencephalography, single trials, statistical outliers, weighted least squares, general linear model

Data availability: All data used are publicly available (CC0); all codes (simulations and data analyses) are also available online in the LIMO MEEG GitHub repository (MIT license).

Date Received: May 03, 2021

Date Accepted: December 30, 2021

DOI: 10.52294/2e69f7cc-f061-40ad-ad77-017110464dfd

channel × time × trials and channel × frequency × time ×

trials. Several neuroimaging packages are dedicated to

the statistical analyses of such large multidimensional data,

often using linear methods. For instance, in the LIMO MEEG

toolbox (2), each channel, frequency and time frame are an-

alysed independently using the general linear model, an

approach referred to as mass-univariate analysis. Ordinary

least squares (OLS) approaches are used to nd model pa-

rameters that minimize the error between the model and

the data. For least squares estimates to have good statistical

properties, it is however expected that the error covariance

off-diagonals are zeros, such that Cov(e) =

σ

2l, l being the

identity matrix (3), assuming observations are independent

and identically distributed. It is well established that devia-

tions from that assumption lead to substantial power reduc-

tion and to an increase in the false-positive rate. When OLS

assumptions are violated, robust techniques offer reliable

solutions to restore power and control the false-positive

rate. Weighted least squares (WLS) approach is one such

INTRODUCTION

Magnetic and electroencephalographic (MEEG) data

analysis can be subdivided into two main parts: prepro-

cessing and processing. Data preprocessing corresponds

to any manipulation and transformation of the data, such

as artefacts removal and attenuation or change in data

representations (spectral transformation, source model-

ling). Data processing refers to mathematical procedures

that do not change the data, i.e. statistical analysis and

statistical modelling (1). Here, we present a method that

improves data processing by weighting down trials that

have different dynamics from the bulk of the data, within

the context of an experimental design. We investigated

single-trial weighting for full scalp/channel statistical lin-

ear hierarchical modelling, providing a more robust sta-

tistical analysis method to the community.

After data preprocessing, MEEG data are typically ep-

oched to form three- or four-dimensional matrices of e.g.

: 2022, Volume 2 - 2 - CC-BY: © Pernet et al.

O R I G I N A L R E S E A R C H A R T I C L E

: 2022, Volume 2 - 2 -

robust method that uses different weights across trials, such

that Cov(e) =

σ

2

V, with V a diagonal matrix:

y = X

β

+

e, E

(

e

)

= 0,

Cov(

e

) =

σV

(1)

with y an n-dimensional vector (number of trials), X the

n*p design matrix,

β

a p-dimensional vector (number of

predictors in X) and e the error vector of dimension n.

The WLS estimators can then be obtained using an OLS

on transformed data (equations 2 and 3):

Wy = WX

β

+ We, E(e) = 0, Cov(e) =

σ

I (2)

β

= (XTWX)−1XTWy (3)

with W a 1*n vector of weights.

When applied to MEEG data, a standard mass-univariate

WLS entails obtaining a weight for each trial but also

each dimension analysed, i.e. channels, frequencies and

time frames. Following such procedure, a trial could be

considered as an outlier or be assigned a low weight,

for a single frequency or time frame, which is implausi-

ble given the well-known correlations of MEEG data over

space, frequencies and time. We propose here that a

single or a few consecutive data points should never be

agged as outliers or weighted down, and that a single

weight per trial (and channel) should be derived instead,

with weights taking into account the whole temporal or

spectral pro le. In the following, we demonstrate how

the principal component projection (PCP) method (4) can

be used in this context, and how those weights can then

be used in the context of the general linear model (WLS),

applied here to event-related potentials. Such weight-

ing should not be taken to bypass data preprocessing.

Because measurement artefacts and physiological arte-

facts are typically several orders of magnitude stronger

than the signal of interest, and often span consecutive

trials, they require dedicated algorithms. After data

preprocessing, residual signal artefacts might however

remain. In addition, natural and unintended experimen-

tal variations might also exist among all trials, and the

weighting scheme aims to account for these.

METHOD

Trial-based weighted least squares

The new WLS solution consists of computing rst-level

general linear model (GLM) beta estimates using weights

from the PCP algorithm (4). As such, the estimation proce-

dure follows the usual steps of weighting schemes, like a

standard iterative reweighted least squares (IRLS) solution:

1. After the OLS solution is computed, an adjust-

ment is performed on residuals by multiplying them

by −

h

1/ 1where h is a vector of leverage points

(i.e. the diagonal of the hat matrix H = X(X, X)−1 X where

X is the design matrix re ecting experimental condi-

tions). This adjustment is necessary because leverage

points are the most in uential on the regression space,

i.e. they tend to have low residual values (5).

2. Residuals are then standardized using a robust esti-

mator of dispersion, the median absolute deviation

(MAD) to the median, and re-adjusted by the tun-

ing function. Here, we used the bisquare function. A

series of weights is next obtained either based on

distances (PCP method) or minimizing the error term

(IRLS) with, in both cases, high weights for data points

having high residuals (with a correction for leverage).

3. The solution is then computed following equation 3.

Unlike IRLS, WLS weights are not derived for each

time frames independently but by looking at the multi-

variate distances in the principal component space (step

2 above) and thus deriving a unique weight for all time

frames. An illustration of the method is shown in gure 1.

Trial weights are computed as a distance among trials pro-

jected onto the main (≥99%) principal components space.

Here, the principal components computed over the f time

frames are those directions that maximize the variance

across trials for uncorrelated (orthogonal) time periods ( g-

ure 1B). Outlier trials are points in the f-dimensional space

that are far away from the bulk. By virtue of the principal

component analysis (PCA), these outlier trials become

more visible along the principal component axes than in

the original data space. Weights ( gure 1E) for each trial

are obtained using both the Euclidean norm ( gure 1C, dis-

tance location) and the kurtosis weighted Euclidean norm

( gure 1D, distance scatter) in this reduced PCA space (see

Ref. (4) for details). Weights are then applied to each trial

(at each data points) on the original data (i.e. there is no

data reduction). We exploit this simple technique because

it is computationally fast given the rich dimensional space

of EEG data and because it does not assume the data to

originate from a particular distribution. The PCP algorithm

is implemented in the limo_pcout.m function and the WLS

solution in the limo_WLS.m function, both distributed

with the LIMO MEEG toolbox (https://limo-eeg-toolbox.

github.io/limo_meeg/).

Simulation-based analyses

A. Outlier detection and effectiveness of the weighting

scheme

Simulated event-related potentials (ERPs) were generated

to evaluate the classi cation accuracy of the PCP meth-

od and to estimate the robustness-to-outliers and signal-

to-noise ratio (SNR) of the WLS solution in comparison

to an OLS solution and a standard IRLS solution, which

minimizes residuals at each time frame separately (imple-

mented in limo_IRLS.m). To do so, we manipulated the

following: (1) the percentage of outliers, using 10%, 20%,

30%, 40% or 50% of outliers; (2) the SNR (de ned relative

to the mean over time of the background activity); and (3)

the type of outliers. The rst set of outliers were de ned

: 2022, Volume 2 - 3 - CC-BY: © Pernet et al.

O R I G I N A L R E S E A R C H A R T I C L E

: 2022, Volume 2 - 3 -

and the truth while KS distances provide a fuller picture

of the overall differences in distributions.

The code used to generate the ERP and the results

are available at https://github.com/LIMO-EEG-Toolbox/

limo_test_stats/tree/master/PCP_simulations/.

MCC

TP TN FP FN

TP FP TP FN TN FP TN FN

()()()

()

=

∗−∗

++

++

(4)

with TP the true positives, TN the true negatives, FP the

false positives and FN the false negatives.

B. Statistical inference

Accurate estimation of model parameters (i.e. beta esti-

mates in the GLM – equation 3) is particularly important

because it impacts group-level results. Inference at the

single-subject level may, however, also be performed

and accurate p-values need to be derived. Here, error

degrees of freedom are obtained using the Satterthwaite

approximation ((8) equation 5).

dfe = tr ([I − H]T[I − H]) (5)

with dfe, the degrees of freedom of the error, the identity

matrix, and the hat matrix.

based on the added noise: white noise, pink noise, alpha

oscillations and gamma oscillations following (6). In these

cases, the noise started with a P1 component and lasted

~200 ms (see below). The second set of outliers were de-

ned based on their amplitude or outlier-to-signal ratio

(0.5, 0.8, 1.2 and 1.5 times the true N1 amplitude).

Synthetic data were generated for one channel, using

the model developed by Yeung et al. (2018 — (7)). The

simulated signal corresponded to an event-related po-

tential with P1 and N1 components (100 ms long) added

to background activity with the same power spectrum

as human EEG, generating 200 trials of 500 ms dura-

tion with a 250-Hz sampling rate. Examples for each

type of simulation are shown in gure 2 and results are

based, for each case, on a thousand random repeti-

tions. Performance of the PCP algorithm at detecting

outlying synthetic EEG trials was investigated by com-

puting the confusion matrix and mapping the true- and

false-positive rates in the receiver operating space,

and by computing the Matthew correlation coef cients

(MCCs – equation 4). The robustness, or effectiveness

of the weighting scheme, was examined by computing

the Pearson correlations and the Kolmogorov–Smirnov

(KS) distances between the ground truth mean and the

OLS, WLS and IRLS means. Pearson values allowed to es-

timate the linear relationships between estimated means

Fig. 1. Illustration of the PCP weighting scheme using trials for ‘famous faces’ of the OpenNeuro.org publicly available ds002718 dataset. Data are from subject 3, chan-

nel 34 (see section on empirical data analysis). Panel A shows the single-trial responses to all stimuli. The principal component analysis is computed over time, keeping

the components explaining the most variance and summing to at least 99% of explained variance (giving here 69 eigenvectors, i.e. independent time components from

the initial 176 time points). The data are then projected onto those axes (panel B). From the data projected onto the components, Euclidean distances for location and

scatter are computed (panels C and D – showing smooth histograms of weights) and combined to obtain a distance for each trial. That distance is either used as weights

in a linear model or used to determine outliers (panel E, with outliers identi ed for weights below ~0.27, shown in dark grey). At the bottom right, the mean ERP for

trials classi ed as good (red) vs. outliers (black) and the weighted mean (green) are shown (panels F and G). Shaded areas indicate the 95% highest-density percentile

bootstrap intervals.