
include the six realignment parameters (translation and rotation around the X, Y, and Z axes, reflecting estimated participant motion) as nuisance regressors in first-level models.
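To make this concrete, the following is a minimal sketch (in Python/NumPy; illustrative, not drawn from any of the studies cited here) of how realignment-parameter nuisance regressors can be assembled, including the widely used 24-term expansion of the six parameters (the raw parameters, their one-scan lag, and the squares of both):

    import numpy as np

    def motion_regressors(rp, expand=False):
        # rp: (n_scans, 6) realignment parameters: translations (mm) and
        # rotations (radians) around the X, Y, and Z axes.
        if not expand:
            return rp                                     # six-parameter (RP6) case
        rp_lag = np.vstack([np.zeros((1, 6)), rp[:-1]])   # one-scan lag
        return np.hstack([rp, rp_lag, rp**2, rp_lag**2])  # 24-term (RP24) case

The resulting columns are appended to the first-level design matrix alongside the task regressors.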
Beyond motion parameter inclusion, several data-driven strategies have been developed to reduce the influence of high-motion scans on estimated activations. Wavelet decomposition identifies artifacts by exploiting their non-stationarity across different temporal scales (Patel et al. 2014). The method has been applied in resting-state studies but is also applicable to task-based data. Independent component analysis (Pruim et al. 2015) identifies artifacts based on the spatial distribution of shared variance. In robust weighted least squares (Diedrichsen and Shadmehr 2005), a two-pass modeling procedure produces a set of nuisance regressors that are then included in the final analysis to weight frames by the inverse of their variance (that is, downweighting frames with high error).
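The inverse-variance weighting at the heart of this scheme can be sketched as follows (a minimal NumPy illustration assuming per-frame variance estimates from a first pass; this simplifies the two-pass procedure and is not the cited authors' implementation):

    import numpy as np

    def wls_betas(Y, X, frame_var):
        # Y: (n_scans, n_voxels) data; X: (n_scans, n_regressors) design;
        # frame_var: (n_scans,) per-frame noise variance from a first-pass fit.
        w = 1.0 / frame_var                  # inverse-variance frame weights
        Xw = X * w[:, None]                  # weight the rows of the design
        # Solve the weighted normal equations (X' W X) beta = X' W Y.
        return np.linalg.solve(Xw.T @ X, Xw.T @ Y)

High-variance (typically high-motion) frames receive weights near zero and thus contribute little to the parameter estimates.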
An alternative motion correction strategy is “scrubbing” or “frame censoring” (Lemieux et al. 2007; Siegel et al. 2014). In this approach, bad scans are identified and excluded from statistical analysis. One way to do so is to model them in the general linear model using nuisance regressors (i.e., “scan-nulling regressors” or “one-hot encoding”). Although frame censoring has received considerable interest in resting-state fMRI over the past several years (Power et al. 2012; Gratton et al. 2020a), it has not seen widespread use in the task-based fMRI literature. Censoring approaches involve some effective data loss, in that censored frames do not contribute to the task-related parameter estimates, and the columns introduced to the design matrix to perform censoring reduce the available degrees of freedom. There are different ways to quantify “bad” scans, and choosing both an appropriate metric and an associated threshold can be challenging. Thus, additional information about what threshold should be used for identifying bad frames (and, relatedly, how much data are lost versus retained) is necessary to make informed decisions.
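For concreteness, scan-nulling regressors of the kind described above can be built as in the following sketch (NumPy; the 0.9 mm threshold is purely illustrative, since threshold choice is exactly the open question noted here):

    import numpy as np

    def censoring_regressors(fd, threshold=0.9):
        # fd: (n_scans,) framewise displacement; threshold in mm (illustrative).
        bad = np.flatnonzero(fd > threshold)    # indices of flagged frames
        R = np.zeros((fd.size, bad.size))
        R[bad, np.arange(bad.size)] = 1.0       # one column per flagged frame
        return R   # appending R to the design costs bad.size degrees of freedom

Each flagged frame receives its own one-hot column, so its value is absorbed by the nuisance regressor rather than influencing the task-related parameter estimates.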
Although several published studies compare differing
correction strategies (Ardekani et al. 2001; Oakes et al.
2005; Johnstone et al. 2006), a drawback of prior work is
that evaluation was often limited to a single dataset (see
Supplemental Table 1). The degree to which an optimal
strategy for one dataset generalizes to other acquisition
schemes, tasks, or populations is not clear. With the increasing public availability of neuroimaging datasets (Poldrack et al. 2013; Markiewicz et al. 2021), evaluating motion correction approaches across a range of data has become feasible.
In the present work, we sought to compare the performance of identical pipelines on a diverse selection of tasks, using data from different sites, scanners, and participant populations. Although our primary interest was frame censoring, we considered seven different motion correction approaches:
1. six canonical head motion (i.e., “realignment parameter”) estimates (RP6)
2. 24-term expansions of head motion estimates (RP24)
3. wavelet despiking (WDS)
4. robust weighted least squares (rWLS)
5. untrained independent component analysis (uICA)
6. frame censoring based on framewise displacement (FD; see the sketch following this list)
7. frame censoring based on variance differentiation (DVARS)
This list is not exhaustive but is representative of approaches that are currently used and feasible to include in an automated processing pipeline.
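As a concrete reference for items 6 and 7, the two censoring metrics can be computed roughly as follows (a sketch following the common definitions of FD (Power et al. 2012) and DVARS; conventions such as the 50 mm rotation radius vary across implementations):

    import numpy as np

    def framewise_displacement(rp, radius=50.0):
        # rp: (n_scans, 6) realignment parameters (mm and radians).
        # Rotations become arc length on a sphere of the given radius.
        d = np.abs(np.diff(rp, axis=0))
        fd = d[:, :3].sum(axis=1) + radius * d[:, 3:].sum(axis=1)
        return np.concatenate([[0.0], fd])   # first frame has no predecessor

    def dvars(data):
        # data: (n_scans, n_voxels) voxel time series.
        diff = np.diff(data, axis=0)         # frame-to-frame signal change
        return np.concatenate([[0.0], np.sqrt((diff ** 2).mean(axis=1))])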
Because it is impossible to determine a “ground truth” result with which to compare the effectiveness of these approaches, we instead considered four complementary outcome metrics: (1) the maximum group t-statistic, both across the whole brain and in a region of interest (ROI) relevant to the task; (2) the average parameter estimates from within the same ROI; (3) the degree of test–retest consistency exhibited by subject-level parametric maps; and (4) the spatial overlap of thresholded group-level statistical maps. These metrics are simple to define yet functionally meaningful, and can be applied to data from almost any fMRI study. In our view, Dice quantifies replicability, the mean ROI value quantifies effect size (signal), and maximum-t quantifies signal-to-noise (effect size penalized by variance).
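For example, the spatial-overlap metric reduces to a Dice coefficient over thresholded maps, along these lines (a minimal sketch; the threshold is a free parameter, not one prescribed here):

    import numpy as np

    def dice_overlap(map_a, map_b, threshold):
        # map_a, map_b: voxelwise statistics on the same grid.
        a, b = map_a > threshold, map_b > threshold   # binarize both maps
        denom = a.sum() + b.sum()
        return 2.0 * (a & b).sum() / denom if denom else np.nan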
METHODS
Datasets
We analyzed eight studies obtained from OpenNeuro (Markiewicz et al. 2021), several of which included multiple tasks or multiple participant groups. As such, the eight selected studies provided a total of 15 datasets. The selection process was informal, but priority was given to studies with: (1) a clearly defined task; (2) a sufficient number of subjects to allow second-level modeling; (3) sufficient data to make test–retest evaluation possible; and (4) an associated publication describing a result to which we could compare our own analysis.
A summary of the eight datasets selected is shown in Table 1 (acquisition details are provided in Supplemental Table 2). Additional information, including task details, modeling/contrast descriptions compiled from