RESEARCH ARTICLE
BrainIAK: The Brain Imaging Analysis Kit
Manoj Kumar (a), Michael J. Anderson (b), James W. Antony (a), Christopher Baldassano (c), Paula P. Brooks (a), Ming Bo Cai (d), Po-Hsuan Cameron Chen (e), Cameron T. Ellis (f), Gregory Henselman-Petrusek (a), David Huberdeau (f), J. Benjamin Hutchinson (g), Y. Peeta Li (g), Qihong Lu (h), Jeremy R. Manning (i), Anne C. Mennen (a), Samuel A. Nastase (a), Hugo Richard (j), Anna C. Schapiro (k), Nicolas W. Schuck (l, m), Michael Shvartsman (e), Narayanan Sundaram (b), Daniel Suo (n), Javier S. Turek (o), David Turner (a), Vy A. Vo (o), Grant Wallace (a), Yida Wang (b), Jamal A. Williams (a, h), Hejia Zhang (e), Xia Zhu (o), Mihai Capotă (o), Jonathan D. Cohen (a, h), Uri Hasson (a, h), Kai Li (n), Peter J. Ramadge (p), Nicholas B. Turk-Browne (f), Theodore L. Willke (o), and Kenneth A. Norman (a, h, *)
a Princeton Neuroscience Institute, Princeton University, Princeton, NJ
b Work done while at Parallel Computing Lab, Intel Corporation, Santa Clara, CA
c Department of Psychology, Columbia University, NY, NY
d International Research Center for Neurointelligence (WPI-IRCN), UTIAS, The University of Tokyo, Japan
e Work done while at Princeton Neuroscience Institute, Princeton University, Princeton, NJ
f Department of Psychology, Yale University, New Haven, CT
g Department of Psychology, University of Oregon, Eugene, OR
h Department of Psychology, Princeton University, Princeton, NJ
i Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH
j Parietal Team, Inria, Neurospin, CEA, Université Paris-Saclay, France
k Department of Psychology, University of Pennsylvania, Philadelphia, PA
l Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
m Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
n Department of Computer Science, Princeton University, Princeton, NJ
o Brain-Inspired Computing Lab, Intel Corporation, Hillsboro, OR
p Department of Electrical Engineering, and the Center for Statistics and Machine Learning, Princeton University, Princeton, NJ
Kumar etal. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 IGO License, which permits the copy
and redistribution of the material in any medium or format provided the original work and author are properly credited. In any reproduction of this article there should not be any
suggestion that APERTURE NEURO or this article endorse any speci c organization or products. The use of the APERTURE NEURO logo is not permitted. This notice should be
preserved along with the article’s original URL. Open access logo and text by PLoS, under the Creative Commons Attribution-Share Alike 4.0 Unported license.
Aperture Neuro: 2021, Volume 1. CC-BY: © Kumar et al.
ABSTRACT
Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEMs), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high-performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
Keywords: MVPA, fMRI analysis, high-performance computing, machine learning, fMRI simulator, tutorials
Correspondence: Kenneth A. Norman, Email: knorman@princeton.edu
Received: December 9, 2020
Accepted: September 22, 2021
DOI: 10.52294/31bb5b68-2184-411b-8c00-a1dacb61e1da
INTRODUCTION
Cognitive neuroscientists have come a long way in using functional magnetic resonance imaging (fMRI) to help answer questions about cognitive processing in the brain. A variety of methods have been developed, ranging from univariate techniques to multivariate pattern analysis (MVPA) methods [1–4]. A large number of toolboxes are now available that implement these pattern analysis methods, including, for example, the Princeton MVPA Toolbox [5], the Decoding Toolbox [6], CoSMoMVPA [7], Nilearn [8], and PyMVPA [9] (for a full list, see https://github.com/ohbm/hackathon2019/blob/master/TutorialResources.md). Scientists can choose which toolbox to use based on the analysis that they wish to perform and the programming language they wish to use.
In this work, we describe the Brain Imaging Analysis Kit (BrainIAK; RRID:SCR_014824; https://brainiak.org), an open-source Python package that implements computationally optimized solutions to key problems in advanced fMRI data analysis, focusing on analysis steps that take place after data have been preprocessed and put in matrix form. BrainIAK can be viewed as a “Swiss Army knife” for advanced fMRI analysis, to which we continually add new tools. Presently, BrainIAK includes methods for running intersubject correlation (ISC) [10] and intersubject functional correlation (ISFC) [11, 12], functional alignment via the shared response model (SRM) [13], Bayesian representational similarity analysis (Bayesian RSA) [14, 15], event segmentation [16], dimensionality reduction via topographic factor analysis (TFA) [17], and inverted encoding models (IEMs) [18, 19].
To avoid duplication across packages, BrainIAK leverages methods available in other packages: it is well integrated with Nilearn (https://nilearn.github.io/index.html) [20] and extensively uses scikit-learn (https://scikit-learn.org/) [21] for machine learning algorithms. The functions in BrainIAK are optimized to run on high-performance compute (HPC) clusters for efficient execution on large datasets. The same code can be executed on a laptop or an HPC cluster, saving the significant time that would otherwise be spent refactoring code to run in an HPC environment. BrainIAK also includes a set of didactic tutorials [22]; these tutorials provide detailed step-by-step instructions and helper functions that facilitate learning and implementing some of the methods, including materials relevant to running on HPC clusters. Scientists can also use BrainIAK’s simulator [23] to create model-based patterns of activity at the voxel level without going through the expensive and time-consuming process of data collection. The package is released under an open-source license and is free to use on a variety of platforms. BrainIAK welcomes contributions from the community, and new methods are continuously added to the package.
METHODS IN BRAINIAK
In the following sections, we present an overview of each of the methods presently included in BrainIAK, along with an accompanying example notebook. For each method, we describe the data analysis problem that it is meant to solve and how it solves that problem. The notebooks also contain an annotated bibliography for each method, listing papers that have described and/or used the method. These example notebooks are less didactic than the tutorials; instead, they are integrated with the BrainIAK documentation, provide an overview of each technique, and allow users to quickly access code snippets for each method. The notebooks also cover methods that are not included in the tutorials, such as Bayesian RSA [14, 15], TFA [17], IEMs [18, 19], BrainIAK’s simulator [23], and matrix-normal models [24]. All example notebooks are available at https://github.com/brainiak/brainiak-aperture, along with instructions on how to run them.
Intersubject Correlation
The Problem: Measuring the Brain’s Response to Naturalistic Stimuli
One of the traditional goals of fMRI research is to measure the brain’s response to a particular stimulus, task, or other experimental manipulation. Typically, this approach relies on tightly controlled experimental designs: by contrasting two stimuli or tasks, or by parametrically varying a particular experimental variable, we can isolate brain responses to the variable of interest. Experimentally isolating particular variables can reduce ecological validity; in response to this, cognitive neuroscientists have begun to adopt more naturalistic paradigms [25–30]. However, using naturalistic stimuli comes with its own set of challenges: in particular, if the stimuli are too complex to be modeled using a small set of regressors, the standard approach of relating a design matrix to the fMRI signal may not be practical.
The Solution
ISC analysis takes a different approach to this problem: instead of trying to fully describe the stimulus in a design matrix, ISC measures stimulus-evoked responses to naturalistic stimuli by isolating brain activity shared across subjects receiving the same stimulus [10, 12]. When experimental participants are presented with a stimulus such as a movie or a spoken story, their brain activity can be conceptually decomposed into at least two components: (1) a stimulus-related component that is synchronized across subjects due to the use of a common stimulus; and (2) a subject-specific component capturing both idiosyncratic stimulus-related signals (e.g., unique memory and interpretation) and nonstimulus-related signals (e.g., physiological noise; Figure 1A). ISC analysis measures the former (shared, stimulus-related) component, filtering out the latter (idiosyncratic) component (Figure 1B).
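To make this decomposition concrete, the NumPy sketch below computes the two common variants of ISC on simulated data: leave-one-out (each subject's time series against the average of all other subjects) and pairwise (every pair of subjects). It assumes data arranged as an n_TRs × n_voxels × n_subjects array; the function names are hypothetical and this is an illustration, not BrainIAK's code. In practice, the tested implementations in BrainIAK's `brainiak.isc` module (e.g., `brainiak.isc.isc`) should be preferred.

```python
import numpy as np

def zscore(x):
    """Standardize each column over time (axis 0), using the population SD."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def loo_isc(data):
    """Leave-one-out ISC.

    data: array shaped (n_TRs, n_voxels, n_subjects).
    Returns (n_subjects, n_voxels): for each subject and voxel, the Pearson
    correlation between that subject's time series and the mean time series
    of all other subjects.
    """
    n_trs, n_voxels, n_subjects = data.shape
    total = data.sum(axis=2)
    iscs = np.empty((n_subjects, n_voxels))
    for s in range(n_subjects):
        left_out = data[:, :, s]
        others_mean = (total - left_out) / (n_subjects - 1)
        # Mean product of z-scores over time equals Pearson r for each voxel
        iscs[s] = (zscore(left_out) * zscore(others_mean)).mean(axis=0)
    return iscs

def pairwise_isc(data):
    """Pairwise ISC: one correlation per pair of subjects, per voxel.

    Returns (n_pairs, n_voxels), where n_pairs = N * (N - 1) / 2.
    """
    n_subjects = data.shape[2]
    z = [zscore(data[:, :, s]) for s in range(n_subjects)]
    return np.array([(z[i] * z[j]).mean(axis=0)
                     for i in range(n_subjects) for j in range(i + 1, n_subjects)])

# Toy data: voxel 0 carries a shared "stimulus" signal, voxel 1 is pure noise
rng = np.random.default_rng(0)
n_trs, n_subjects = 300, 10
stimulus = rng.standard_normal(n_trs)
data = np.empty((n_trs, 2, n_subjects))
for s in range(n_subjects):
    data[:, 0, s] = stimulus + rng.standard_normal(n_trs)  # shared + idiosyncratic
    data[:, 1, s] = rng.standard_normal(n_trs)             # idiosyncratic only

print(loo_isc(data).mean(axis=0))       # voxel 0 high, voxel 1 near zero
print(pairwise_isc(data).mean(axis=0))  # same pattern, lower values for voxel 0
```

With equal signal and noise variance, theory predicts a leave-one-out ISC near 1/√(2 · (1 + 1/9)) ≈ 0.67 and a pairwise ISC near 0.5 for voxel 0. Leave-one-out values are higher because averaging over the other nine subjects suppresses their idiosyncratic noise.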
This shared signal can be driven by different features of the stimulus in different brain regions. For example, when listening to a spoken story, ISC in early auditory areas may be driven by acoustic features of the stimulus, whereas ISC in the association cortex may be driven by higher-level linguistic features of the stimulus. In this sense, ISC is agnostic to the content of the stimulus and serves as a measure of the reliability of stimulus-evoked responses across subjects (or as a “noise ceiling” for model-based prediction across subjects [12, 31, 32]). This is particularly useful for complex, naturalistic stimuli where exhaustively modeling stimulus features may be difficult. This also allows us to leverage naturalistic stimuli to ask novel questions about brain organization.
For example, high ISCs extend from early auditory areas to high-level association cortices during story listening. However, if we temporally scramble elements of the story stimulus, this disrupts the narrative content of the story; in this case, we still observe high ISC in early auditory areas, but less so in higher-level cortices, suggesting that certain association areas encode temporally evolving narrative content [33, 34].
Several variations on ISC have been developed at both the implementational and conceptual levels. For example, ISCs may be computed in either a pairwise or leave-one-out fashion, both of which have associated statistical tests [12, 35, 36]. An important conceptual advance has been to compute ISC across brain areas using ISFC analysis [11] (Figure 1D). ISFC analysis allows us to estimate functional connectivity (FC) networks analogous to those obtained with traditional within-subject FC analysis (Figure 1C). However, unlike traditional within-subject FC analysis, ISFC analysis isolates stimulus-driven connectivity and is robust to idiosyncratic noise due to head motion and physiological fluctuations [37]. Both ISC and ISFC can be computed using a sliding window to measure coarse fluctuations in the shared signal over time. Finally, rather than computing ISC on response time series, we can also apply the logic of ISC to multivoxel pattern analysis [1]. Intersubject pattern correlation analysis captures spatially distributed shared response patterns across subjects at each time point (e.g., [38]). Computing spatial ISC between all time points (the spatial analogue of ISFC) enables us to discover whether certain spatial response patterns are consistent or reemerge over time [16].
Fig. 1. Schematic of ISC and ISFC analysis. A. The measured response time series (maroon) can be decomposed into three components: a consistent stimulus-induced component that is shared across subjects (red), an idiosyncratic stimulus-induced component (gold), and an idiosyncratic noise component (gray). B. ISC is computed between two homologous brain areas (maroon and orange) across subjects, thus isolating the shared, stimulus-induced signal from idiosyncratic signals. C. Typical functional connectivity analysis is computed within subjects across brain areas. D. ISFC is computed across both subjects and brain areas. ISFC analysis provides functional network estimation analogous to within-subject functional connectivity analysis, but isolates the shared, stimulus-induced signal and is robust to idiosyncratic noise. E. The diagonal of the ISFC matrix comprises the ISC values.
The Notebook
The accompanying notebook applies ISC analysis to an example fMRI story-listening dataset from the “Narratives” data collection [39, 40]. To reduce computational demands, we compute ISC on time series averaged within each parcel of a functional cortical parcellation [41]. We first demonstrate high ISC values extending from low-level auditory cortex to higher-level cortical areas during story listening. However, when subjects listen to a temporally scrambled version of the stimulus, ISC is dramatically reduced in higher-level cortical areas, suggesting that these areas encode temporally evolving features of the stimulus (e.g., narrative context). We next perform a similar comparison between intact and scrambled story stimuli using traditional within-subject FC and ISFC analysis. The networks estimated using within-subject FC are similar across the two types of stimuli, whereas ISFC analysis yields very different networks for the intact and scrambled stories. BrainIAK also offers several nonparametric statistical tests for ISC and ISFC analysis, some of which are discussed in the notebook.
Compute Recommendations
The computational demands of ISC/ISFC analyses scale with the number of subjects, voxels, and timepoints (TRs); however, the memory demands of pairwise ISC analysis increase more precipitously with the number of subjects, because the number of subject pairs grows quadratically (N(N − 1)/2 pairs for N subjects). A small-scale (e.g., parcellation-based) ISC analysis with 30 subjects, 1,000 parcels, and a 300-TR duration runs in a couple of seconds on a typical personal computer. On the other hand, whole-brain voxelwise ISC analysis with 50,000 voxels may take 10 or more minutes to run and require several GB of memory. For large-scale ISC analyses, we recommend running the analysis on a distributed computing cluster. Basic ISC/ISFC analysis (as implemented in BrainIAK) requires a single process to operate on data across all subjects. However, some additional preprocessing can allow for parallelization across subjects. For example, in the leave-one-out approach, precomputing the average time series excluding each subject allows the ISC computation to proceed in parallel; in the pairwise approach, ISC for each pair of subjects can be computed in parallel and then recombined. Note that ISC analysis proceeds independently for each brain variable (e.g., voxel or parcel), so ISC analysis can also be parallelized across voxels; for example, a whole-brain voxelwise ISC analysis with 50,000 voxels can be divided into 50 parallel jobs, each running ISC analysis on a subset of 1,000 voxels.
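Because each voxel's ISC depends only on that voxel's time series, splitting voxels into chunks changes the scheduling but not the result. The sketch below checks this with a hypothetical `chunked_isc` helper, using a thread pool as a stand-in for separate cluster jobs (an assumption of this sketch; on a real cluster each chunk would typically be a separate job script), and prints how quickly the number of subject pairs grows for pairwise ISC.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def loo_isc(data):
    """Leave-one-out ISC; data shaped (n_TRs, n_voxels, n_subjects).
    Returns an (n_subjects, n_voxels) array of Pearson correlations."""
    def z(x):
        return (x - x.mean(axis=0)) / x.std(axis=0)
    n_subjects = data.shape[2]
    total = data.sum(axis=2)
    return np.array([
        (z(data[:, :, s]) * z((total - data[:, :, s]) / (n_subjects - 1))).mean(axis=0)
        for s in range(n_subjects)
    ])

def chunked_isc(data, n_chunks, max_workers=4):
    """Split voxels into n_chunks independent jobs and reassemble the results.
    A thread pool stands in for separate cluster jobs in this sketch."""
    chunks = np.array_split(np.arange(data.shape[1]), n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(lambda idx: loo_isc(data[:, idx, :]), chunks))
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(1)
data = rng.standard_normal((300, 2000, 8))  # hypothetical: 300 TRs, 2,000 voxels, 8 subjects
full = loo_isc(data)
split = chunked_isc(data, n_chunks=4)       # 4 jobs of 500 voxels each
print(np.allclose(full, split))             # True: voxels are independent

# Pairwise ISC needs one correlation per subject pair: N * (N - 1) / 2
for n in (10, 30, 50):
    print(n, n * (n - 1) // 2)              # 45, 435, 1225 pairs
```

The same logic applies to the leave-one-out subject loop: once `total` is computed, each subject's correlation can be computed independently.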
ISFC analysis computes the correlation between all pairs of parcels or networks, and therefore its computational demands increase primarily with the number of voxels: the ISFC matrix grows with the square of the number of brain variables, so a single 50,000 × 50,000 voxelwise ISFC matrix stored as 64-bit floats occupies 20 GB. Similar to ISC analysis, smaller-scale analyses (e.g., 30 subjects, 1,000 parcels, and 300 TRs) are easily computed on a personal computer, whereas whole-brain voxelwise analyses may require a computing cluster.
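A minimal leave-one-out ISFC sketch follows. It assumes the same n_TRs × n_regions × n_subjects layout, uses hypothetical function names, and symmetrizes each subject's matrix as a design choice of the sketch (BrainIAK provides its own `isfc` function in `brainiak.isc`). The simulated data contain two regions driven by a shared stimulus and two regions coupled only by a subject-specific artifact, a stand-in for head motion. Within-subject FC picks up both couplings, whereas ISFC retains only the stimulus-driven one.

```python
import numpy as np

def zscore(x):
    """Standardize each column over time (axis 0)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def loo_isfc(data):
    """Leave-one-out ISFC for data shaped (n_TRs, n_regions, n_subjects).

    For each subject, correlates every region of that subject with every
    region of the mean of the remaining subjects. Returns an array shaped
    (n_subjects, n_regions, n_regions); each matrix is symmetrized.
    """
    n_trs, n_regions, n_subjects = data.shape
    total = data.sum(axis=2)
    out = np.empty((n_subjects, n_regions, n_regions))
    for s in range(n_subjects):
        left_out = zscore(data[:, :, s])
        others = zscore((total - data[:, :, s]) / (n_subjects - 1))
        corr = left_out.T @ others / n_trs  # Pearson r for every region pair
        out[s] = (corr + corr.T) / 2        # average r(a, b) with r(b, a)
    return out

def within_subject_fc(data):
    """Conventional FC: region-by-region correlations within each subject."""
    return np.array([np.corrcoef(data[:, :, s], rowvar=False)
                     for s in range(data.shape[2])])

rng = np.random.default_rng(2)
n_trs, n_subjects = 300, 10
stim = rng.standard_normal(n_trs)  # shared stimulus-driven signal
data = np.empty((n_trs, 4, n_subjects))
for s in range(n_subjects):
    motion = rng.standard_normal(n_trs)  # idiosyncratic, subject-specific artifact
    data[:, 0, s] = stim + rng.standard_normal(n_trs)
    data[:, 1, s] = stim + rng.standard_normal(n_trs)
    data[:, 2, s] = motion + rng.standard_normal(n_trs)
    data[:, 3, s] = motion + rng.standard_normal(n_trs)

isfc = loo_isfc(data).mean(axis=0)
fc = within_subject_fc(data).mean(axis=0)
print(np.round(isfc, 2))  # regions 0-1 coupled; regions 2-3 near zero
print(np.round(fc, 2))    # both pairs coupled
```

Because symmetrization leaves the diagonal unchanged, the diagonal of each subject's ISFC matrix is exactly that subject's leave-one-out ISC for each region, matching Figure 1E.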
Shared Response Model
The Problem: Aligning Brain Data across Participants
One of the main obstacles in leveraging brain activity across subjects is the considerable heterogeneity of functional topographies from individual to individual. Variability in functional–anatomical correspondence across individuals means that even high-performing anatomical alignment does not ensure fine-grained functional alignment (e.g., [42]). For example, multivoxel pattern analysis models that perform well within subjects often degrade in performance when evaluated across subjects (e.g., [43, 44]).
The Solution
SRM [13], alongside other methods of hyperalignment [45–47], aims to resolve this alignment problem by aligning subjects on the basis of functional data. SRM estimation is driven by the commonality in functional responses induced by a shared stimulus (e.g., watching a movie). Unlike ISC analysis, which presupposes (often very coarse) functional correspondence, SRM isolates the shared response while accommodating misalignment across subjects. SRM decomposes multisubject fMRI data into a lower-dimensional shared space and subject-specific transformation matrices for projecting from each subject’s idiosyncratic voxel space into the shared space (Figure 2). Each of these topographic transformations effectively rotates and reduces each subject’s voxel space to find a subspace of shared features where the multivariate trajectory of responses to the stimulus is best aligned. These shared features do not correspond to individual voxels; rather, they are distributed across the full voxel space of each subject, and each shared feature can be understood as a weighted sum of many voxels.
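In matrix terms, SRM models each subject's centered data as X_i ≈ W_i S, where W_i (voxels × features) has orthonormal columns (W_iᵀ W_i = I) and S (features × TRs) is the shared response. The sketch below fits a deterministic variant that minimizes Σ_i ‖X_i − W_i S‖² by alternating two closed-form steps: S is the average of the projected data W_iᵀ X_i, and each W_i is the orthogonal Procrustes solution U Vᵀ from the SVD of X_i Sᵀ. It is an illustration under these assumptions, not BrainIAK's probabilistic SRM; in practice, use the estimators in `brainiak.funcalign.srm`. The function name and simulation are hypothetical.

```python
import numpy as np

def fit_det_srm(data, n_features, n_iter=20, seed=0):
    """Deterministic SRM sketch fit by alternating least squares.

    data: list of per-subject arrays shaped (n_voxels_i, n_TRs); voxel counts
    may differ across subjects, but all subjects share the same TRs.
    Returns (W, S) with W[i] of shape (n_voxels_i, n_features) having
    orthonormal columns and S of shape (n_features, n_TRs), so X_i ≈ W[i] @ S.
    """
    rng = np.random.default_rng(seed)
    data = [x - x.mean(axis=1, keepdims=True) for x in data]  # center each voxel
    # Random orthonormal initialization (QR of a Gaussian matrix)
    W = [np.linalg.qr(rng.standard_normal((x.shape[0], n_features)))[0] for x in data]
    for _ in range(n_iter):
        # Update S: average of every subject's data projected into shared space
        S = np.mean([w.T @ x for w, x in zip(W, data)], axis=0)
        # Update each W_i: orthogonal Procrustes solution U @ Vt of X_i S^T
        for i, x in enumerate(data):
            u, _, vt = np.linalg.svd(x @ S.T, full_matrices=False)
            W[i] = u @ vt
    S = np.mean([w.T @ x for w, x in zip(W, data)], axis=0)
    return W, S

# Hypothetical simulation: one 3-feature shared response, expressed through a
# different random orthonormal "topography" in each subject
rng = np.random.default_rng(3)
n_trs, k = 200, 3
S_true = rng.standard_normal((k, n_trs))
subjects = []
for n_vox in (50, 60, 70, 80):
    topography = np.linalg.qr(rng.standard_normal((n_vox, k)))[0]
    subjects.append(topography @ S_true + 0.1 * rng.standard_normal((n_vox, n_trs)))

W, S = fit_det_srm(subjects, n_features=k)
# Projections of two different subjects into the shared space should line up
proj = [w.T @ (x - x.mean(axis=1, keepdims=True)) for w, x in zip(W, subjects)]
print([round(np.corrcoef(a, b)[0, 1], 3) for a, b in zip(proj[0], proj[1])])
```

Note that the subjects have different voxel counts, yet their projections land in the same 3-dimensional space, feature by feature.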
Transformations estimated from one subset of data can be used to project unseen data into the shared space. Projecting data into the shared space increases both temporal and spatial ISC (by design), and in many cases improves between-subject model performance to the level of within-subject performance. Between-subject models with SRM can, in some cases, exceed the performance of within-subject models because (a) the reduced-dimension shared space can highlight stimulus-related variance by filtering out noisy or non-stimulus-related features, and (b) the between-subject model can effectively leverage a larger volume of data after functional alignment than is available for any single subject (e.g., [13, 48]). Denoised individual-subject data can be reconstructed by projecting data from the reduced-dimension shared space back into any given subject’s brain space. Furthermore, in cases where each subject’s unique response is of more interest than the shared signal, SRM can be used to factor out the shared component, thereby isolating the idiosyncratic response for each subject [13].
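Once an orthonormal transform W_i has been estimated, projecting unseen data, reconstructing denoised data, and isolating the idiosyncratic response each reduce to a matrix product. In the sketch below, W_i is a random orthonormal stand-in for a fitted transform (an assumption made so the block runs on its own), and the residual is computed from the subject's own projection; subtracting W_i times a group-level shared response is an alternative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_vox, k, n_trs = 80, 3, 150
# Hypothetical transform for one subject: orthonormal columns (n_vox x k),
# standing in for a W_i estimated by SRM on separate training data.
W_i = np.linalg.qr(rng.standard_normal((n_vox, k)))[0]

# Hypothetical unseen data for that subject: shared signal plus noise
S_new = rng.standard_normal((k, n_trs))
X_new = W_i @ S_new + 0.5 * rng.standard_normal((n_vox, n_trs))

shared = W_i.T @ X_new            # project into the k-dimensional shared space
denoised = W_i @ shared           # reconstruct in the subject's voxel space
idiosyncratic = X_new - denoised  # residual after factoring out the shared part

# The residual is orthogonal to the shared subspace, up to floating-point error
print(np.abs(W_i.T @ idiosyncratic).max() < 1e-10)  # True
# The reconstruction is closer to the noise-free signal than the raw data are
clean = W_i @ S_new
print(np.linalg.norm(denoised - clean) < np.linalg.norm(X_new - clean))  # True
```

The denoising works because only the noise falling inside the k-dimensional shared subspace survives the round trip W_i W_iᵀ.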
Building on the initial probabilistic SRM formulation [13, 49], several variants of SRM have been developed to address related challenges. For example, a fast SRM implementation has been introduced for rapidly analyzing large datasets with reduced memory demands [50]. The robust SRM algorithm tolerates subject-specific outlying response elements [51], and the semisupervised SRM capitalizes on categorical stimulus labels when available [52]. Finally, estimating the SRM from FC data rather than from response time series circumvents the need for a single shared stimulus across subjects; connectivity SRM allows us to derive a single shared response space across different stimuli with a shared connectivity profile [48].
The Notebook
The accompanying notebook applies the SRM to an example fMRI story-listening dataset from the “Narratives” data collection [39]. We apply the SRM within a temporal–parietal region of interest (ROI) comprising the auditory association cortex from a functional cortical parcellation [41] and explore the components of the resulting model. We evaluate the SRM using between-subject time-segment classification. This analysis reveals that the SRM yields a considerable improvement in between-subject classification beyond anatomical alignment.