
: 2021, Volume 1 - 5 - CC-BY: © Kruper et al.
RESEARCH ARTICLE
discrepancies between ILF R recognized with waypoint
ROIs and with RecoBundles. Despite this bundle, we find
high robustness overall. For MD, the first quartile subject
robustness is 0.82 (Fig. 5C, D).
Tractometry results are robust to differences
in software implementation
Overall, we found that robustness of tractometry across
these different software implementations is high in most
white matter bundles. In the mAFQ/pyAFQ comparison,
most bundles have a wDSC around or above 0.8, except
the two callosal bundles (the forceps anterior (FA bundle) and
forceps posterior (FP)), which have a much lower overlap (Fig. 6A).
Consistent with this pattern, profile and subject robustness
are also overall rather high (Fig. 6B, C). The median
values across bundles are 0.71 and 0.77 for FA profile and
subject robustness, respectively.
For some bundles, like the right and left uncinate (UNC
R and UNC L), there is large agreement between pyAFQ
and mAFQ (for subject FA: UNC L ρ = 0.90 ± 0.07, UNC
R ρ = 0.89 ± 0.08). However, the callosal bundles have
particularly low MD profile robustness (0.07 ± 0.09 for FP,
0.18 ± 0.09 for FA) (Fig. 6B).
The robustness of tractometry to the differences be-
tween the pyAFQ and mAFQ implementation depends
on the bundle, scalar, and reliability metric. In addition, for
many bundles, the ACIP between mAFQ and pyAFQ results
is very close to 0, indicating no systematic differences
(Fig. 6D). In some bundles – the CST and the anterior
thalamic radiations (ATR) – there are small systematic differences
between mAFQ and pyAFQ. In the forceps posterior
(FP), pyAFQ consistently finds smaller FA values than
mAFQ in a section on the left side. Notice that the forceps
anterior has an ACIP that deviates only slightly from 0, even
though the forceps recognitions did not have as much
overlap as other bundle recognitions (see Fig. 6A).
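To make the reading of the ACIP concrete, the following is a minimal sketch of how such a profile could be computed. It assumes that the adjusted contrast index at each node is the difference between the two measurements divided by their mean, 2(a - b)/(a + b), and that the ACIP is the average of this index across subjects at each node. This section does not state that formula, so it is an assumption of the sketch, as are the function names and the list-of-lists input layout.

```python
def adjusted_contrast_index(a, b):
    """Contrast between two positive scalar values (e.g., FA from two pipelines).

    Assumed definition: difference divided by the mean of the two values.
    A value of 0 means the two measurements agree exactly.
    """
    return 2.0 * (a - b) / (a + b)


def acip(profiles_a, profiles_b):
    """Average the contrast index across subjects at each node.

    profiles_a, profiles_b: lists of per-subject tract profiles
    (one list of node values per subject), in the same subject order.
    Returns one value per node.
    """
    n_subjects = len(profiles_a)
    n_nodes = len(profiles_a[0])
    result = []
    for node in range(n_nodes):
        values = [
            adjusted_contrast_index(profiles_a[s][node], profiles_b[s][node])
            for s in range(n_subjects)
        ]
        result.append(sum(values) / n_subjects)
    return result
```

Under this assumed definition, a profile that stays near 0 along the whole bundle indicates no systematic difference, while a run of consistently negative nodes indicates that the first method yields smaller values in that section of the bundle.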
DISCUSSION
Previous work has called into question the reliability of
neuroimaging analysis (e.g., (25, 45, 46)). We assessed
the reliability of a specific approach, tractometry, which
is grounded in decades of anatomical knowledge, and
we demonstrated that this approach is reproducible,
reliable, and robust. A tractometry analysis typically
combines the outputs of tractography with diffusion
reconstruction at the level of the individual voxels
within each bundle. One of the major challenges fac-
ing researchers who use tractometry is that there are
many ways to analyze diffusion data, including differ-
ent models of diffusion at the level of individual voxels;
techniques to connect voxels through tractography;
and approaches to classify tractography results into
major white matter bundles. Here, we analyzed the re-
liability of tractometry analysis at several different lev-
els. We analyzed both TRR of tractometry results and
relative to pyAFQ in the UW-PREK dataset, when TRR
is relatively low for pyAFQ (see the FA bundle, CST L,
and ATR L in Fig. 3C). On the other hand, in the HCP-TR
dataset, we used the Reproducible Tract Profile
(RTP) pipeline (42, 43), which is an extension of mAFQ,
and found that pyAFQ tends to have slightly higher profile
TRR than RTP for MD but slightly lower profile TRR for
FA (Fig. 3D). The pyAFQ and RTP subject TRR are highly
comparable (Fig. 3E). The median pyAFQ subject
TRR for FA is 0.76, while the median RTP subject TRR is
0.74. Comparing different ODF models in pyAFQ, we
found that the DKI and CSD ODF models have highly
similar TRR, both at the level of wDSC (Fig. 3A) and at the
level of profile and subject TRRs (Fig. 3F, G).
Robustness: comparison between
distinct tractography models and
bundle recognition algorithms
To assess the robustness of tractometry results to differ-
ent models and algorithms, we used the same measures
that were used to calculate TRR.
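As an illustration of the two profile-based measures, the sketch below assumes that profile robustness is a per-subject Pearson correlation between the tract profiles produced by two methods, and that subject robustness is a Spearman rank correlation, across subjects, of the mean of each tract profile. These definitions are assumptions of the sketch (this section reports the values but does not restate the formulas), and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def profile_robustness(profiles_a, profiles_b):
    """One Pearson r per subject, comparing the two methods node by node."""
    return [pearson(a, b) for a, b in zip(profiles_a, profiles_b)]


def subject_robustness(profiles_a, profiles_b):
    """Spearman rho across subjects of the mean value of each profile."""
    means_a = [sum(p) / len(p) for p in profiles_a]
    means_b = [sum(p) / len(p) for p in profiles_b]
    return spearman(means_a, means_b)
```

The distinction matters for interpreting the results: under these assumed definitions, a bundle can have low profile robustness (the shape along the bundle differs between methods) and still have high subject robustness (the ordering of subjects by their mean value is preserved).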
Tractometry results can be robust to differences in
ODF models used in tractography
We compared two algorithms: tractography using DKI-
and CSD-derived ODFs. The weighted Dice similarity coefficient
(wDSC) for this comparison can be rather high in
some cases (e.g., the uncinate and corticospinal tracts,
Fig. 4A), but the two models produce results that appear very
different for some bundles, such as the arcuate and superior
longitudinal fasciculi (ARC and SLF) (see also Fig. 4D). Despite
these discrepancies, profile and subject robustness are
high for most bundles (median FA of 0.77 and 0.75, re-
spectively) (Fig. 4B, C). In contrast to the results found in
TRR, MD subject robustness is consistently higher than
FA subject robustness. The two bundles with the most
marked differences between the two ODF models are
the SLF and ARC (Fig. 4D). These bundles have low wDSC
and profile robustness, yet their subject robustness remains
remarkably high (in FA, 0.75 ± 0.17 for ARC R and
0.88 ± 0.09 for SLF R) (Fig. 4C). These differences are
partially explained by systematic biases in the sampling
of white matter by bundles generated with these two
ODF models, as demonstrated by
the non-zero ACIP between the two models (Fig. 4E).
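The overlap measure discussed above can be sketched as follows. The sketch assumes that the wDSC weights each voxel by the streamline density of the bundle in that voxel, and that the score is the summed weight of both bundles in their shared voxels divided by the total weight of both bundles. This formula is an assumption of the sketch and is not restated in this section; the dictionary representation and the function name are hypothetical.

```python
def weighted_dice(density_a, density_b):
    """Weighted Dice overlap of two bundles.

    density_a, density_b: dicts mapping a voxel index, e.g. (i, j, k),
    to the streamline count (or density) of the bundle in that voxel.
    Returns a value between 0 (no shared voxels) and 1 (identical support).
    """
    shared = density_a.keys() & density_b.keys()
    numerator = sum(density_a[v] + density_b[v] for v in shared)
    denominator = sum(density_a.values()) + sum(density_b.values())
    # Two empty bundles have no overlap to measure; report 0 by convention.
    return numerator / denominator if denominator else 0.0
```

Because of the weighting, voxels in the dense core of a bundle count more than sparse voxels at its edge, so two recognitions that differ mainly in a few stray streamlines still score high under this assumed definition.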
Most white matter bundles are highly robust
across bundle recognition methods
We compared bundle recognition with the same tractog-
raphy results using two different approaches: the default
waypoint ROI approach (9) and an alternative approach
(RecoBundles) that uses atlas templates in the space of
the streamlines (44). Between these algorithms, wDSC
is around or above 0.6 for all but one bundle, the right
inferior longitudinal fasciculus (ILF R) (Fig. 5). There is
an asymmetry in the ILF atlas bundle (7), which results in