Re: [Freesurfer] Recent paper on FreeSurfer reliability

Gabriele Fariello Wed, 20 Jun 2012 09:44:18 -0700

Hi Bruce et al,

I may be late to the discussion, but I wanted to add my two cents, given
that we've had some headaches trying to get identical results
on presumptively identical systems for FreeSurfer and other tools.
Ultimately, of course, the systems were not 100%
identical, and the differences resulted in some down-stream libraries using
32-bit code instead of 64-bit. Some differences had been observed in other
software packages to a lesser extent, but I'm confident that had we
compared any other software which relied on complex iterative algorithms
using these libraries, the differences would have been more pronounced.
Re-imaging the systems to ensure that all libraries were identical resulted
in identical output. Then when migrating from RHEL 4.x to 5.x to 6.x, we
did some similar library checking and again were able to produce identical
results.
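
As a rough illustration of the kind of library checking described above, here is a minimal Python sketch that compares the shared libraries an executable resolves to on two systems. It assumes you have saved the text output of `ldd <executable>` from each machine; the function names `parse_ldd` and `diff_libs` are hypothetical and are not part of FreeSurfer or any other tool.

```python
import re

# Matches ldd lines such as
#   "libm.so.6 => /lib64/libm.so.6 (0x00007f0000000000)"
# and also "libfoo.so => not found". Lines without "=>" are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-f]+\))?\s*$")


def parse_ldd(text):
    """Map each library name to the path the dynamic loader resolved it to."""
    libs = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def diff_libs(libs_a, libs_b):
    """Return {name: (path_on_a, path_on_b)} for every library that differs."""
    names = sorted(set(libs_a) | set(libs_b))
    return {
        name: (libs_a.get(name), libs_b.get(name))
        for name in names
        if libs_a.get(name) != libs_b.get(name)
    }
```

An empty result from `diff_libs` only shows that the resolved paths agree. Two files at the same path can still differ in content, so comparing checksums of the resolved files would be a stronger check.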


It is important for the neuroimaging community to understand
that reproducibility is in large part their responsibility and not that of
the software package or operating system developers. It is effectively
impossible to guarantee identical results on non-identical systems.
Unfortunately, "identical" can for some analyses leave little room for
differences (perhaps even the random seed differences as just suggested by
Michael Harms), but notwithstanding hardware related interference, bugs, or
errors, identical systems should generally produce identical results.
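
To make the random seed point concrete, here is a toy Python sketch (not FreeSurfer code; `noisy_estimate` is a hypothetical stand-in for any computation that draws random numbers). With the same seed the result is bit-for-bit identical between runs of the same interpreter on the same system; with a different seed it generally is not.

```python
import random


def noisy_estimate(seed, n=1000):
    """Average n Gaussian draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


same_a = noisy_estimate(42)
same_b = noisy_estimate(42)   # identical to same_a
other = noisy_estimate(43)    # a different seed gives a different estimate
```

Recording the seed alongside the results is what makes a run like this repeatable later.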

I forwarded the paper that sparked this thread in jest to friends saying
"why can't things be simple?" but in reality, as Bruce mentioned, this is
not in the least a surprise. I've been involved in testing output and
fixing it between software and system updates in clinical and research
settings since 1999, and I'm pretty sure I was not the first. There is a
reason the FDA does not want you upgrading even Minesweeper on a validated
Windows system without re-validating. Researchers need to think similarly
(if not quite as extremely).

Some additional notes for those of you who may not be aware:

   1. The USER environment can affect results. On GNU/Linux systems, for
   example, modifying the $PATH or $LD_LIBRARY_PATH variables may result in
   different output from the same executable on the same system by different
   users (or the same user under different shells). Mac and Windows can have
   similar issues, particularly for "power users".
   2. Statically compiling software does not eliminate the use of
   dynamically loaded libraries (see a good explanation at "Linking
   libstdc++ statically" <http://www.trilithium.com/johan/2005/06/static-libstdc/>).
   So even statically compiled software can be affected by other libraries on
   the system.
   3. All other things being identical, using Intel vs. AMD x86 compatible
   chips should not affect the output; however, going to ARM, RISC, or GPU
   chips, where floating point implementations are IEEE compliant but
   different, would virtually guarantee different results for all but the
   simplest calculations, even if all libraries are the same. This means that
   unfortunately you'll likely never be able to reproduce your 1,000 node
   cluster results on your iPad -- no matter how cool or powerful it gets.
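
One way to catch the user-environment differences from note 1 is to record the relevant variables with every run. The following is a minimal Python sketch; the names `TRACKED`, `env_snapshot`, and `env_fingerprint` are hypothetical, and the choice of which variables to track is an assumption you would adjust for your own site.

```python
import hashlib
import json
import os

# Variables named in note 1; extend this tuple as needed for your setup.
TRACKED = ("PATH", "LD_LIBRARY_PATH")


def env_snapshot(environ=None, names=TRACKED):
    """Return the tracked variables (empty string if unset) as a dict."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "") for name in names}


def env_fingerprint(snapshot):
    """Hash a snapshot so two runs can be compared with a single string."""
    blob = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Storing `env_fingerprint(env_snapshot())` next to each result makes it easy to spot two runs that were launched from different shells or by users with different search paths.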

All that being said, if you do run your entire experiment twice, using two
different systems that differ only in their IEEE-compliant double precision
floating point implementation and the results are significantly different
(e.g., running on a Xeon cluster the hippocampal volumes of group A and
group B are different and running on an NVIDIA GPU cluster they are not),
that would bring into question the validity/reliability of the analysis. I
have not seen any evidence of that.
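
For anyone wondering how two IEEE-compliant systems can disagree at all: floating point addition is not associative, so anything that changes the order of operations (a different compiler, library, or a parallel reduction on a GPU) can change the last bits of a result. A small Python sketch with made-up values shows the effect in double precision; `left_to_right_sum` is a hypothetical helper written out only to make the evaluation order explicit.

```python
import math


def left_to_right_sum(values):
    """Add the values strictly in the order given, in double precision."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [1e16, 1.0, -1e16]

in_order = left_to_right_sum(values)                 # 0.0: the 1.0 is rounded away
reordered = left_to_right_sum([1e16, -1e16, 1.0])    # 1.0: same numbers, new order
exact = math.fsum(values)                            # 1.0: correctly rounded sum
```

Both orderings follow the IEEE rules exactly, yet they give different answers. In an iterative algorithm such small differences can accumulate, which is why bit-identical output across different architectures is not a realistic expectation, while statistically equivalent conclusions are.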

That may have been more than two cents.

-Gabriele

P.S. I'm *not* going to chime in on differences between versions, since I
can't imagine how a segmentation algorithm (for example) would have gotten
more accurate and yet have produced the same results.

-- 
Head of Neuroinformatics -  Harvard University, Faculty of Arts & Sciences
Director of Clinical Research Informatics - Massachusetts General Hospital
NorthWest Laboratories,52 Oxford St, Suite 295.09,     Cambridge, MA 02138
phone:+1.617.299.6566,  gabriele_farie...@harvard.edu  fax:+1.617.812.0387
http://neuroinformatics.harvard.edu             http://nmr.mgh.harvard.edu

_______________________________________________
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer


The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.
