Anonymize your fMRI data for sharing¶
Before sharing an fMRI dataset (e.g., via RDR), you must remove all personally identifiable information (PII). This page gives you a single, clear pipeline to follow.
When to anonymize¶
Anonymize right after BIDS conversion, before preprocessing (fMRIPrep, SPM, etc.).
Deface raw data, not preprocessed data
Defacing tools (pydeface, mri_deface, afni_refacer_run) are designed to work on raw T1w images in native scanner coordinates. They use face-detection algorithms calibrated for standard anatomical orientations. Running them on already-preprocessed data (fMRIPrep outputs, FreeSurfer surfaces, etc.) may fail or produce incorrect results because the coordinate system and voxel intensities have been altered.
Always deface the raw BIDS T1w images first, then run your preprocessing pipeline on the defaced data.
This is safe because:
- fMRIPrep was developed and tested "almost exclusively using defaced data" (Chris Gorgolewski, fMRIPrep developer)
- A multisite comparison (Theyers et al., 2021) confirmed preprocessing results are effectively identical with defaced inputs
- The BIDS FAQ recommends this approach
- Preprocessing tools do not use any of the metadata fields you will remove
Anonymizing early minimises the non-anonymized surface and prevents accidental sharing of identifiable data.
Reproducibility note¶
While the studies above show that defacing has minimal impact on downstream results, it is not zero. Small differences in brain extraction, spatial normalization, and cortical surface reconstruction can occur because defacing alters voxels near the face/skull boundary. These differences are typically negligible for group-level analyses but should be documented:
- In your RDR README: mention that the shared raw data is defaced, and that results were originally computed from non-defaced data
- In your code repository: note that reproducing the full pipeline from the shared raw data may produce small floating-point differences compared to the published results
- In your manuscript (if applicable): state that data was defaced for sharing and reference the validation literature above
This is standard practice — most shared fMRI datasets are defaced, and the community considers the trade-off acceptable.
The pipeline¶
Step 1: Deface structural images and scrub metadata with BIDSonym¶
BIDSonym is a BIDS App that handles the full anonymization pipeline in one pass: defacing structural images, scrubbing PII from JSON sidecars, and cleaning NIfTI headers. It was explicitly designed to run right after BIDS conversion.
# Install (Docker required)
docker pull peerherholz/bidsonym
# Run on your BIDS dataset
docker run -it --rm \
-v /path/to/your/BIDS:/bids_dataset \
peerherholz/bidsonym /bids_dataset participant \
--deid pydeface \
--del_meta AcquisitionDateTime,AcquisitionTime,DeviceSerialNumber,StationName,InstitutionAddress,InstitutionalDepartmentName,ProcedureStepDescription \
--brainextraction bet \
--check_meta
This will:
- Deface all T1w images using pydeface (other options:
mri_deface,quickshear,mridefacer) - Remove the listed metadata fields from all JSON sidecars
- Check for remaining PII in JSON files (
--check_metaflag) - Back up originals to
sourcedata/bidsonym/before modifying
Without Docker
If Docker is not available, you can run the steps manually (see below). BIDSonym also supports Singularity/Apptainer.
Manual alternative (without BIDSonym)
1a. Deface with pydeface:
import subprocess, glob
for f in sorted(glob.glob('BIDS/sub-*/anat/*_T1w.nii.gz')):
subprocess.run(['pydeface', f, '--outfile', f, '--force'], check=True)
1b. Remove PII from JSON sidecars:
import json, glob
BIDS = '/path/to/your/BIDS'
# Fields to remove (identifiable but not needed for preprocessing)
pii_fields = [
'AcquisitionDateTime', 'AcquisitionTime',
'DeviceSerialNumber', 'StationName',
'InstitutionAddress', 'InstitutionalDepartmentName',
'ProcedureStepDescription', 'BidsGuess',
]
for f in sorted(glob.glob(f'{BIDS}/sub-*/anat/*.json') +
glob.glob(f'{BIDS}/sub-*/func/*.json')):
with open(f) as fh:
data = json.load(fh)
for key in pii_fields:
data.pop(key, None)
with open(f, 'w') as fh:
json.dump(data, fh, indent='\t', ensure_ascii=False)
fh.write('\n')
Do not remove RepetitionTime, EchoTime, SliceTiming, PhaseEncodingDirection, TotalReadoutTime, Manufacturer, or MagneticFieldStrength -- these are needed by fMRIPrep and for methods reporting.
Step 2: Visually inspect defaced images¶
Always visually check every defaced image. Open them in MRIcroGL or fsleyes and verify the face is removed while brain tissue is intact.
Batch visual QC script
Render mid-sagittal slices for all subjects to quickly spot failures:
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
import glob
for f in sorted(glob.glob('BIDS/sub-*/anat/*_T1w.nii.gz')):
img = nib.as_closest_canonical(nib.load(f))
data = img.get_fdata()
mid = data.shape[0] // 2
plt.figure(figsize=(4, 5))
plt.imshow(np.rot90(data[mid, :, :]), cmap='gray')
plt.title(f.split('/')[-1].split('_')[0])
plt.axis('off')
plt.savefig(f.replace('.nii.gz', '_deface_check.png'))
plt.close()
Defacing can sometimes fail
pydeface may remove brain tissue or leave facial features in rare cases, especially with atypical populations or pediatric data. If pydeface fails on specific subjects, try afni_refacer_run (which replaces rather than removes the face) on those subjects.
Step 3: Review participants.tsv¶
Check for fields that could re-identify participants:
- Exact scores searchable in public databases (e.g., FIDE Elo ratings, standardised test scores) -- bin into ranges instead
- Rare combinations of age + sex + clinical score -- consider binning ages into 5-year ranges for small samples
A recent study showed that rich clinical metadata can create unique fingerprints even in moderately sized datasets.
Step 4: Strip gzip headers (optional)¶
If the BIDS validator warns about GZIP_HEADER_MTIME or GZIP_HEADER_FILENAME, the .nii.gz files contain embedded timestamps or original filenames. Strip them:
import gzip, glob, shutil, os
for f in sorted(glob.glob('BIDS/sub-*/anat/*.nii.gz') +
glob.glob('BIDS/sub-*/func/*.nii.gz')):
nii = f[:-3] # strip .gz
with gzip.open(f, 'rb') as gz, open(nii, 'wb') as out:
shutil.copyfileobj(gz, out)
os.remove(f)
with open(nii, 'rb') as src, gzip.GzipFile(f, 'wb', mtime=0) as gz:
shutil.copyfileobj(src, gz)
os.remove(nii)
Files produced by dcm2niix typically don't have this issue (it uses -n by default). Files from dicm2nii (MATLAB) may.
GDPR note¶
For datasets collected in the EU: brain MRI is special category data under GDPR Article 9. Even defaced MRI may still be considered personal data. Use restricted access with a Data Use Agreement (not fully open sharing), and ensure your consent form covers data sharing. The Open Brain Consent templates have GDPR-compatible versions. Your S-case must mention data sharing for RDR restricted access.
Checklist¶
Before proceeding to upload your data to RDR:
- All structural images defaced and visually checked
- PII fields removed from all JSON sidecars (via BIDSonym or manual script)
-
participants.tsvreviewed for re-identification risk - No raw DICOM files or
sourcedata/with identifiable information - Gzip headers cleaned (if applicable)
- Dataset passes
bids-validator-denowith no errors