ManGO for active research data¶
After logging in to ManGO with your KU Leuven credentials, you should have access to a personal folder within the Hoplab project. If this is not the case, contact Klara to create a new folder and set the correct permissions. In your personal folder, you can then create a subfolder per (active) study. Check out the research workflow page for more specific guidelines on how to organize your study data in ManGO.
ManGO is based on the open source software iRODS. General documentation on ManGO can be found here. This page will focus on the setup and use of two clients to interact with ManGO, i.e., the Python-iRODSClient and ManGO Ingest, and is mainly based on the instructions on this and this page, respectively. Note that there exist other clients to connect to ManGO (e.g., the ManGO portal in the browser, iron CLI Client, iCommands and several SFTP clients), which we won't cover here but you are of course welcome to explore.
The Python-iRODSClient (PRC)¶
Installing and setting up the client¶
Using a virtual environment keeps things clean and avoids breaking other Python setups. To avoid issues with KU Leuven policy restrictions, we use Conda for this. If you don't have Conda installed, download and install it from Conda's official website. Then open a terminal in your project folder, create a Conda environment named mango_env and activate it.
You should see (mango_env) in your terminal. Now, we can install the Python-iRODSClient and check the installation:
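The installation can also be checked from within Python. The sketch below is not part of the ManGO instructions: it assumes the client was installed from PyPI under the name python-irodsclient, and the helper names active_conda_env and installed_version are made up for illustration.

```python
import os
from importlib import metadata


def active_conda_env(environ=os.environ):
    """Return the name of the active Conda environment, or None."""
    # Conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Active Conda environment:", active_conda_env())
    # Assumption: the client is installed under its PyPI name.
    print("python-irodsclient version:", installed_version("python-irodsclient"))
```

If the first line does not print mango_env, or the second prints None, the environment is probably not active or the package ended up in a different Python installation.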
Next, we have to make sure you are logged in to ManGO and your irods_environment.json is configured correctly. The Python client needs an environment file that tells it which server to connect to, your username, SSL settings and authentication method. To do so, we have to install the required authentication package:
Then, go to the "How to connect" page in the ManGO portal to get your irods_user_name, irods_zone_name and irods_host information. Execute the command below with your own information in your terminal:
Tip
To authenticate in a Python shell or within a script file, run the following snippet:
You will be redirected to your terminal, where you have to click the displayed authentication link. After successful login, an environment file is created at ~/.irods/irods_environment.json. You now have a connection with the default password duration of 60 hours. However, it is also possible to log in with a password of long duration (7 days) if you have a Linux client environment with iCommands installed (see this page).
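After logging in, a quick sanity check of the environment file can catch the most common configuration problems before you start a transfer. This is a minimal sketch under stated assumptions: the key lists below are a guess at what is worth checking (irods_port is a standard iRODS setting, the other names appear on this page), and check_environment is a hypothetical helper, not part of any ManGO tool.

```python
import json
from pathlib import Path

# Assumption: these are the keys worth checking; adapt the lists to your file.
REQUIRED_KEYS = ("irods_host", "irods_user_name", "irods_zone_name")
INTEGER_KEYS = ("irods_port", "irods_authentication_uid")


def check_environment(env):
    """Return a list of human-readable problems found in an environment dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"missing or empty: {key}")
    for key in INTEGER_KEYS:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if key in env and (isinstance(env[key], bool) or not isinstance(env[key], int)):
            problems.append(f"not an integer: {key}={env[key]!r}")
    return problems


if __name__ == "__main__":
    env_path = Path.home() / ".irods" / "irods_environment.json"
    if env_path.exists():
        found = check_environment(json.loads(env_path.read_text()))
        print("\n".join(found) if found else "Environment file looks consistent.")
    else:
        print(f"No environment file at {env_path}; log in first.")
```

The check only looks at the presence and type of values. It cannot tell whether the host name or user name is actually correct, so compare those with the "How to connect" page if the connection still fails.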
Creating a ManGO session and uploading files to iRODS¶
This section contains a script to create a ManGO (iRODS) session using your environment file and upload files from a local directory to an iRODS collection, using the Python-iRODS client. It supports both single files and full folders (including subfolders) and recreates the folder structure in iRODS. Optionally, it also sets read permissions.
Before you start, make sure that:
- You have access to your personal folder on ManGO (do a quick check via the browser)
- You installed the Python-iRODSClient (see previous section)
- You are logged in and your irods_environment.json is configured correctly (see previous section)
- Your data is in BIDS format
Before running the script, make sure to change the following variables:
- local_path to your local folder containing the data
- collection_path to the ManGO/iRODS destination folder
Common errors
- If you get an error related to irods_environment.json, check that all values (especially irods_authentication_uid) are correct and make sure numeric values are integers.
- If the environment file is not found, verify that you are logged in and that the file exists at ~/.irods/irods_environment.json.
Tip
- The name of the folder itself is not uploaded, only its contents. If you want to preserve its name, include it in collection_path.
- The destination collection can already exist. If it does not exist, the script will create it automatically.
import os
import ssl
from pathlib import Path

from irods.session import iRODSSession
from irods.access import iRODSAccess

### STEP 1: Locate the iRODS environment file
try:
    env_file = os.environ['IRODS_ENVIRONMENT_FILE']
except KeyError:
    env_file = os.path.expanduser('~/.irods/irods_environment.json')

if not os.path.exists(env_file):
    print(f"ERROR: iRODS environment file not found at: {env_file}")
else:
    print(f"Using iRODS environment file: {env_file}")

ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
ssl_settings = {'ssl_context': ssl_context}

### STEP 2: Set your paths
# Local folder or file to upload
local_path = Path(r"C:\Path\To\Your\BIDS_Folder")
# Destination iRODS collection
collection_path = "/ghum/home/Hoplab/YourName/YourStudy"

if not local_path.exists():
    print(f"ERROR: Local path does not exist: {local_path}")
else:
    print(f"Local path found: {local_path}")

### STEP 3: Create iRODS session and upload folder
if local_path.exists() and os.path.exists(env_file):
    with iRODSSession(irods_env_file=env_file, **ssl_settings) as session:
        # Ensure collection exists
        try:
            session.collections.get(collection_path)
            print(f"Collection exists: {collection_path}")
        except Exception:
            print(f"Creating collection: {collection_path}")
            session.collections.create(collection_path, recurse=True)

        # Upload directory contents
        if local_path.is_dir():
            for file_path in local_path.rglob('*'):
                if file_path.is_file():
                    # Create the relative path structure
                    relative_path = file_path.relative_to(local_path)
                    irods_file_path = f"{collection_path}/{relative_path.as_posix()}"

                    # Create parent collections if needed
                    parent_collection = "/".join(irods_file_path.split("/")[:-1])
                    try:
                        session.collections.get(parent_collection)
                    except Exception:
                        session.collections.create(parent_collection, recurse=True)

                    try:
                        session.data_objects.put(str(file_path), irods_file_path)
                        print(f"Uploaded: {file_path.name}")
                    except Exception as e:
                        print(f"Error uploading {file_path.name}: {e}")
        # Upload single file (keep its name inside the destination collection)
        else:
            session.data_objects.put(str(local_path), f"{collection_path}/{local_path.name}")
            print(f"Uploaded: {local_path.name}")

        print(f"\nFinished processing {local_path}")

        # Optional: set permissions on the collection (adjust user details and permission type)
        try:
            access = iRODSAccess("read", collection_path, "USERNAME")
            session.acls.set(access, recursive=True)
            print(f"Set read permissions for USERNAME on {collection_path}")
        except Exception as e:
            print(f"Note: Could not set permissions: {e}")
else:
    print("Cannot proceed: check that both local_path and iRODS environment file exist.")
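The mapping from local files to iRODS paths can be tried out without a ManGO connection, which is a convenient dry run before uploading a large dataset. The sketch below mirrors the relative-path logic of the upload script; plan_uploads is a hypothetical helper name, and the small folder created in the example only stands in for a real BIDS dataset.

```python
import tempfile
from pathlib import Path


def plan_uploads(local_root, collection_path):
    """Map every file below local_root to its destination path in iRODS."""
    local_root = Path(local_root)
    plan = []
    for file_path in sorted(local_root.rglob("*")):
        if file_path.is_file():
            # The root folder's own name is dropped; only what is below it is kept.
            relative = file_path.relative_to(local_root).as_posix()
            plan.append((file_path, f"{collection_path.rstrip('/')}/{relative}"))
    return plan


if __name__ == "__main__":
    # Throwaway folder standing in for a small BIDS dataset.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "MyStudy"
        (root / "sub-01" / "anat").mkdir(parents=True)
        (root / "sub-01" / "anat" / "sub-01_T1w.nii.gz").write_bytes(b"")
        (root / "dataset_description.json").write_text("{}")
        for local, remote in plan_uploads(root, "/ghum/home/Hoplab/YourName/YourStudy"):
            print(f"{local.name} -> {remote}")
```

Note that MyStudy itself does not appear in the printed destination paths, which is why the folder name has to be part of collection_path if you want to keep it.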
Downloading files from iRODS¶
This section contains a script to download a complete iRODS collection (a folder, including all its subfolders) to your computer using the Python-iRODSClient. It uses parallel downloads, which helps with large datasets.
It will:
- Connect using your irods_environment.json (see this section)
- Download all files in the chosen ManGO folder (recursively, so it also handles all subfolders)
- Recreate the same folder structure locally
- Skip files that already exist locally with the same size, so an interrupted download can be resumed
Before running the script, make sure to change the following variables:
- IRODS_COLLECTION = path to the ManGO folder you want to download from
- LOCAL_DEST = your local folder you want to download to
- THREADS = set the number of parallel downloads (4-8 is usually safe)
Common errors
- If you get an error related to irods_environment.json, check that all values (especially irods_authentication_uid) are correct and make sure numeric values are integers.
- If the environment file is not found, verify that you are logged in and that the file exists at ~/.irods/irods_environment.json.
Tip
- The name of the folder itself is not downloaded, only its contents. If you want to preserve its name, include it in LOCAL_DEST.
- The destination folder can already exist. If it does not exist, the script will create it automatically.
"""
Download a full ManGO (iRODS) folder to your computer using parallel downloads.

WHAT THIS SCRIPT DOES
- connects to ManGO using your irods_environment.json
- downloads all files in the specified collection
- recreates the same folder structure locally
- skips files that already exist with the same size
- uses multiple threads to download several files at once
"""
import os
import ssl
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor

import irods.keywords as kw
from irods.session import iRODSSession

# -------------------------
# STEP 1: CONFIG
# -------------------------
# ManGO folder you want to download FROM
IRODS_COLLECTION = "/ghum/home/Hoplab/YourName/YourStudy"
# Local folder you want to download TO
LOCAL_DEST = Path(r"C:\Path\To\Your\BIDS_Folder")
# Set number of parallel downloads (4-8 is usually safe)
THREADS = 4

# -------------------------
# STEP 2: FIND YOUR IRODS LOGIN FILE
# -------------------------
env_file = os.environ.get(
    "IRODS_ENVIRONMENT_FILE",
    os.path.expanduser("~/.irods/irods_environment.json")
)
if not os.path.exists(env_file):
    raise RuntimeError(f"irods_environment.json not found: {env_file}")

# -------------------------
# STEP 3: FUNCTION TO GATHER FILES
# -------------------------
def gather_files(session, collection, local_folder, tasks):
    """
    Recursively collect all files in the collection and subcollections.
    """
    local_folder.mkdir(parents=True, exist_ok=True)
    for obj in collection.data_objects:
        local_file = local_folder / obj.name
        tasks.append((obj, local_file))
    for sub in collection.subcollections:
        sub_local = local_folder / sub.name
        gather_files(session, session.collections.get(sub.path), sub_local, tasks)

# -------------------------
# STEP 4: FUNCTION TO DOWNLOAD ONE FILE
# -------------------------
def download_one(session, obj, target):
    """
    Download a single file if it doesn't exist or is incomplete.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() and target.stat().st_size == obj.size:
        return f"OK already exists: {target}"
    # An incomplete local copy has to be overwritten explicitly: without the
    # force flag the client refuses to replace an existing local file.
    options = {kw.FORCE_FLAG_KW: ""} if target.exists() else {}
    try:
        session.data_objects.get(obj.path, str(target), **options)
        return f"Downloaded: {obj.name}"
    except Exception as e:
        return f"Error downloading {obj.name}: {e}"

# -------------------------
# STEP 5: CONNECT AND START
# -------------------------
ssl_context = ssl.create_default_context()
with iRODSSession(irods_env_file=env_file, ssl_context=ssl_context) as session:
    root = session.collections.get(IRODS_COLLECTION)

    # Gather all tasks first
    tasks = []
    gather_files(session, root, LOCAL_DEST, tasks)

    # Print total files found
    print(f"Total files found: {len(tasks)}")
    print(f"Starting download with {THREADS} threads...\n")

    # Download in parallel
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        futures = [pool.submit(download_one, session, obj, tgt) for obj, tgt in tasks]
        for f in futures:
            print(f.result())

print("\nFinished download.")
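The skip rule used by the download script (skip a file only when a local copy of the same size exists) can be isolated and tried without a connection. The helper below is an illustrative sketch with a hypothetical name, needs_download; it takes the remote size as a plain number instead of an iRODS data object.

```python
import tempfile
from pathlib import Path


def needs_download(target, remote_size):
    """Return True when the local copy is missing or its size differs."""
    target = Path(target)
    return not (target.is_file() and target.stat().st_size == remote_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local_file = Path(tmp) / "sub-01_eeg.bdf"
        print(needs_download(local_file, 5))  # nothing downloaded yet
        local_file.write_bytes(b"12345")
        print(needs_download(local_file, 5))  # local and remote sizes match
        print(needs_download(local_file, 9))  # local copy is shorter than the remote file
```

Comparing sizes is fast but does not prove that the contents are identical. When that matters, compare checksums instead.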
ManGO Ingest¶
ManGO Ingest watches a local folder and uploads its contents to a ManGO collection. Once started, it automatically detects new or modified files and uploads only (part of) those changes.
This is useful when you regularly add or update files locally (for example, during ongoing data collection) and want them pushed efficiently to ManGO.
ManGO Ingest can:
- run once and upload everything currently present, or
- continuously monitor the folder and upload new files automatically
Remarks & practical tips
- Uploads are one-way only: changes made in ManGO are not pulled down locally.
- Deleting files locally does not remove them from ManGO.
- Empty folders cannot be uploaded.
- Removing the monitored folder (or unplugging the drive) while ingest runs will cause errors.
- Re-running mango-ingest with the same local path only uploads new or modified files/folders.
- A .json log file is created for every sync. This file records upload status and errors and can be used for restart.
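Because empty folders are not uploaded, it can be useful to list them before starting an ingest, so that a missing folder in ManGO does not come as a surprise. The sketch below is independent of mango-ingest; find_empty_folders is a hypothetical helper, and it assumes that a folder counts as empty when it holds no files at any depth.

```python
import tempfile
from pathlib import Path


def find_empty_folders(root):
    """Return all folders below root that contain no files at any depth."""
    root = Path(root)
    empty = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        if not any(child.is_file() for child in folder.rglob("*")):
            empty.append(folder)
    return empty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub-01" / "eeg").mkdir(parents=True)
        (root / "sub-02" / "eeg").mkdir(parents=True)
        (root / "sub-02" / "eeg" / "sub-02_eeg.bdf").write_bytes(b"")
        for folder in find_empty_folders(root):
            print("No files in:", folder.relative_to(root).as_posix())
```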
We'll guide you through the setup, but if you want more information, here are a few useful resources:
Installing and setting up the client¶
- Download the mango-ingest development branch and unzip it into: C:\Workdir\MyApps
- Open a command prompt and navigate to the unzipped folder
- Create and activate a virtual environment
- Install mango-ingest and check packages
- If you haven't yet, install the Python-iRODSClient (PRC) and authentication tools, and log in to iRODS
- Adapt your paths in the line below and start ingest for a local folder
Command options explained¶
| Category | Option | Description |
|---|---|---|
| Core paths | -d | Destination collection path in ManGO |
| Core paths | -p | Local folder path to upload and monitor |
| Upload behaviour | -nw | Run once only. Uploads current files and exits. If omitted, the folder is continuously monitored and new files are uploaded automatically as they appear |
| Upload behaviour | -r | Upload folders recursively, including all subfolders and files |
| Upload behaviour | --sync | Ensures existing folder contents are uploaded when starting ingest, useful when starting ingest for the first time (automatically implied with -nw) |
| File integrity and recovery | --verify-checksum | Verifies uploaded files match originals after transfer to prevent silent corruption during transfer (recommended for large datasets) |
| File integrity and recovery | --restart <logfile.json> | Restarts failed uploads using a previous run’s JSON log file |
| File selection (filtering) | --glob "*.bdf" | Upload only files matching a glob pattern (e.g. EEG files only) |
| File selection (filtering) | --regex PATTERN | Upload only files matching a regular expression |
| File selection (filtering) | --ignore-glob "*.tmp" | Ignore files matching a glob pattern (e.g. temporary files) |
| File selection (filtering) | --ignore PATTERN | Ignore files matching a regex pattern |
| Metadata | --md-mtime | Adds file modification time as metadata |
| Metadata | --md-path REGEX | Extracts metadata from folder names |
For advanced metadata examples, see doc/examples/extract_metadata.py.
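Before relying on a filter, it can help to preview which file names a glob pattern would select. The sketch below uses Python's own shell-style matching on file names; this is an assumption for illustration, and mango-ingest may match patterns differently (for example against full paths). The name preview_selection is hypothetical.

```python
from fnmatch import fnmatchcase


def preview_selection(file_names, glob="*", ignore_glob=None):
    """Return the names that match glob and do not match ignore_glob."""
    selected = []
    for name in file_names:
        if not fnmatchcase(name, glob):
            continue
        if ignore_glob is not None and fnmatchcase(name, ignore_glob):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["sub-01_eeg.bdf", "sub-01_notes.txt", "recording.tmp"]
    print(preview_selection(names, glob="*.bdf"))
    print(preview_selection(names, ignore_glob="*.tmp"))
```

The matching here is case-sensitive, so a file ending in .BDF is not selected by "*.bdf".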
Quick rule of thumb¶
Use -nw (for example together with -r and --verify-checksum) when you want a safe one-time upload of a full dataset.
Omit -nw when you want continuous automatic syncing during ongoing data collection.
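For an independent spot check after a one-time upload, you can compute a checksum of a local file and compare it with the checksum ManGO reports for the uploaded data object. The sketch below assumes that the zone stores SHA-256 checksums in the usual iRODS notation (the prefix sha2: followed by the base64-encoded digest); if ManGO reports a different format, the values will not be comparable. The function name irods_style_sha256 is made up for this example.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def irods_style_sha256(path, chunk_size=1024 * 1024):
    """Return 'sha2:<base64 SHA-256 digest>' for a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large recordings do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bdf"
        sample.write_bytes(b"example data")
        print(irods_style_sha256(sample))
```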