The platform design relies on open formats, APIs, and standards to allow for extensibility and promote interoperability. The key design ideas are based on the following constraints and considerations:
The platform should integrate data resources from different infrastructures,
Data and tools should be integrated without undue duplication,
Integration of data and tools in the platform by community members should be feasible,
Datasets and processing tools should implement the FAIR principles,
Data governance should remain with the original data stewards or providers,
The platform should rely on open formats and standards to foster reuse and integration with other projects, and
The portal should offer intuitive navigation and provide users with documentation and help resources.
The CONP consists of several key components (Fig. 1; see also Table 4):
A data infrastructure layer, incorporating disparate independent data repositories (e.g., Zenodo, http://www.zenodo.org, LORIS7, or the Open Science Framework, OSF5);
A (meta)data integration layer, leveraging DataLad13, GitHub14, and Boutiques tool descriptors12, and enabling uniform data search queries based on the Data Tags Suite (DATS) model15;
An analysis layer that allows for simple download of tools and easy use of High-Performance Computing (HPC) environments; and
An interface layer, which controls the interaction between these components and will be outlined further in the Results section.
Architecture of the Canadian Open Neuroscience Platform. The platform comprises multiple tiers: (i) independent data infrastructure; (ii) metadata integration across tools and datasets via standard models (bioCADDIE DATS, Boutiques descriptors); (iii) data analysis on high-performance computing; and (iv) web and command-line interfaces.
The CONP takes advantage of distributed data repositories, each with its own infrastructure, access control requirements, APIs, and licensing. Importantly, this gives data providers the flexibility to manage collections using specific, context-appropriate tools, rather than prescribing an incomplete one-size-fits-all solution. The CONP presently supports accessing and integrating data from several flexible domain-agnostic datastores (OSF, Zenodo, FRDR-DFDR, https://www.frdr-dfdr.ca/), specific brain imaging repositories (LORIS, XNAT, Brain-CODE), and the commonly used HTTP and FTP web protocols. This set of supported infrastructures is intentionally extensible to any other repository that allows access via programmatic web-compatible interfaces (e.g., a RESTful API).
To integrate datasets across infrastructures, the CONP uses DataLad as a backend and GitHub to host the metadata. Crawlers automate the discovery of tools (on Zenodo, http://www.zenodo.org) and datasets (on Zenodo and OSF), as well as the DataLad and GitHub integration workflows. CircleCI16 continuously tests whether datasets are available and whether their data are accessible, by test-downloading a few files from each dataset.
The CONP adopts a decentralized architecture to accommodate the various governance, ethical, and performance models required by data owners. For instance, some datasets may not easily be stored outside the jurisdiction where they were acquired, some institutions require local control of data storage, and some projects prefer to remain in control of access rules. The CONP accommodates all of these cases, as data can remain hosted anywhere on the internet.
Integration between datasets is provided by DataLad, a software library for managing Git repositories that reference data. In DataLad, a dataset is described by a Git repository containing metadata, file URLs, and hashes of data blobs managed by git-annex. Importantly, a DataLad dataset generally does not contain the data themselves, which remain stored remotely. DataLad datasets can also be nested to represent dataset aggregation.
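To make the distinction between metadata and data concrete, the sketch below builds the kind of record a DataLad dataset keeps for one annexed file: a content key plus one or more URLs, but not the bytes. The key imitates git-annex's SHA256E format (backend name, size in bytes, SHA-256 digest, file extension) with simplified extension handling; the file path and URL are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def annex_key_sha256e(content: bytes, filename: str) -> str:
    """Build a git-annex-style SHA256E key: backend, size field, digest, extension.

    Simplified: only the final suffix is kept as the extension, whereas git-annex
    may keep several suffixes (e.g. ".nii.gz") subject to its own length rules.
    """
    digest = hashlib.sha256(content).hexdigest()
    extension = PurePosixPath(filename).suffix  # e.g. ".txt"; "" if there is none
    return f"SHA256E-s{len(content)}--{digest}{extension}"


# The dataset tracks only the key and where to fetch the content from;
# the data themselves stay on the remote host (hypothetical URL below).
record = {
    "path": "sub-01/anat/notes.txt",
    "key": annex_key_sha256e(b"hello", "notes.txt"),
    "urls": ["https://example.org/data/sub-01/anat/notes.txt"],
}
print(record["key"])
```

Because the key is derived from the content, any copy fetched from any of the URLs can be verified against it, which is what lets the data live anywhere while the Git repository stays small.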
The CONP dataset consists of a main DataLad dataset and its metadata, stored on GitHub (github.com/CONP-PCNO/conp-dataset) and referenced in the main DataLad index (http://datasets.datalad.org). The use of GitHub enables a variety of features useful for open-source software development, including issue tracking, code reviews, pull requests, branch protection, and integration with various applications. Datasets are integrated as Git submodules of the main dataset and may be hosted on GitHub or on any other platform, including GitLab or even a simple web server. Because a submodule points to a specific commit, the remote subdataset can continue to evolve while the CONP root dataset keeps a reference to a stable version of it. Any DataLad dataset can be integrated into the CONP provided that it contains a README file and a Data Tags Suite (DATS)17 model file describing it. In addition, a configuration script can be added to the root of the dataset to perform any required initialization.
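The two integration requirements can be checked mechanically before a dataset is proposed. The sketch below assumes a README with one of a few common names and a DATS model stored as `DATS.json` at the dataset root; these file names and the `integration_problems` helper are illustrative assumptions, so the conp-dataset repository should be consulted for the exact conventions.

```python
import tempfile
from pathlib import Path

# Assumed file names (hypothetical); the CONP repository defines the real conventions.
README_NAMES = ("README.md", "README.txt", "README")
DATS_NAME = "DATS.json"


def integration_problems(dataset_root: Path) -> list[str]:
    """Return the reasons a dataset cannot (yet) be added to CONP; empty if none."""
    problems = []
    if not any((dataset_root / name).is_file() for name in README_NAMES):
        problems.append("missing README file")
    if not (dataset_root / DATS_NAME).is_file():
        problems.append(f"missing {DATS_NAME} model file")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("Example dataset\n")
    print(integration_problems(root))  # only the DATS model file is reported missing
```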
The data themselves can be stored on any server implementing a protocol supported by git-annex, including HTTP, FTP, and many others. We used this flexibility to integrate data coming from three main types of sources. First, brain data archives such as the LORIS7, XNAT18, and Brain-CODE19 platforms provide a complete neuroscience data management solution for data ingestion, quality control, visualization, access control, and querying. They are commonly used to support large-scale multi-site longitudinal studies with hundreds of participants. Second, multi-disciplinary research data archives such as Zenodo in Europe, the Open Science Framework in the USA5, and the Federated Research Data Repository (FRDR)20 in Canada provide simple ways to share research data publicly through the web and to guarantee long-term archival, findability, and immutability of data objects through Digital Object Identifiers (DOIs). They are typically used for local studies or for data accompanying a publication. Third, simple internet hosts accessible through the HTTP or FTP protocol allow for flexible integration of any other data already available online. The CONP also provides local data hosting for users who do not have the resources to make use of these other options.
Through git-annex, DataLad also supports authentication protocols, a critical feature for the ethical sharing of neuroscience data with restricted access. We extended this capability to support authentication to the LORIS, Zenodo, and OSF platforms. LORIS uses a common username/password authentication, which was straightforward to add to DataLad. Zenodo, however, implements private data sharing through secret tokens added to the file URLs. Since the file URLs are stored in the DataLad repository and are therefore publicly shared, we implemented a custom mechanism to add tokens to, and remove them from, URLs on demand.
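The add-and-remove idea can be sketched with the standard library alone: the token is attached only when a download URL is needed and stripped before anything is committed. The `token` query-parameter name, the record URL, and the helper names are assumptions for illustration, not the CONP implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _without_param(query: str, param: str) -> list[tuple[str, str]]:
    """Parse a query string and drop every occurrence of `param`."""
    return [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k != param]


def add_token(url: str, token: str, param: str = "token") -> str:
    """Attach a private access token as a query parameter (at download time only)."""
    parts = urlsplit(url)
    query = _without_param(parts.query, param) + [(param, token)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_token(url: str, param: str = "token") -> str:
    """Remove the token so the URL can be safely stored in a public repository."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=urlencode(_without_param(parts.query, param))))


public_url = "https://zenodo.org/record/123/files/scan.nii.gz"  # hypothetical record
private_url = add_token(public_url, "SECRET")
print(private_url)
assert strip_token(private_url) == public_url
```

Keeping both operations symmetric means the committed URL is always the token-free form, so a secret never enters the Git history even if a download fails midway.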
To leverage the capabilities of existing research data archives (currently Zenodo and OSF, and in the future FRDR), we developed a crawling framework to manage the life cycle of DataLad datasets on GitHub. As a result, users can upload datasets to the CONP through these web platforms, without having to install and learn DataLad, or to become familiar with our GitHub workflow. The CONP data crawler performs the following actions, implemented as a base class containing most of the GitHub and DataLad logic, and as a set of derived classes containing the API calls specific to each crawled platform:
Search for CONP-tagged datasets in web platforms;
When a new dataset is found, create a new DataLad dataset;
When a dataset modification is detected, update the corresponding DataLad dataset;
Push modifications to the forked CONP GitHub repository;
Create a pull request for each modified dataset, for the CONP maintainers to review and approve.
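The base-class/derived-class split described above can be sketched as follows. DataLad and GitHub operations are replaced by recorded action strings, and `FakeZenodoCrawler` returns canned records instead of calling the Zenodo API; all class and method names are hypothetical and only illustrate the life cycle (search, create or update, push, open a pull request).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteRecord:
    identifier: str
    version: str  # anything that changes when the upstream record changes
    title: str


class BaseCrawler(ABC):
    """Shared life-cycle logic; real DataLad/GitHub calls are replaced by logging."""

    def __init__(self, known_versions: dict[str, str]):
        self.known_versions = known_versions  # identifier -> last integrated version
        self.actions: list[str] = []

    @abstractmethod
    def search_tagged_records(self) -> list[RemoteRecord]:
        """Platform-specific API call returning CONP-tagged records."""

    def run(self) -> list[str]:
        for record in self.search_tagged_records():
            previous = self.known_versions.get(record.identifier)
            if previous is None:
                self.actions.append(f"create dataset {record.identifier}")
            elif previous != record.version:
                self.actions.append(f"update dataset {record.identifier}")
            else:
                continue  # unchanged: nothing to push or review
            self.actions.append(f"push {record.identifier} to fork")
            self.actions.append(f"open pull request for {record.identifier}")
            self.known_versions[record.identifier] = record.version
        return self.actions


class FakeZenodoCrawler(BaseCrawler):
    """Hypothetical derived class; a real one would query the platform's REST API."""

    def search_tagged_records(self) -> list[RemoteRecord]:
        return [RemoteRecord("zenodo-1", "v2", "Updated"), RemoteRecord("zenodo-2", "v1", "New")]


print(FakeZenodoCrawler({"zenodo-1": "v1"}).run())
```

Only `search_tagged_records` differs between platforms, which is what lets a new repository be supported by writing one small subclass.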
In addition, if no DATS model is found in a dataset, one is created automatically from the fields available on the web platform, with minimal information such as title, license, and creators.
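A minimal sketch of this fallback is shown below. The field names loosely follow the DATS Dataset schema (`title`, `creators`, `licenses`), but the exact structure CONP generates is not given here, so both the layout and the `minimal_dats` helper should be read as assumptions to verify against the schema.

```python
import json


def minimal_dats(title: str, license_name: str, creators: list[str], description: str = "") -> dict:
    """Build a bare-bones DATS-like record from fields a web platform exposes.

    Field names are assumptions modelled on the DATS Dataset schema; the CONP
    extended schema may require more fields or a different layout.
    """
    return {
        "title": title,
        "description": description or title,  # fall back to the title if no description
        "creators": [{"name": name} for name in creators],
        "licenses": [{"name": license_name}],
    }


print(json.dumps(minimal_dats("Example MRI dataset", "CC-BY-4.0", ["Ada Lovelace"]), indent=2))
```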
The CONP includes a dataset testing suite to mitigate the reliability challenges of decentralized systems. We implemented the testing suite on the CircleCI platform because of its support for multithreaded testing, FTP connections, and interactive SSH sessions in testing environments. Hosting CONP DataLad datasets on GitHub allows for transparent integration with CircleCI. As in a software repository, dataset tests are triggered by every GitHub pull request, and they must pass before the maintainers can approve the pull request. To reduce execution time, the testing framework only runs the tests for the datasets affected by the pull request.
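Selecting the affected datasets amounts to mapping the paths changed by a pull request onto dataset directories. The sketch assumes that datasets live under a top-level `projects/` directory (a submodule pointer change shows up as the bare directory path); that layout and the helper name are assumptions for illustration.

```python
from pathlib import PurePosixPath


def affected_datasets(changed_paths: list[str], datasets_dir: str = "projects") -> set[str]:
    """Map the files touched by a pull request to the dataset directories to test."""
    affected = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == datasets_dir:
            affected.add(parts[1])  # e.g. "projects/dataset-b/DATS.json" -> "dataset-b"
    return affected


changes = ["projects/dataset-a", "projects/dataset-b/DATS.json", "README.md"]
print(sorted(affected_datasets(changes)))
```

Changes outside the datasets directory (such as documentation) select no dataset tests at all, which is where most of the runtime saving comes from.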
Datasets may become unavailable for a variety of transient reasons, including network interruptions, operational downtime of the hosting platforms, or configuration errors. To detect these issues, we configured CircleCI to test all the datasets available through the CONP every four hours, providing continuous monitoring. Results of this periodic testing are stored as CircleCI artifacts and are automatically displayed as status badges in the CONP portal. To increase robustness against transient errors, we used pytest's flaky module to re-run tests three times upon failure, with a 5-second delay.
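The retry policy can be illustrated with a plain decorator. This is a standard-library sketch of retry-with-delay behaviour, not the flaky plugin's API; it reads "three times" as at most three runs in total, which is an assumption, and `download_sample` is a hypothetical stand-in for a dataset test.

```python
import functools
import time


def rerun_on_failure(max_runs: int = 3, delay_s: float = 5.0):
    """Re-run a test up to `max_runs` times in total, sleeping `delay_s` between runs."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_runs + 1):
                try:
                    return test_fn(*args, **kwargs)
                except Exception:
                    if attempt == max_runs:
                        raise  # persistent failure: report it
                    time.sleep(delay_s)  # give a transient outage time to clear
        return wrapper
    return decorator


calls = {"n": 0}


@rerun_on_failure(max_runs=3, delay_s=0)  # zero delay only to keep the example fast
def download_sample():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


print(download_sample(), calls["n"])
```

A failure is only reported once every run has failed, so a short network glitch does not turn a status badge red while a genuine outage still does.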
The test suite tests the following properties for every dataset:
Presence of a README file at the root of the dataset,
Presence of a DATS model complying with our extended schema,
Successful installation of the dataset with DataLad,
Integrity of the git-annex repository, and
Successful download of a sample of files (the four smallest) to limit runtime.
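Choosing which files to download for the last check can be as simple as ranking files by size. In the sketch below the file listing is a hypothetical in-memory mapping from path to size in bytes; ties are broken by path so that repeated CI runs pick the same files.

```python
import heapq


def files_to_download(files: dict[str, int], n: int = 4) -> list[str]:
    """Return the `n` smallest files, smallest first, ties broken by path."""
    return heapq.nsmallest(n, files, key=lambda path: (files[path], path))


sizes = {"a.nii": 500, "b.json": 2, "c.tsv": 10, "d.txt": 1, "e.nii": 900, "f.json": 3}
print(files_to_download(sizes))
```

Downloading the smallest files still exercises the full retrieval path (URL resolution, authentication, checksum verification) at a fraction of the bandwidth a full download would need.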
For datasets that require authentication, we provide credentials through CircleCI environment variables. To reduce the associated security risks, we configured the testing framework to skip authenticated datasets in pull requests. From these environment variables, the testing framework generates a DataLad authentication provider (LORIS, Brain-CODE) or configures the dataset to use access credentials (Zenodo). The testing suite runs in a Docker container that is also available for download, so that the testing environment can be replicated.
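The skip logic can be sketched as a single function that either returns credentials or signals that the test should be skipped. The credential variable names are hypothetical, and pull-request builds are assumed to be recognizable through CircleCI's `CIRCLE_PULL_REQUEST` environment variable; the actual CONP configuration may detect them differently.

```python
import os
from typing import Optional

# Hypothetical variable names; the real CONP CI configuration may differ.
CREDENTIAL_VARS = {
    "loris": ("LORIS_USERNAME", "LORIS_PASSWORD"),
    "zenodo": ("ZENODO_TOKEN",),
}


def credentials_for(platform: str, env: Optional[dict] = None) -> Optional[dict]:
    """Return credentials for `platform`, or None if the dataset test should be skipped."""
    env = dict(os.environ) if env is None else env
    if env.get("CIRCLE_PULL_REQUEST"):
        return None  # never expose secrets to code proposed in a pull request
    names = CREDENTIAL_VARS.get(platform, ())
    if not names or not all(env.get(name) for name in names):
        return None  # missing credentials: skip rather than fail
    return {name: env[name] for name in names}
```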
As the CONP portal brings together two types of research objects, software tools and datasets, we reused two metadata standards developed to document these objects. The Boutiques standard12 describes a tool's execution, inputs, and outputs. For dataset description, we chose the Data Tags Suite (DATS) model17, developed by the bioCADDIE consortium (funded through the NIH Big Data to Knowledge program) to build the DataMed (https://datamed.org/) platform. DataMed was designed to be an equivalent of PubMed for datasets21, and DATS follows the architecture of the Journal Article Tag Suite (JATS). This choice was driven by the flexible nature of DATS and its associated material (e.g., validator, documentation). DATS can also represent sub-datasets, a feature that can be used in association with the DataLad sub-dataset mechanism (implemented with Git submodules).
These two standards are used to extract the information about research objects that is displayed in the portal, and to provide the JSON-LD information needed to make the datasets discoverable by Google Dataset Search. The DATS model also allows for an RDF representation of the information, which enables integration of the CONP metadata as a knowledge graph in BlueBrain Nexus22. The portal includes an advanced search interface mapped to a BlueBrain Nexus SPARQL endpoint, to which the DATS model files are regularly exported.
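As an illustration of the JSON-LD step, the sketch below maps a few DATS fields onto a schema.org `Dataset` record, the vocabulary Google Dataset Search reads. The DATS-side field names (including keywords stored as objects with a `value` key) and the choice of `Person` for every creator are assumptions; the portal's actual mapping may cover more fields.

```python
import json


def dats_to_schema_org(dats: dict) -> dict:
    """Map a few DATS fields onto a schema.org Dataset record for embedding as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": dats["title"],
        "description": dats.get("description", ""),
        # Assumption: every creator is a person with a single "name" field.
        "creator": [{"@type": "Person", "name": c["name"]} for c in dats.get("creators", [])],
        "license": [lic["name"] for lic in dats.get("licenses", [])],
        "keywords": [k["value"] for k in dats.get("keywords", [])],
    }


record = dats_to_schema_org({
    "title": "Example MRI dataset",
    "creators": [{"name": "Ada Lovelace"}],
    "licenses": [{"name": "CC-BY-4.0"}],
    "keywords": [{"value": "MRI"}],
})
print(f'<script type="application/ld+json">{json.dumps(record)}</script>')
```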
The DATS model contains a number of required fields: the name and description of the dataset, the name and affiliation(s) of the individual(s) who generated the data, the license under which the dataset is released, keywords, and data types and formats. It may also include details regarding related publications, funding bodies, and cross-references to derived datasets. We have also used the extensibility of the DATS model to add specific fields, such as a structured record of the dataset's source, allowing searches by institution, city, or country of origin17.
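The sketch below shows how such a model can be checked for required fields and then filtered by origin. The required-field keys and the origin extension stored under `extraProperties` with categories such as `origin_country` are hypothetical choices for illustration; the CONP extended schema defines the real names.

```python
# Field names are assumptions modelled on the DATS Dataset schema: data types in
# "types", formats inside "distributions".
REQUIRED_FIELDS = ("title", "description", "creators", "licenses", "keywords", "types", "distributions")


def missing_fields(dats: dict) -> list[str]:
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not dats.get(field)]


def origin_values(dats: dict, category: str) -> list[str]:
    """Read a hypothetical origin extension stored as DATS 'extraProperties'."""
    for prop in dats.get("extraProperties", []):
        if prop.get("category") == category:
            return [v["value"] for v in prop.get("values", [])]
    return []


def filter_by_origin(datasets: list[dict], category: str, wanted: str) -> list[str]:
    """Titles of datasets whose origin extension contains `wanted`."""
    return [d["title"] for d in datasets if wanted in origin_values(d, category)]


example = {"title": "A", "extraProperties": [{"category": "origin_country", "values": [{"value": "Canada"}]}]}
print(missing_fields(example))
print(filter_by_origin([example, {"title": "B"}], "origin_country", "Canada"))
```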
The CONP portal goes beyond the findability of tools, directly integrating tools into workflows and enabling their execution on HPC systems.
Analysis tools are uniformly described in Boutiques, an open specification and software library for sharing tools according to the FAIR principles1. Boutiques descriptors are JSON objects containing a specification of the tool's input data, parameters, and output data. They link to a Docker or Singularity container image where the tool and all its dependencies are installed and configured for execution. Boutiques tools can be reused in various platforms, such as workflow engines, as exemplified in TIGR-PURR (https://github.com/TIGRLab/TIGR_PURR), or in web platforms such as CBRAIN23 or VIP (https://www.creatis.insa-lyon.fr/vip/).
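To show what a descriptor looks like and how it drives execution, the sketch below defines a minimal descriptor with Boutiques field names (`command-line`, `value-key`, `container-image`, `output-files`) for a hypothetical `skullstrip` tool and image, then substitutes invocation values into the command-line template. `render_command` is a simplified illustration; the real `bosh` tool also validates types, handles quoting, and launches the container.

```python
import json

# A minimal descriptor; the tool name and container image are hypothetical.
descriptor = {
    "name": "example-skullstrip",
    "tool-version": "1.0.0",
    "description": "Hypothetical skull-stripping tool",
    "schema-version": "0.5",
    "command-line": "skullstrip [INPUT_FILE] [OUTPUT_PREFIX] [FRACTIONAL_THRESHOLD]",
    "container-image": {"type": "docker", "image": "example/skullstrip:1.0.0"},
    "inputs": [
        {"id": "input_file", "name": "Input image", "type": "File", "value-key": "[INPUT_FILE]"},
        {"id": "output_prefix", "name": "Output prefix", "type": "String", "value-key": "[OUTPUT_PREFIX]"},
        {"id": "frac", "name": "Fractional threshold", "type": "Number", "optional": True,
         "command-line-flag": "-f", "value-key": "[FRACTIONAL_THRESHOLD]"},
    ],
    "output-files": [
        {"id": "brain", "name": "Brain image", "path-template": "[OUTPUT_PREFIX].nii.gz"}
    ],
}


def render_command(descriptor: dict, invocation: dict) -> str:
    """Substitute invocation values into the command-line template (simplified)."""
    command = descriptor["command-line"]
    for spec in descriptor["inputs"]:
        value = invocation.get(spec["id"])
        if value is None:
            text = ""  # optional input omitted
        elif "command-line-flag" in spec:
            text = f'{spec["command-line-flag"]} {value}'
        else:
            text = str(value)
        command = command.replace(spec["value-key"], text)
    return " ".join(command.split())  # collapse spaces left by omitted inputs


print(render_command(descriptor, {"input_file": "sub-01_T1w.nii.gz", "output_prefix": "sub-01_brain", "frac": 0.5}))
```

Because the invocation only names input `id`s, the same invocation can be handed to any platform that understands the descriptor.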
Boutiques tools can be published, archived, and retrieved in the Zenodo research archive or in the OpenAIRE-Nexus project. Once published, Boutiques tools receive a DOI, which makes their archives permanently findable.
Similar to the data integration layer, tools can be executed through both command-line and web interfaces. The Boutiques command-line tool can be used to run tools locally through a uniform interface, provided that a container engine is installed. This is useful for testing analyses or processing smaller datasets. CONP datasets can be downloaded locally for processing through the DataLad command-line or Python API. The Boutiques Python API also enables tool integration in external pipeline engines such as Pydra24, Nextflow25, or Apache Spark26.
For use cases that benefit from HPC clusters, the Clowdr command-line tool and Python API27 can be used to apply Boutiques tools concurrently to multiple subjects on HPC clusters managed by the SLURM workload manager, such as those provided by Compute Canada (https://www.computecanada.ca/), or on the Amazon Elastic Compute Cloud (EC2). This allows CONP users to leverage their own resource allocations and to process CONP datasets through the DataLad interface.
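The per-subject pattern behind this kind of parallel execution can be sketched as a map step: one invocation template is expanded into one invocation per subject, and each is dispatched independently. This is not Clowdr's API; threads stand in for cluster jobs, `fake_launch` is a hypothetical placeholder for job submission, and the BIDS-like path pattern is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def per_subject_invocations(template: dict, subjects: list[str], patterns: dict[str, str]) -> list[dict]:
    """Expand one invocation template into one invocation per subject (map step)."""
    return [
        {**template, **{input_id: p.format(subject=s) for input_id, p in patterns.items()}}
        for s in subjects
    ]


def fake_launch(invocation: dict) -> str:
    """Placeholder for submitting one task; a real run would target SLURM or EC2."""
    return f"processed {invocation['input_file']}"


subjects = ["sub-01", "sub-02", "sub-03"]
invocations = per_subject_invocations(
    {"frac": 0.5},
    subjects,
    {"input_file": "{subject}/anat/{subject}_T1w.nii.gz", "output_prefix": "{subject}_brain"},
)
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_launch, invocations))  # map preserves input order
print(results)
```

Since subjects are processed independently, the same list of invocations can be scheduled as separate cluster jobs without any coordination between them.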
Many CONP tools are also installed in CBRAIN23, a web portal interfaced with storage and computing resources at HPC centers, to provide a higher-level interface for users who do not want to use the command-line, or for developers who prefer to interact with HPC resources through a web API. CBRAIN can import Boutiques descriptors, and create web forms and HPC jobs to launch and monitor the tools. Pipelines that were installed as Docker images are converted to Singularity for deployment on HPC clusters. To facilitate the processing of datasets accessible via the CONP, CBRAIN also interfaces with DataLad, downloading files on-demand for processing.
Running a data analysis pipeline on CBRAIN requires a CBRAIN account. There is currently no billing model associated with the compute part of the platform, because CBRAIN relies on academic computing resources, primarily Compute Canada, obtained through resource allocation competitions. The CBRAIN infrastructure allows external compute resources to be attached to the platform, so that an international research laboratory could use its own resources. CBRAIN also has some computing time allocated on Compute Canada and, when possible, the team can offer it for reasonable usage; such requests must be sent directly to the CBRAIN infrastructure governance team by email to CBRAIN support. Data derived from processing are stored on the CBRAIN infrastructure, and the CBRAIN team's agreement to process data depends both on the compute time (if on the CBRAIN Compute Canada allocation) and on the capacity to store the derived data. We note that CBRAIN can also attach data providers using the Principal Investigator's own disk space allocation on Compute Canada, on other accessible infrastructures, or even on their own laboratory servers. Given the variety of situations, requests for compute time and disk space are handled on a case-by-case basis.
Finally, Boutiques executions, including local, Clowdr and CBRAIN ones, also collect anonymized provenance records for activity monitoring, traceability, and other applications.
Source: Data and Tools Integration in the Canadian Open Neuroscience ... (Nature.com)