NIMH Data Archive Tools
In this central location you can access and launch all tools maintained by the NIMH Data Archive. In many cases, a recent version of Java is required in order to use NDA tools. OpenJDK is not supported.
Researchers throughout and beyond the field of mental health research use the NDA GUID Tool, originally developed for the autism research community and the National Database for Autism Research (NDAR), to generate participant identifiers. The GUID Tool takes personally identifiable information provided by research participants and uses it to securely create a unique identifier. Using this tool, participant data can be linked across studies and laboratories while always maintaining the participant's privacy. To use this tool, you will need a user account with the appropriate permission assigned.
Validation and Upload Tool
Contributors harmonizing and uploading their data to the NIMH Data Archive must use the Validation and Upload Tool to send their data and complete this process. This tool connects to the Data Dictionary, and then allows you to load data templates and validate them against their definitions. This helps ensure that data in NDA is harmonized to a standard and serves as a "pre-upload" QA check on your data. After data is successfully harmonized, the same tool is used to package and upload the data to your NDA Collection. In addition to working with CSV data templates, the tool also supports direct uploads from a hosted AWS-RDS database. There are currently three different versions of the tool. Please review them below to determine the most appropriate one for your situation. If in doubt, please use the first option: the HTML version.
HTML Validation and Upload Tool
This version of the tool runs as a webpage, allowing you to validate the quality of your data and upload it directly through your web browser. Chrome, Firefox, Safari, and recent versions of Internet Explorer are supported.
Java Validation and Upload Tool
This version of the tool is a JNLP file you download to your local system and run. It can be run remotely as well using X forwarding. Currently, Java 8 is fully supported, but Java 9 and 10 may cause issues when attempting to use this tool.
PLEASE NOTE: There is a known bug with Java that may affect Linux users of this tool. When authenticating in order to upload, the tool may crash without feedback or an error message. This will not be addressed until the release of Java 10. Prior to this, a different system will need to be used when submitting data if you encounter this bug.
Users approved for access to NDA shared data can use the Download Manager to view and download their previously created packages. Once you have built a package using our various query tools, you must launch this tool to download it. Packages containing omics data must be accessed through the cloud and cannot be downloaded directly using this tool. This tool is also available in a command line version.
Researchers submitting and sharing their data through NDA make use of the Data Dictionary to harmonize and upload their data. Below are tools currently available for assistance working with the Data Dictionary to harmonize your data:
Custom Scope Validation
Every structure in the Data Dictionary in use by researchers has a set of default element names used to identify the variables and, in many cases, a set of values used to code the data for that element (e.g. 1;2;3). Through the Custom Scope tool, researchers can validate data that uses their own project-specific element names and values rather than the default element names and value sets defined in the Data Dictionary. To make use of this:
- Work with NDA curators over the course of your project (when creating your Data Expected list) to ensure your values and/or element names are provided as valid translations and aliases.
- When validating, use the "Custom Scope" dropdown menu in the interface of the Validation and Upload Tool to select your Collection number, or other valid scope for your project if one has been assigned.
- Validate and upload the data according to the standard instructions.
This will allow you to avoid recoding your data or renaming your elements.
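As a conceptual illustration only, the effect of aliases and value translations can be sketched as a pair of lookups applied to each row of data. This is not how the Validation and Upload Tool is implemented; the function name, element names, and codes below are all hypothetical.

```python
def translate_row(row, aliases, value_maps):
    """Map project-specific element names and values to standard ones.

    row        -- dict of {local element name: local value}
    aliases    -- dict of {local element name: standard element name}
    value_maps -- dict of {standard element name: {local value: standard value}}
    Names and values without a registered translation pass through unchanged.
    """
    translated = {}
    for name, value in row.items():
        standard_name = aliases.get(name, name)
        standard_value = value_maps.get(standard_name, {}).get(value, value)
        translated[standard_name] = standard_value
    return translated


# Hypothetical project data and translations, for illustration only.
row = {"sex_at_birth": "male", "age_months": "120"}
aliases = {"sex_at_birth": "sex"}
value_maps = {"sex": {"male": "M", "female": "F"}}

print(translate_row(row, aliases, value_maps))
```

With a custom scope in place, this kind of translation happens during validation, so the submitted files can keep the project's own names and codes.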
The mission of the National Institute of Mental Health Data Archive (NDA) is to make research data available for reuse. Data collected across projects can be aggregated and made available using the GUID, including clinical data, and the results of imaging, genomic, and other experimental data collected from the same participants. In this way, separate experiments on genotypes and brain volumes can inform the research community on the over one hundred thousand subjects now contained in the NDA. The NDA’s cloud computation capability provides a framework in support of this infrastructure.
How does it work?
The NDA holds and protects rich datasets (fastq, brain imaging) in object-based storage (Amazon S3). To facilitate access, the NDA supports the deployment of packages (created through the NDA Query tools) to an Amazon Web Service Oracle database. Originally developed for the National Database for Autism Research (NDAR), and so called miNDAR (miniature NDAR), these databases contain a table for each data structure in a package. Associated raw or evaluated data files are available via read-only access to NDA’s S3 objects. Addresses for those objects in the associated package are provided in the miNDAR table titled S3_LINKS. By providing this interface, the NDA envisions real-time computation against rich datasets that can be initiated without the need to download full packages. Furthermore, a new category of data structure has been created called "evaluated data." Tables for these structures will be created for each miNDAR, allowing researchers using NDA cloud capabilities and computational pipelines to write any analyzed data directly back to the miNDAR database. This will enable the NDA to make this data available to the general research community when appropriate.
miNDARs can also be populated with your own raw or evaluated data and uploaded directly back into the NDA for a streamlined data submission directly from a hosted database.
How do I get started?
To begin, request that cloud access be added to your account. Once your request is approved, the option to launch packages to a cloud hosted database will be available during package creation. You can deploy previously generated packages as well as new ones.
To move NDAR data to Oracle, first create a package in NDAR. Then, following registration, enter the package ID and the credentials requested on the miNDAR tab. This will start the miNDAR creation process, which takes approximately 10 minutes. Once the miNDAR is created, its connection details will be emailed to you, and you can use them with your credentials to establish a connection.
File data that is usually included in a package download will now be accessible via S3. Each package will have a table “S3_LINKS” which contains URIs for all objects in that package. Using direct calls to Amazon Web Service's S3 API, a third party tool, or client libraries, data from these objects can be streamed or downloaded.
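Most S3 client libraries address an object by a bucket name and a key rather than by a single URI. The sketch below splits a URI into those two parts, assuming the entries in S3_LINKS use the common s3://bucket/key form. The helper name and the example URI are hypothetical.

```python
def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into a (bucket, key) tuple."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or key: {uri!r}")
    return bucket, key


# Hypothetical URI in the shape an S3_LINKS row might contain.
bucket, key = split_s3_uri("s3://example-bucket/submission_1/sub-01/image.nii.gz")
print(bucket)
print(key)
```

The resulting bucket and key can then be passed to whichever S3 client or tool you use to stream or download the object.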
For security purposes, temporary AWS credentials are needed to access the S3 objects. Temporary credentials are issued by authenticating with a web service using your NDAR username and password. AWS credentials can be obtained directly from the web service (see examples on our GitHub page) or from the Download Manager, which is available in both a GUI and a command line version.
For the GUI version, go to the 'Tools' menu and select 'Generate AWS Credentials'.
For the command line download manager, use the following syntax:
java -jar downloadmanager.jar --username user --password pass --g
For help with the command line download manager, use the following switches: -h, --help
The web service provides temporary credentials in three parts:
- an access key,
- a secret key,
- and a session token
All three parts are needed in order to authenticate properly with S3 and retrieve data.
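One common way to hand all three parts to AWS tools is through the standard environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, which the AWS CLI and SDKs read. The sketch below builds that mapping and rejects incomplete credentials. The function name and placeholder values are hypothetical, and it assumes you have already obtained the three values from the web service or the Download Manager.

```python
def aws_credential_env(access_key, secret_key, session_token):
    """Return the standard AWS environment variables for temporary credentials."""
    parts = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_SESSION_TOKEN": session_token,
    }
    # Temporary credentials are unusable unless all three parts are present.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError("missing credential parts: " + ", ".join(missing))
    return parts


# Placeholder values; substitute the ones issued to you.
env = aws_credential_env("EXAMPLEACCESSKEY", "example-secret-key", "example-session-token")
print(sorted(env))
```

Calling `os.environ.update(env)` before starting an S3 client, or exporting the same variables in a shell, makes the credentials available to tools that follow the standard AWS credential lookup.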
Additionally, the web service provides an expiration timestamp for the token in YYYY-MM-DDTHH:MM:SS-TZ format (TZ=HH:MM). New keys can be retrieved at any time. This service-oriented approach allows pipeline procedures to request new keys at the appropriate stage of data processing.
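A pipeline can use the expiration timestamp to decide when to request new keys. The following is a minimal sketch, assuming the timestamp is an ISO 8601 string with a UTC offset such as 2030-01-01T12:00:00-05:00. The function name and the five-minute safety margin are illustrative choices, not part of the service.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expiration, now=None, margin=timedelta(minutes=5)):
    """Return True if credentials expire within `margin` of `now`.

    expiration -- timestamp string such as '2030-01-01T12:00:00-05:00'
    now        -- timezone-aware datetime; defaults to the current UTC time
    """
    expires_at = datetime.fromisoformat(expiration)
    if expires_at.tzinfo is None:
        raise ValueError("expiration timestamp must include a UTC offset")
    if now is None:
        now = datetime.now(timezone.utc)
    # Aware datetimes compare correctly across different UTC offsets.
    return now >= expires_at - margin


# 12:00 at UTC-05:00 is 17:00 UTC, so 16:00 UTC is still an hour before expiry.
check_time = datetime(2030, 1, 1, 16, 0, tzinfo=timezone.utc)
print(needs_refresh("2030-01-01T12:00:00-05:00", now=check_time))
```

Checking before each long-running stage, rather than only at startup, keeps a download from failing partway through because the token expired.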
Please see our Cloud Tutorials for a video demonstration of how to create a miNDAR, how to generate temporary security credentials, and how to use these to retrieve data. Please contact the Help Desk with any questions.