Frequently Asked Questions
I. General Questions
- What is the National Institute of Mental Health Data Archive (NDA)?
- What is the purpose of the NDA?
- Is the NDA policy available?
- Where does the NDA get its data?
- Who can submit data to the NDA for sharing?
- What type of data are in the NDA and will the data be expanded over time?
- If I am submitting an application to the NIH for funding, how should I include the NDA in my data sharing plans?
- If my NIH application is funded, what steps do I follow so that data I generate can be submitted to an NDA repository?
- How is the NDA funded?
- Do you have an API for the NDA Data Dictionary?
II. Data Submission
- How do researchers submit data to the NDA?
- Does the NIH have reference materials for Independent Review Boards (IRBs) or research participants about the NDA?
- Who is authorized to provide a signature and institutional approval for an NDA Data Submission Agreement?
- How can I establish an "Institutional Business Official" at my organization?
- Are data within the NDA subject to a Freedom of Information Act (FOIA) request?
- As stated in the NDA Policy, the NIH expects that all submissions to an NDA repository will include a certification by the responsible institutional business official(s) of the submitting institution that the expectations of the Policy have been met. For multi-site studies, is the primary submitting institution expected to certify data that are contributed by data collection centers at other institutions?
- Are primary submitting institutions expected to certify that data submission is consistent with applicable laws and regulations in effect at any and all locations at which data were collected?
- What technology does a researcher need to contribute data to the NDA?
- Can imaging and genomic data be submitted to the NDA?
- What is a data dictionary and where can I find the NDA Data Dictionary?
- Does the NDA accept all types of phenotypic data?
- Should data be submitted to the NDA cumulatively or in installments?
- How should NIH intramural staff submit data to the NDA?
- When contributing to the NDA as part of a grant/project, are data from all human subjects expected or are data only from those subjects that are diagnosed with a disease/disorder?
- In addition to research data, is there any other information expected from me when I am contributing data to the NDA?
- In a clinical trial, should I submit data from all subjects screened?
- When estimating the cost of a data submission for my budget, should I base my hourly rate on total costs or only direct costs?
- When logging into the Validation and Upload Tool, it disappears. How do I fix this?
III. Data Availability
- How are data organized in NDA repositories?
- When will the NIMH Data Archive share my data?
- How will updates to data within an NDA repository be handled?
- What quality control processes and measures are in place in the NDA?
- Does the NDA act as a repository for biological samples (e.g. tissues, DNA, cell lines)?
- Can non-research entities (e.g., law enforcement agencies, insurance companies, employers, lawyers) request access to identifiable information corresponding to phenotype and genotype data held in the NDA?
- Is a Certificate of Confidentiality necessary for data submitted to the NDA?
- What data should an investigator provide to the NDA?
- Can a parent provide his or her child's medical and assessment data directly to the NDA?
- Can the NDA compile all information about a single research participant, even if it was submitted by different researchers and at different times?
IV. Data Access
- What is the NDA Omics Policy?
- Who may access shared NDA research data?
- How do I request access to data contained in the NDA?
- What is a Data Use Certification?
- Can the researcher who contributed the data to the NDA be identified?
- What technology does a researcher need to access data?
- Will only data collected from NIH-funded studies be made available through the NDA?
- Which data can researchers use if they are approved for access?
- What is the process for deciding who will gain access to the data submitted to the NDA for broad research access?
- Does a researcher need to pay a fee to access NDA data?
- As a researcher, I may have data that would be of value to the research public. Is there funding available for submitting data to the NDA?
- As a researcher, I have a grant application that proposes to use data made available in an NDA repository. When will NDA data be available for research use?
V. Data Sharing
- Is it possible to share data with only specific investigators through the NDA system prior to the completion of a study?
- When can the data I submit be accessed by approved researchers?
VI. Researcher Use of the NDA
- Can the GUID be used for research purposes, even if there are no plans to share research data with the NDA?
- Can the NDA be used as a repository for data unrelated to mental health?
- What access rights/account privileges should I request when requesting an NDA user account?
- How do I obtain an NDA accession number for my publication?
- How do I cite my, or somebody else's, dataset in my publication?
VII. NDA Capabilities
- How does the research community provide input on the capabilities of the NDA?
- What does data "federation" mean and why is it important to the NDA?
- What is the GUID and why is it so important to the NDA?
VIII. NDA Governance
- What is the NDA Governance Structure?
- How does NIH control access to data within NDA?
- What is the role of the NDA Data Access Committee (DAC)?
- What technical measures has NIH taken to ensure the NDA is secure and protected?
- What administrative and policy measures are taken to ensure that the data contained in the NDA are protected?
The NDA is an informatics system and research data repository developed by the National Institutes of Health (NIH) to share research data. The NDA provides the infrastructure to store, search across, and analyze various types of data. In addition, the NDA provides longitudinal storage of a research participant's information generated by one or more research studies. In other words, the NDA is able to associate a single research participant's genomic, imaging, clinical assessment, and other information even if the data were collected at different locations or through different studies. By doing so, the NDA gives researchers access to more data than they can collect on their own, making it easier and faster for researchers to gather, evaluate, and share research information from a variety of sources.
The purpose of the NDA is to support and accelerate the advancement of scientific research by creating an infrastructure that integrates heterogeneous datasets allowing access to more quality research data than an investigator would be able to collect on their own. Generally, the NDA provides the following capabilities:
- Harmonization standards to enable cross site meta-analysis and data comparisons across informatics systems.
- Deployment of useful informatics capabilities for researchers.
- Sharing of quality research data with and between various research communities.
- Query and download access to repositories of phenotypic, imaging, pedigree, and other research data, and the ability to query and compute on genomic data in the cloud.
Yes. The NDA policy is available.
The NDA holds data contributed by researchers. Initially, the data from the eleven Autism Centers of Excellence were deposited and shared in the National Database for Autism Research (NDAR). This increased to include autism research projects funded through the American Recovery and Reinvestment Act of 2009. In 2014, the platform was expanded to accommodate data generated by the Research Domain Criteria Initiative, as well as NIMH-funded clinical trials. This coverage continues to expand, and today over 400 grantees funded by the NIH have an expectation of sharing data with the NDA. Additionally, the NDA accepts high-quality research data from a number of projects, regardless of funding source and location, and is working to connect relevant research repositories. Researchers interested in submitting data should review our Data Submission Agreement and then contact us in preparation for submission.
Any researcher who has acquired research data appropriate to an NDA-supported research cluster may request to submit data. Contact us to prepare for a submission, or get more information about the necessary steps.
Currently, the NDA is set up to accept standard phenotypic data, imaging and other neurosignal recordings data, and genomic/pedigree data related to mental health on human subjects. The NDA has developed a data dictionary system allowing investigators to define additional data elements and measures. Refer to the Data Dictionary for the complete listing of the NDA's defined data structures.
Many applications for NIH funding should incorporate an expectation for data sharing through the NDA, including all ASD or ASD-related applications across NIH, NIMH-funded clinical trials submitted after June 2014, and a variety of others (Relevant Guide Notices: NOT-MH-09-005, NOT-MH-14-015, NOT-MH-15-012). Applications should include a description of the principal investigator's intent to submit data to the NDA, the type of data to be submitted, and the anticipated timeline for submissions consistent with the prescriptive NDA expectations. To support the costs associated with data submission, include the appropriate cost model in the budget of your application. The Plan section of your research cluster site features a tool to assist with this calculation.
Broadly, a project submitting data to an NDA research cluster will need to complete and submit a Data Submission Agreement, ensure that its data will be harmonized to the NDA Data Dictionary, create a list of data items to be collected in the Data Expected tab of its Collection, ensure that appropriate data sharing language is included in the consent document, and implement a method to collect the subject information needed to generate GUIDs for all participants. This includes obtaining sufficient informed consent from participants, obtaining sufficient PII from participants to create GUIDs, and working with Data Dictionary curation staff prior to the initial submission. NDA staff are available to provide personal assistance throughout the process. We also maintain a series of video tutorials reviewing the steps to data submission.
The NDA is co-funded or receives other resource contributions from several institutes and centers at the NIH, including:
- National Institute of Mental Health (NIMH)
- National Institute of Neurological Disorders and Stroke (NINDS)
- National Institute of Environmental Health Sciences (NIEHS)
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
Yes, NDA provides access to a webservice for the Data Dictionary: https://ndar.nih.gov/api/datadictionary/v2/
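As a rough illustration only, a client could build request URLs from the base address above and fetch JSON using the Python standard library. The `datastructure` path segment and the `example01` short name used below are assumptions made for this sketch and are not confirmed by this FAQ; consult the web service's own documentation for the actual endpoints.

```python
import json
import urllib.parse
import urllib.request

# Base address quoted in this FAQ.
BASE_URL = "https://ndar.nih.gov/api/datadictionary/v2/"


def build_url(*segments: str) -> str:
    """Join path segments onto the base URL, percent-encoding each one."""
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    return urllib.parse.urljoin(BASE_URL, path)


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint and structure name, for illustration only.
example_url = build_url("datastructure", "example01")
```

Calling `fetch_json(example_url)` would perform the network request; it is left uncalled here because the endpoint is an assumption.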
Once the preparatory steps in FAQ I-8 have been completed, data is submitted through a process of validating files against NDA harmonization standards and uploading them using an application provided by the NDA. Contact us with any questions that you may have after reviewing the detailed steps, which can be found in your cluster Share section.
Yes. The NDA Policy addresses informed consent concerns for both prospective studies and retrospective studies using existing data. In addition, the NIH has obtained a Certificate of Confidentiality and developed sample informed consent language. The NIH has also prepared a brochure for research participants. Although specific to the NDAR repository, this brochure may contain information useful for other research communities as well.
The "Authorized Institutional Business Official" who can approve the Data Submission Agreement or other NDA forms is someone at your organization who has been designated as having the "Signing Official" or SO role in the NIH eRA Commons. In general, a Signing Official is a person working in a research organization's business or executive office who has the authority to bind the institution in legal agreements. This may include members of your sponsored programs office, contract management office, or a variety of other types of departments. Contact us if you do not know who at your institution has this role.
The NIH eRA Commons is the electronic research administration system used by the NIH and grantee organizations. In order to establish NIH-recognized business officials, institutional executives will need to register the institution in eRA Commons. NIH personnel will verify the designated business officials (known in eRA Commons as the signing officials) and notify you when your organization has been registered with the system. Then you'll be ready to complete the NDA's certification forms and agreements.
As an agency of the Federal Government, the NIH is required to release government records in response to a request under the FOIA, unless they are exempt from release under one of the FOIA exemptions. Although NIH-held data is coded and does not hold direct identifiers to individuals within the NDA, the agency recognizes the personal and potentially sensitive nature of the genotype-phenotype data. Furthermore, the NIH takes the position that technologies available within the public domain today, and technological advances expected over the next few years, make the identification of specific individuals from raw genotype-phenotype data feasible and increasingly straightforward.
Therefore, the NIH believes that the release of unredacted NDA datasets in response to a FOIA request would constitute an unreasonable invasion of personal privacy under FOIA Exemption 6, 5 U.S.C. 552 (b)(6). Among the safeguards that the NIH foresees using to preserve the privacy of research participants and confidentiality of genetic data is the redaction of individual-level genotype, phenotype, and other data from disclosures made in response to FOIA requests as well as the denial of requests for unredacted datasets.
As stated in the NDA Policy, the NIH expects that all submissions to the NDA will include a certification by the responsible institutional business official(s) of the submitting institution that the expectations of the Policy have been met. For multi-site studies, is the primary submitting institution expected to certify data that are contributed by data collection centers at other institutions?
No. The submitting institution need not certify that the expectations of the Policy are met for data collected by other institutions within its multi-site arrangement. The NIH understands that the submitting institution is not necessarily the local institution or IRB of record for all data collected in a multi-site trial. However, the submitting institution should assure the NIH through the submission of the Data Submission Agreement that it believes, based on either its own review or assurance from other institutions, that the expectations of the Policy are met for the entire dataset. In obtaining assurance from other sites in a multi-site study, the submitting institution should retain copies of any information it receives from other data collecting sites.
No. Submitting institutions are expected to certify that data submission is consistent with applicable laws and regulations relevant to their specific activities (e.g., home state law, home institutional policies, etc.). Submitting institutions may assume that all prior data transfers from data collection sites to the submitting institution (e.g., a data coordinating center) were conducted according to any applicable laws relevant to those organizations at the time of the original data transfer. The NIH, however, does expect that all data were collected in accord with 45 C.F.R. Part 46. As discussed in a separate FAQ, this assurance in multi-site studies can be made on the basis of a direct review of study materials by the submitting institution or based on information or assurance provided to the submitting institution by data collecting organizations.
The NDA accepts clinical assessment and phenotypic data in both tab-delimited and comma-separated value (CSV) formats. Imaging, genomics, and other rich data sources have more complex formats. For more information, refer to your Share section. Once data are defined, the submitting researcher will use the NDA Validation and Upload Tool to validate the data, creating a package for data submission. This tool requires version 8 or later of the Java runtime environment (JRE).
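To illustrate that the two text formats carry the same information, the following minimal sketch parses both into identical records with Python's standard `csv` module. The column names and values are hypothetical and are not taken from the NDA Data Dictionary; this is not how the Validation and Upload Tool itself reads files.

```python
import csv
import io


def read_records(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimited text whose first row holds the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical example data: the same record in each format.
csv_text = "subject_id,age_months,sex\nS001,84,F\n"
tsv_text = "subject_id\tage_months\tsex\nS001\t84\tF\n"

csv_records = read_records(csv_text)
tsv_records = read_records(tsv_text, delimiter="\t")
```

Both calls yield one record per data row, keyed by column name, so downstream checks do not need to know which delimiter the file used.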
The NDA is able to accept imaging and some genomic data, along with clinical assessment data and phenotypic data. Those submitting imaging and genomics should review the tools and steps needed to properly harmonize these data, including the Experiment Definition Tool for functional imaging.
For imaging data, the NDA supports DICOM, NIFTI, AFNI, MINC 1.0, MINC 2.0, Analyze, SPM, GE, Siemens, and MIPAV formats.
For genomic or other -omic data, the NDA can accept only ASD and ASD-related omics data into NDAR. Other omics data must be made available through other databases. To share this data, the Experiment Definition Tool must also be used. This tool harmonizes the naming of data processing and analysis protocols, requires entering sufficient details, and enforces unambiguous interpretation of the entered information. In this way, the consistency of raw experimental data is guaranteed, while the process of definition is simplified for submission and aggregation across federated repositories.
The Data Dictionary is a collection of all measures, instruments, and assessments currently harmonized in the NDA through the definition of a standard data structure. See the full Data Dictionary for a list of these definitions, and click on one to enter its individual structure page and view defined elements. These can be referenced when formatting your data collection tools to streamline the submission process, or downloaded as blank templates to be filled in and uploaded directly to the NDA. Data cannot be uploaded until a corresponding data structure is defined in the Data Dictionary.
The NDA accepts assessments defined in the Data Dictionary and allows investigators to define their own data structures if they are not yet included in the dictionary. If you do not find a clinical assessment defined in the Data Dictionary, send us your codebook and our data curation staff will work with you to create an appropriate structure.
All data should be submitted cumulatively, with the exception of imaging and genomic data that remain unchanged. Clinical and phenotypic data should be provided cumulatively every six months. This allows for the performance of quality checks on the data over the course of the project. Each biannual submission cycle, previous datasets are archived and superseded by the most recent cumulative submission, simplifying versioning of the data.
The procedures for submitting data are the same for NIH intramural projects and extramural research projects.
The NDA is interested in all research data including data from control subjects, subjects with related co-morbid conditions as well as those subjects with the disease/disorder being studied. As a general rule, for NIH Grants, the NDA would expect data from the same number of subjects as those reported on the 2590 Inclusion Enrollment Report, including subjects eventually excluded from research analysis.
In general, the NDA is interested in receiving non-identifying human subjects research data. To help clarify contributed data, investigators can also upload/link relevant documentation describing the contributed data. In addition to project summaries, data collection methods and exclusion criteria can be made available. Such information is invaluable in helping other researchers understand the contributed data. While information characterizing the research is expected to be included, the decision on what should and should not be provided is best made by the contributing investigator. This information is to be uploaded to a Collection in the Supporting Documentation section.
For clinical trials, generally only those subjects who are randomized/enrolled into the trial should have data submitted.
The cost of data sharing should be included in your direct costs, and the hourly rate calculation based only on salary and fringe benefits. Indirect costs are taken into account separately by your institution.
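The arithmetic can be sketched as follows. The 2,080 working hours per year and the example salary, fringe rate, and hour count are assumptions chosen for illustration; use your institution's own figures and the cost tool provided by the NDA for an actual budget.

```python
def hourly_rate(annual_salary: float, fringe_rate: float,
                hours_per_year: float = 2080.0) -> float:
    """Hourly rate from salary plus fringe benefits only (no indirect costs)."""
    return annual_salary * (1.0 + fringe_rate) / hours_per_year


def submission_cost(hours: float, annual_salary: float, fringe_rate: float) -> float:
    """Direct cost of the staff time spent preparing a data submission."""
    return hours * hourly_rate(annual_salary, fringe_rate)


# Hypothetical figures: $62,400 salary, 25% fringe rate, 40 hours of work.
rate = hourly_rate(62_400, 0.25)
cost = submission_cost(40, 62_400, 0.25)
```

With these example figures the rate works out to $37.50 per hour and the 40 hours to $1,500 in direct costs; indirect costs are deliberately left out of the calculation.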
There is a known bug with Java that may affect Linux users of this tool. When authenticating in order to upload, the tool may crash without feedback or an error message. This will not be addressed until the release of Java 10. Prior to this, a different system will need to be used when submitting data if you encounter this bug.
When an investigator is authorized to submit data, he or she is given control of an object called a Collection. Collections are virtual containers for data. They are created and organized on the basis of source grant, project, and laboratory, and are the primary method of organizing data in the NDA. They provide tools useful in managing submission projects and allow investigators to define a wide array of other elements that provide context for the data, including all general information regarding the data and source project, experimental parameters used to collect any event-based data contained in the Collection, methods, and other supporting documentation. They also allow investigators to link underlying data to an NDA Study, defining populations and subpopulations specific to the aims of his or her research. Collections are assigned to one or more research clusters, or "permission groups," for the purpose of granting data access.
Descriptive data (demographics, summary subject measures, diagnostic measures), outcome measures establishing baselines and raw data (e.g. sequences, unprocessed neurosignal recordings and MRI data) are typically shared four months after submission. Subsequent submissions replace the data already shared, which is then moved into an archived state and made inaccessible. Within reason, an investigator has full control of when these data are shared. This is done by specifying the share date for the measure within the Data Expected tab of the project's Collection. Conflicts on timing are resolved by the NIH Program Officer or other funding agency representative.
Outcome measures and analyzed data should be shared when a publication is accepted. The NDA Study capability allows for sharing of only the precise subjects' data specific to a publication or finding (positive or negative). Using this capability, only the outcome measures and analyzed data supporting that finding are shared. Other data will remain embargoed until those data are published in a separate publication and associated NDA Study or the defined sharing dates occur, which is typically at the end of the project. If an NDA Study has not been defined on published data, the NIMH Data Archive staff will notify the investigator of the need to create an NDA Study allowing the rest of the project’s data to remain embargoed.
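For planning purposes, the typical four-month interval between submission and sharing mentioned above can be estimated with a small date calculation. This is only a sketch: the four-month default reflects the "typically" in the text, and the actual share date is whatever is specified in the Data Expected tab of the Collection.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping to the end of short months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


def estimated_share_date(submitted: date, embargo_months: int = 4) -> date:
    """Estimate when a submission would typically move to the shared state."""
    return add_months(submitted, embargo_months)


# Example: a submission made on January 15, 2024.
example_share_date = estimated_share_date(date(2024, 1, 15))
```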
Once data are shared (moved from the default private state following submission into a shared state accessible by approved researchers), those data cannot be changed. Data are expected to be refreshed with a cumulative data submission according to the defined biannual submission schedule (January 15 and July 15). Previous versions of data are then moved into an archived state, allowing for analysis of changes from one submission to the next.
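The biannual schedule lends itself to a simple calculation of the next submission date. The sketch below assumes that a submission made on the deadline day itself still counts toward that deadline, which this FAQ does not state explicitly.

```python
from datetime import date

# Biannual submission dates given in this FAQ: January 15 and July 15.
DEADLINES = ((1, 15), (7, 15))


def next_submission_deadline(today: date) -> date:
    """Return the first biannual deadline falling on or after `today`."""
    for month, day in DEADLINES:
        candidate = date(today.year, month, day)
        if candidate >= today:
            return candidate
    # Both of this year's deadlines have passed; roll over to next January.
    first_month, first_day = DEADLINES[0]
    return date(today.year + 1, first_month, first_day)


# Example: a project checking its schedule on March 1, 2024.
example_deadline = next_submission_deadline(date(2024, 3, 1))
```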
The NIH is implementing a two-tiered data control procedure for information and images submitted to the NDA. Such efforts help to ensure that the information submitted has undergone reviews for accuracy, completeness, and availability.
The first level of quality control is performed by the researcher who certifies the accuracy of the information prior to submission. Additionally, all data undergoes automated item-level validation using the Validation and Upload Tool. This tool checks the data proposed for submission against the Data Dictionary to ensure that the data comply with the standards established.
The second level of quality control occurs after data and/or images are submitted for broad research sharing. The NIH provides a period of time (typically four months), as specified in the Notice of Grant Award, for the investigator to undertake activities to review the completeness, accuracy, and quality of the submission. Such efforts include verifying that the information received is complete (i.e., not missing records intended for submission), contains no identifying information, displays correctly, and that the NDA query toolset functions as expected with the information. During this timeframe, access to data and images for research is temporarily suspended to help ensure that only carefully reviewed information is made available. In the event that the NIH determines that additional time is necessary to ensure the quality of the submitted information (e.g., time necessary to remedy concerns), the NIH may opt to extend the quality control period as necessary in the interest of science. Automated quality control processes are also applied by the NDA during this period.
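The idea behind the automated item-level validation described above can be sketched as a check of each record against a data structure definition. The `ElementSpec` class, the rules it encodes, and the example element names are hypothetical simplifications; the actual Validation and Upload Tool and Data Dictionary definitions may apply different or additional rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSpec:
    """Hypothetical, simplified definition of one data element."""
    name: str
    required: bool = False
    allowed: frozenset = frozenset()  # empty set means any value is accepted


def validate_record(record: dict, specs: list) -> list:
    """Return a list of human-readable problems found in one record."""
    errors = []
    known = {spec.name for spec in specs}
    for spec in specs:
        value = record.get(spec.name, "")
        if value == "":
            if spec.required:
                errors.append(f"{spec.name}: required value is missing")
            continue
        if spec.allowed and value not in spec.allowed:
            errors.append(f"{spec.name}: {value!r} is not an allowed value")
    for name in record:
        if name not in known:
            errors.append(f"{name}: not defined in the data structure")
    return errors


# Hypothetical structure with one required element and one coded element.
specs = [
    ElementSpec("subject_id", required=True),
    ElementSpec("sex", allowed=frozenset({"M", "F"})),
]
good = validate_record({"subject_id": "S001", "sex": "F"}, specs)
bad = validate_record({"subject_id": "", "sex": "X", "extra": "1"}, specs)
```

A record passes when the returned list is empty; collecting every problem rather than stopping at the first lets a submitter correct a file in one pass.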
No. The NDA does not accept, store, or access biological samples; however, the NDA does expect to receive a reference to biological samples located elsewhere in appropriate submissions through the Research Subject summary data structure thereby enabling researchers to contact such biological repositories to gain access to such samples.
Can non-research entities (e.g., law enforcement agencies, insurance companies, employers, lawyers) request access to identifiable information corresponding to phenotype and genotype data held in the NDA?
The NIH acknowledges that legitimate requests made by law enforcement offices for access to data may be fulfilled. The NIH does not possess direct identifiers within the NDA, nor does the NIH have access to the link between the data code and the identifiable information that may reside with the primary investigators and institutions for particular studies. The release of identifiable information may be protected from compelled disclosure by the primary investigator's institution if a Certificate of Confidentiality is or was obtained for the original study (please see NIH Certificate of Confidentiality Policy). The NIH strongly encourages investigators to consider obtaining a Certificate of Confidentiality as an added measure of protection against future compelled disclosure of identities for studies planning to collect genome-wide association data.
No. The NIH strongly encourages, but does not require, institutions to obtain a Certificate of Confidentiality for eligible studies that plan to contribute research subject data to the NDA.
The NIH considers a Certificate of Confidentiality to be an important but not mandatory research tool, available to assist research organizations in protecting the privacy of their research participants. Congress designed these certificates as a way of encouraging potential research participants to participate in studies, given a common concern that public knowledge of their participation could lead to potentially negative consequences.
No. Only researchers with an approved Data Submission Agreement may submit data. If, however, the child's health care provider is also a researcher, the health care provider could request permission to submit data.
Yes, although data submitted to the NDA are de-identified such that the identities of research participants cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary users. Research participants should be made aware during the informed consent process that other researchers will have access to their coded research data and should also be informed about the existence of a risk to their privacy, if any. In fact, investigators approved to submit data to the NDA and their institutional officials certify that an IRB and/or Privacy Board have considered the risks to privacy prior to submitting the data. In addition, institutions and investigators granted access to shared data sign a Data Use Certification (DUC) agreeing to not make any attempt to re-identify individual participants, use the data only for approved purposes, and protect data confidentiality, among other stipulations.
The NIMH Data Archive (NDA) is set up to accept omics, imaging, and neurosignal recordings data and results related to a variety of research clusters. Raw or nearly raw data related to ASD received from research instrumentation (sequencers, MRIs, EEG headsets) are also expected and accepted by the NDA. For those submitting -omics datasets for autism/autism-related studies to NDAR, we will ensure that the same data are registered and made available in the National Library of Medicine (NLM) supported systems (e.g., dbGaP, SRA). For qualified researchers who are interested in using raw and analyzed -omics data (e.g., FASTQ, BAM, or VCF), the NDA will support access for just-in-time computation. However, these data are not to be persisted (i.e., stored) beyond the time necessary for computation. For those interested in downloading and storing raw -omics data, the source for those data will be NLM supported repositories (i.e., dbGaP and SRA).
The NIH will provide access to scientific investigators for research purposes. Qualified researchers who have completed a Data Use Certification and received approval from the NDA Data Access Committee (DAC) may be approved to access broadly shared data. A separate request process exists for access to data in federated sources. Additionally, the DAC and support staff at NIH have access to NDA shared data.
Investigators requesting shared data must be affiliated with an NIH-recognized research institution (in the NIH's eRA Commons) that maintains an active FWA, and complete and submit a Data Use Certification approved by an authorized institutional business official with signature authority. This request will then be reviewed by the Data Access Committee (DAC).
To promote the responsible use of shared data, all institutions and investigators seeking access must commit to comply with NDA policies and procedures by signing a Data Use Certification (DUC). The Data Use Certification articulates that investigators will agree, among other things, to:
- Protect data confidentiality;
- Follow appropriate data security protections;
- Follow all applicable laws, regulations and local institutional policies and procedures for handling NDA data;
- Not attempt to identify individual participants from whom data within a dataset were obtained;
- Not sell or provide any of the data elements from datasets obtained from the NDA;
- Not share with individuals other than those listed on the DUC any of the data elements from datasets obtained from the NDA;
- Agree to the listing of a summary of their approved research uses within the NDA along with his or her name and organizational affiliation;
- Agree to report, in real time, violations of policy to the DAC;
- Comply with NDA policy with regard to reporting publications; and
- Provide annual progress reports on research using shared data including the creation of an NDA Study.
The Data Use Certification form is available on all NDA sites and when complete can be submitted via email to the NDA.
Yes. When data are submitted, the researcher provides information related to the data, including his or her name, an overview of the data and how it was acquired, and other pertinent information associated with the research. This information is made available to other researchers, in accordance with the NDA Policy and procedures.
The NDA currently supports an interface with recent versions of Microsoft Internet Explorer, Safari, Chrome, and Firefox. In addition, a current version of the Java Runtime Environment (JRE) is required to download data and use all NDA tools.
The NDA accepts data from researchers regardless of funding source.
Once approved, researchers will be granted access to the entire collection of shared data in the NDA research clusters requested. This includes access to data contributed by NIH-funded extramural researchers and intramural researchers, as well as research institutions or organizations that are not necessarily funded by NIH.
Note that federated data resources may define different data access requirements. Additionally, data specified under the "Ongoing Study" capability are excluded from this provision.
The NDA Data Access Committee (DAC) will evaluate requests from researchers seeking access to shared datasets by reviewing the signed and completed Data Use Certification. The review will determine whether the proposed use of the dataset is ethically appropriate and does not pose a risk to subject confidentiality. Neither the scientific merit of the proposed research question nor the availability of data is evaluated. The NDA DAC will either approve access, deny access, or request clarification based on these criteria.
No, there are no fees to access data stored within the NDA, which is a resource funded by the NIH.
Consult your Program Officer for funding opportunities.
Investigators interested in using shared data for prospective studies funded by the NIH are encouraged to request access to the NDA. See FAQ III-2 for information on when data are shared following submission.
Data sharing prior to the completion of a study is possible through the "Ongoing Study" option defined by Standard Operating Procedure #9, Request for Ongoing Study (SOP-9).
Submitters are in the best position to determine whether new uses of data are appropriate during the time that a study is ongoing, and collaboration requires communication between the Submitter and the potential Recipient. Therefore, the NIH wants Recipients seeking access to data from an ongoing study to coordinate with the Submitters. This coordination may result in a decision to collaborate, a decision to authorize access without collaboration, or a decision to delay access until a study is complete. The NIH believes, at this time, that a consultation process is an essential means to expand access to data in a timely fashion without compromising the valuable contribution and trust of Submitters.
Please see FAQ III-2 for more information.
The NIH recognizes that an efficient method to identify a research subject without the need to distribute personally identifiable information (PII) is of value to the research community. Therefore, the NDA will support researchers interested in using the GUID for this purpose, even if the researcher has no immediate plans to share data with the NDA or a federated data resource.
The NDA is not funded to support all research communities; however, a research community interested in using the NDA framework should contact us to discuss the possibility of collaboration and support.
The NIMH Data Archive will provide a Digital Object Identifier (DOI) and web address for submitted data that have been associated with a specific publication (including publications that are still being drafted). The DOI can be issued in an inactive state, allowing it to be activated later (i.e., when an article is published). After data have been deposited in the NIMH Data Archive, you will need to create a Study through the Data from Papers interface (see tutorial). Once you have defined a Study, you will be provided with text for how the data should be acknowledged in your publication (see FAQ VI-4 for how to cite data). Your Study, including associated data, will remain private until your Study is shared; the Study will be shared no later than the time of publication. If you have any questions about how to submit data or how to create a Study, please contact firstname.lastname@example.org.
The NIMH Data Archive Data Submission Agreement specifies how data should be cited. If the publication is based on datasets that were not submitted by the publication author(s), submitters of these datasets should also be acknowledged. You should include an NIMH Data Archive DOI in any publication (see FAQ VI-3 for how to obtain a DOI), and you should reference it as a URL (i.e., http://dx.doi.org/DOI). This provides readers with a direct link to data in the Archive that were used in your publication.
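As a minimal sketch of the URL form described above, the following Python function turns a DOI into a resolver link. The function name and the sample DOI are hypothetical and used only for illustration; the only part taken from this FAQ is the http://dx.doi.org/DOI pattern.

```python
def doi_to_url(doi: str) -> str:
    """Return a resolver URL following the http://dx.doi.org/DOI pattern."""
    doi = doi.strip()
    # Accept a DOI written with a leading "doi:" label, as often seen in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI begins with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "http://dx.doi.org/" + doi


if __name__ == "__main__":
    # Hypothetical DOI, for illustration only.
    print(doi_to_url("doi:10.1234/example-study"))
```

Use the DOI text supplied with your Study in place of the sample value.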
The NIH is keenly interested in collaborating with research communities to help accelerate scientific discovery. The NDA has solicited input from various research communities in an array of meetings, webinars, and formal workshops. Feel free to contact us to learn more about the NDA, provide feedback, and/or learn how to become more involved.
Data federation aggregates data from different and often physically distributed systems that exist outside of a central system. The NDA can tap into such data sources and provide simple, efficient access to research data that reside outside of NDA-controlled repositories. Research data exist in many such databases located around the world, but differing access procedures and inconsistent data standards and formats often combine to create insurmountable barriers to data reuse and collaboration. Rather than moving data from various sources into the NDA, which can introduce additional complications, the NDA has deployed data "federation" technology that allows researchers to access important research data directly in a simple, consistent manner through the NDA platform. Researchers use a uniform interface to perform a single query that searches and returns data from multiple interconnected sources in a seamless fashion. Today, the NDA connects to the Interactive Autism Network, the Autism Genetic Resource Exchange, the Autism Tissue Program, and the Ontario Brain Institute's Brain-CODE system.
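The idea of a single query fanned out to several independent sources can be sketched as follows. This is an illustration of the general federation pattern only, not the NDA's implementation: the source names, record fields, and function names are all hypothetical, and the in-memory lists stand in for remote systems.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# A source is anything that can answer a search for one participant identifier.
Source = Callable[[str], List[Record]]


def make_source(name: str, records: List[Record]) -> Source:
    """Wrap an in-memory list so it behaves like one remote data source."""
    def search(guid: str) -> List[Record]:
        # Tag each hit with the source it came from so results stay traceable.
        return [dict(r, source=name) for r in records if r["guid"] == guid]
    return search


def federated_query(sources: List[Source], guid: str) -> List[Record]:
    """Run the same query against every source and merge the results."""
    results: List[Record] = []
    for search in sources:
        results.extend(search(guid))
    return results


if __name__ == "__main__":
    # Hypothetical sources and records, for illustration only.
    source_a = make_source("source_a", [{"guid": "ID_1", "measure": "imaging"}])
    source_b = make_source("source_b", [{"guid": "ID_1", "measure": "genomic"},
                                        {"guid": "ID_2", "measure": "clinical"}])
    for row in federated_query([source_a, source_b], "ID_1"):
        print(row)
```

The data stay where they are; only the query and the matching records move, which is the point of federation as described above.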
The GUID, or "Global Unique Identifier", allows the NDA to associate a single research participant's genomic, imaging, clinical assessment and other information even if the data were collected at different locations or through different studies. The GUID is an alphanumeric code created indirectly from personally identifiable information such as a subject's name and date of birth; however, this information is never transmitted to or stored in any NDA database. The GUID is then used as the primary participant identifier in data submissions. This ability to aggregate an individual's data while maintaining their confidentiality is core to the benefits provided by sharing research data.
For this reason, when collecting a research subject's personally identifiable information (PII) at a research site, be sure to obtain all necessary information as it appears on the subject's birth certificate. This ensures the information used by researchers to create the GUID does not change over an individual's life. Researchers who anticipate submitting research data should design their information collection procedures prior to subject enrollment to ensure the collection of all information needed to generate a GUID. In addition, researchers who are not expected to submit data are encouraged to collect this information from subjects in the event that research data may be submitted in the future. In cases where all subject information needed to generate a GUID is not available, a random identifier or "pseudo-GUID" can be generated and assigned to a participant; however, valid GUIDs should ideally be used for all subjects to increase the linkage of research data now and in the future.
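To make the idea concrete, the sketch below derives a stable identifier from a subject's name and date of birth with a one-way hash, and falls back to a random "pseudo" identifier when a field is missing. It is an illustration under stated assumptions, not the NDA's GUID algorithm: the choice of fields, the SHA-256 hash, the normalization rules, and the ID_ and PSEUDO_ prefixes are all hypothetical. A bare hash of a name and birth date could be guessed by brute force, so a real system would need stronger protections than this example shows.

```python
import hashlib
import secrets
import unicodedata
from typing import Optional


def normalize(value: str) -> str:
    """Remove accents, case, and extra whitespace so spelling variants hash alike."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.upper().split())


def make_identifier(first: Optional[str], last: Optional[str],
                    date_of_birth: Optional[str]) -> str:
    """Return a hash-based identifier, or a random pseudo identifier if PII is incomplete."""
    fields = [first, last, date_of_birth]
    if any(not f for f in fields):
        # Not enough information: assign a random identifier that cannot be re-derived.
        return "PSEUDO_" + secrets.token_hex(8).upper()
    material = "|".join(normalize(f) for f in fields)
    # One-way hash: the identifier is computed locally and the PII itself is never sent.
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return "ID_" + digest[:16].upper()


if __name__ == "__main__":
    # Hypothetical subject, for illustration only.
    print(make_identifier("Ana", "Lopez", "2001-02-03"))
    print(make_identifier("Ana", None, "2001-02-03"))
```

Because the same inputs always give the same identifier, two sites that record the birth-certificate details identically will arrive at the same value without exchanging the underlying PII, whereas a pseudo identifier cannot be linked across studies.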
The Director of the National Institute of Mental Health (NIMH) oversees the NDA, its policy, and implementation. In carrying out this responsibility, the NIMH Director participates on a Governing Committee with the Directors, or their designees, of several other NIH Institutes and Centers that fund the NDA. The Governing Committee is responsible for the ongoing management and stewardship of NDA Policy and procedures. Reporting to the Governing Committee are several groups and teams charged with the implementation, communication, and development of specific procedures related to the conduct, submission, and data release practices for the NDA. One of these groups, the NDA Implementation Team, is responsible for overseeing Policy and data access to promote consistent and robust participant protections.
In consideration of the evolving scientific, ethical, and societal issues related to the NDA, the Governing Committee:
- Ensures ongoing, high-level agency oversight;
- Obtains regular input from public representatives, including those with expertise in bioethics, privacy, data security, and appropriate scientific and clinical disciplines; and
- Revisits and revises the NDA Policy as appropriate.
The NIH established a Data Access Committee (DAC) to oversee access to the data contained in the NDA. Investigators and institutions seeking access to shared data from the NDA must complete and submit a Data Use Certification signed by the investigator. The document must be co-signed by an NIH-recognized business official at the investigator's affiliated institution, which must hold a Federalwide Assurance (FWA). Access to data in federated data sources is governed through the Data Access Committee and the procedures established by that federated data source; however, researchers may request access to a federated data source through their NDA user account.
Membership of the NDA Data Access Committee (DAC) includes federal staff with expertise in areas such as relevant scientific disciplines, research participant protection, and privacy protections. The NDA DAC approves researcher access to NDA shared resources.
The DAC approves access to data and/or images for research purposes only. The DAC will review the Data Use Certification of each investigator requesting data and provide access based on the expectations outlined in the NDA Policy. These expectations include the protection of data privacy, confidentiality, and security. In the event that requests raise concerns related to privacy and confidentiality, risks to populations or groups, or other concerns, the DAC will consult with other experts as appropriate.
To ensure the security of the data held by the NDA, the NIH Center for Information Technology employs multiple tiers of data security based on the content and level of risk associated with the data.
Datasets stored in the NDA are under strict security provisions, including but not limited to multiple firewalls, separate servers, and data encryption protocols. As a federal information system, the NDA follows the recommended security controls defined by National Institute of Standards and Technology (NIST) Special Publication 800-53r1 and related publications. The NDA undergoes an annual independent certification and assurance audit specific to the controls defined in 800-53r1, ensuring that the defined management, data recovery, procedural, and technical controls are followed. Additionally, all NDA policies are reviewed on an annual basis.
As detailed in the NDA Policy, investigators and their sponsoring institutions seeking access to shared data in the NDA must submit a data access request (Data Use Certification document) that specifies both the data (research cluster) to which access is sought and the planned research use. In addition, the investigator must agree to the terms of access set forth in the Data Use Certification. The NIH has established a Data Access Committee (DAC) to oversee and approve access to the NDA.