The National Institutes of Health's new Policy for Data Management and Sharing (DMS Policy), which went into effect January 25, 2023, requires NIH-funded researchers to submit a plan outlining how scientific data from their research will be managed and shared within their funding application. The policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process.
To help University of Pittsburgh researchers comply with the new policy, HSLS Data Services has put together this page of guidance for the new policy along with best practices for data management and data sharing. Each section on this page is organized into three parts.
If you have a question, would like a one-on-one consultation, or would like us to review your data management and sharing (DMS) plan, please don't hesitate to contact HSLS Data Services.
If you are in a hurry and only read one thing on this page, here is our #1 recommendation for writing the Data Management and Sharing Plan: use DMPTool.org and sign in with your Pitt email address to get a customized template and guidance for writing a detailed NIH DMS Plan. When you request feedback on a draft through DMPTool, HSLS librarians can offer specific suggestions.
In 2020, the National Institutes of Health released a Policy for Data Management and Sharing (DMS Policy) (also called the DMSP) that updated the institutes' data sharing policy from 2003. After years of public comment and revision, the final policy went into effect on January 25, 2023.
The policy requires three things: that researchers think about how they will manage, document, and share their scientific research data before beginning data collection; that they show the NIH their thought process in a formal Data Management and Sharing Plan in their funding application (with an accompanying budget detailing data management costs); and that they make their research data as publicly available as possible within a reasonable time frame.
All researchers who are funded in whole or in part by the NIH must comply with the policy, regardless of funding level. However, the policy only applies to activities that generate scientific research data according to a specific definition given below. If you are funded through a Training (T), Fellowship (F), Construction (C06), Conference (R13), Resource (G), or Research-Related Infrastructure Program (e.g., S06) grant, the policy will not apply to your work. See the complete list of NIH activity codes for more information.
The DMS Policy applies to scientific research data, defined as "data commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications." This covers both quantitative and qualitative data, and applies to all data types, including image, audio, and video data, in all file formats and sizes. It explicitly does not apply to "laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects such as laboratory specimens" (source: Research Covered Under the Data Management & Sharing Policy), as those may contain or produce data but do not constitute scientific data themselves.
Determining which data fall under the policy will depend on your own judgment and the norms of your field. If you generate a lot of calibration, null, duplicate, or "junk" data and would not normally consider it necessary to replicate findings, you do not need to address that data in your DMS plan. Conversely, you may consider some research material to be "data" that is not explicitly addressed in the policy. The policy does not address scripts, code, algorithms, or computational models as data products in and of themselves, but if you consider those to be necessary for replication of your work, you should discuss them in your DMS plan in order to make your management and sharing easier. (The policy does explicitly ask researchers to describe any software tools necessary to access or manipulate the data.)
The ultimate goal of the DMS policy is to encourage researchers to think about data management and data sharing in concrete terms before executing their research plan, because experience has shown that it's much harder to do after the conclusion of a project. We recommend erring on the side of addressing too much rather than too little.
The 2003 NIH Data Sharing Policy only applied to investigators seeking single-year direct costs of $500,000 or more, and it required researchers to either outline their plans for sharing their data or to explain why they could not. In contrast, the new policy requires all investigators seeking any level of funding to submit a plan, and it asks researchers to outline their plans for organizing, storing, and documenting their data (AKA data management) as well as for sharing it. While the new policy acknowledges that legitimate technical, legal, and privacy-related barriers to complete data sharing still exist, it more explicitly urges researchers to share their data promptly in order to increase the overall reproducibility of science.
NIH program staff will evaluate the DMS plans and recommend them for approval or denial. The plans will not be shared with peer reviewers and will not be factored into a submission's overall score; however, the budget including costs for data management will be shared with peer reviewers. Once approved, the plan will become a Term and Condition of the award and compliance will be evaluated through annual progress reports. If an investigator does not follow the approved plan, it may affect renewal of the grant and/or future funding decisions. However, the plan can be modified if necessary by working with NIH program officers.
The University of Pittsburgh has stipulated that the Principal Investigator must oversee the management and sharing of data during the study process for his or her project, and that the University of Pittsburgh will require the Principal Investigator to certify at the times of annual progress report and final report that the NIH-approved data sharing and management plan(s) has/have been followed. The University of Pittsburgh Office of Research Protections will periodically audit the NIH-approved data sharing and management plans for adherence.
January 25, 2023 is the date on or after which all NIH applicants must plan and budget for managing and sharing data, submit a data management and sharing (DMS) plan when applying for funding, and comply with any approved DMS plan. The steps below outline a general workflow for preparing for and complying with the new policy. Since articulating a DMS plan can be a lengthy process, we recommend starting as early as possible.
After submission, during review:
The Data Management and Sharing (DMS) plan should be a one- to two-page document (longer if necessary) submitted with the general funding application as a PDF attachment. If a researcher's application is also subject to the Genomic Data Sharing (GDS) Policy, they should address GDS-specific topics within this general DMS plan and will no longer submit a separate GDS plan; see the call-out boxes in the NIH's Writing a Data Management & Sharing Plan page for more information on incorporating GDS.
The topics to be included in a DMS Plan are laid out in the NIH's Supplemental Policy Information: Elements of an NIH Data Management and Sharing Plan. Briefly, the elements are:
In general, more specificity is better because a comprehensive plan requires less spur-of-the-moment decision-making later. This is meant to be a helpful tool for researchers to organize and share their data, not merely a hoop to jump through! Some institutes or centers may have more specific requirements outlined in this list of NIH Institute and Center Data Sharing Policies.
While the NIH suggests that most plans should be two pages or fewer, it will accept longer plans. Complex proposals with multiple data types, study sites, or research activities may benefit from using a table format to cut down on wordiness in addressing each element of the plan.
If you are not sure about an element of the Plan yet because it will depend on emergent needs during the course of the study, you can address this uncertainty within the relevant section. For example, if you are developing computational tools to work with your data, you can reference these without specifically naming software that doesn't yet exist. However, you should demonstrate that you are proactively considering these questions, especially when it comes to sharing your data through a repository or other data-sharing platform. If you specify something in your Plan that you want to change later, you may do so with approval of NIH staff.
The NIH has developed an optional Data Management and Sharing Plan Format Page that aligns with the elements given above. The form is available as a download, and instructions for filling it out are included in the NIH Application Guide.
An excellent way to create a DMS Plan is by using the templates available through DMPTool, a service that allows you to send draft DMS plans to HSLS librarians for feedback before official submission. The DMPTool team has updated their NIH template to reflect the requirements of the new policy, and will keep it up to date as new guidance is announced. For one-on-one help, contact HSLS Data Services.
Managing your data and making it publicly available in an easy-to-use format with clear documentation costs money, both real and as time spent. The NIH allows investigators to request money for data management and sharing in the budget and budget justification sections of their applications. Specific information is included in the NIH's Supplemental Policy Information: Allowable Costs for Data Management and Sharing.
In general, reasonable costs for the following are allowable:
The following costs are not allowable:
All costs that are included in the budget must be incurred during the performance period regardless of how long the data will remain available. Budgets, but not the associated Data Management & Sharing Plans, will be shared with peer reviewers for assessment of their reasonableness. The NIH has not announced a separate source of money specifically for data management and sharing costs, and these costs should be treated as a part of the overall budget proposal.
The publications within "Forecasting Costs for Preserving, Archiving, and Promoting Access to Biomedical Data," a project of the National Academies, may be useful when determining costs. COGR's NIH Data Management and Sharing Guide also has a section on allowable costs: COGR Readiness Guide Chapter 4 - Budgeting and Costing. However, we highly recommend contacting HSLS Data Services for help anticipating the costs of storing or sharing data in a repository. Many data repositories are free, but some have storage caps (either per-file or overall) that require data processing charges not otherwise advertised. We can help you determine the most appropriate home for your data before you name it in your DMS Plan and request funds in your budget.
The NIH strongly recommends that researchers deposit their data into a public data repository for long-term storage and access. If access to the data needs to be restricted, controlled-access repositories or other data-sharing platforms are available, but sharing data via email by request or hosting it on a lab server will not meet the policy's requirements in most cases.
A data repository is a platform for hosting research datasets that enables them to be findable, accessible, interoperable, and reusable by researchers and the public all around the world. (See the FAIR Data Principles, which were developed to optimize the reuse of scientific research data.) There are many reasons to use a data repository instead of a lab website, FTP server, or cloud storage like Google Drive:
There are many, many repositories, some of which take all kinds of research material (publications, data, posters, etc.) and some of which specialize in a content type or discipline. There isn’t one “best” repository, but rather one “best suited for your particular data.”
Here are some questions to lead you to a possible repository match:
It can be difficult to choose among the large generalist repositories, although all of the options on the NIH list meet the specifications of the National Science and Technology Council's "Desirable Characteristics of Data Repositories for Federally Funded Research" report. This Generalist Repository Comparison Chart provides a quick reference for the repositories' costs, storage caps, and limitations, although it may not include all repositories currently available.
Most repositories will ask you to apply a license to your uploaded data so that users know what they may and may not do with your work. Since the purpose of sharing data is to facilitate data reuse and increase reproducibility, some repositories specifically require you to apply a Creative Commons Zero (CC0) “No Rights Reserved” license. This means that other people may download, re-analyze, re-share, and otherwise re-use your data, but it does not exempt them from the standard expectations of citation and giving academic credit.
Note that choosing not to apply a license is the same thing, legally speaking, as stating “all rights reserved.” In the strictest interpretation, this means that a user wouldn’t even be allowed to email a copy of your dataset to themselves because it might be unauthorized copying! However, people may interpret a lack of license as meaning that they can do anything with your data. To prevent confusion, it is best to choose an appropriate license and clearly attach it to your files as a text file bundled with a data download or as text in a metadata record.
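As an illustration of bundling a license with a data download, here is a minimal Python sketch. It assumes you distribute your data as a zip archive; the function name `bundle_with_license`, the file name `LICENSE.txt`, and the wording of the notice are all hypothetical placeholders, and you should use the exact license text your chosen repository requires.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical notice text; replace it with the official text of your chosen license.
LICENSE_TEXT = (
    "This dataset is released under the Creative Commons Zero (CC0 1.0) "
    "Public Domain Dedication.\n"
    "Users are still expected to cite the dataset following scholarly norms.\n"
)


def bundle_with_license(data_dir: Path, archive_path: Path,
                        license_text: str = LICENSE_TEXT) -> list[str]:
    """Zip every file under data_dir and place LICENSE.txt at the archive root.

    Returns the list of names stored in the archive.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Write the license first so it is the first entry users see.
        archive.writestr("LICENSE.txt", license_text)
        for path in sorted(data_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside data_dir.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, arcname=path.relative_to(data_dir).as_posix())
        return archive.namelist()


if __name__ == "__main__":
    # Small demonstration with throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "dataset"
        data.mkdir()
        (data / "measurements.csv").write_text("subject,value\n1,0.5\n")
        print(bundle_with_license(data, Path(tmp) / "dataset.zip"))
```

Many repositories also let you select a license in the deposit form, which records it in the metadata record; including the text file as well means the license travels with the data after download.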
Software and code, including research analysis scripts, also benefit from clear licenses. GitHub has a simple, interactive tool at ChooseALicense.com that suggests and explains appropriate code licenses.
Q: If I submit my data to a repository, can I remove it later?
A: In general, no. If you make a mistake or want to re-upload a new version, you can resubmit your files, but the old files will remain visible as a previous version in order to preserve the scientific record. If you encounter legal or confidentiality issues, you can request that your files be withdrawn from the repository, but usually a metadata-only record will remain that describes the files (in terms of author, title, etc.) that used to be there.
Q: When I submit my data to a repository, am I giving up any rights?
A: In general, no. Anyone who uses your data that they found in a repository should acknowledge/cite/credit you appropriately. (Many repositories actually make that easy by issuing DOIs to all data submissions, which are frequently required to cite datasets.) In some cases, repositories do require you to license your submitted data with a Creative Commons Zero (CC0) license, essentially putting it in the public domain. This helps enable replication and reuse, and data is generally not protected by copyright anyway. A CC0 license does not exempt anyone from the normal expectations of scholarly credit as mentioned above.
Q: My research reuses shared data to generate new results. Do I need to re-share the primary data again?
A: No, you do not need to re-share the primary data you used in your analysis, but you do need to cite it appropriately. Any new scientific data that you generate from your research is subject to the policy and should be shared as publicly as possible. Sharing the new data in a repository that generates a DOI (digital object identifier, essentially a permanent web link) will let other researchers cite and reuse your data in the same way.
Q: My data files are really big. Will a repository accept my data for deposit?
A: Each repository has different file size limits, both per-file and per-user. Dryad is one of the biggest, accepting up to 300GB, but they charge additional storage fees over 50GB. Zenodo accepts up to 50GB per deposit, with unlimited storage per researcher. If you have truly massive datasets (especially common with images or video), contact HSLS Data Services and we will help you find a solution.
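Before contacting a repository, it can help to measure your dataset and compare it with the limits quoted in this answer. The following is a rough Python sketch, not an official tool: the limits are copied from the text above and may have changed, the decimal definition of a gigabyte is an assumption, and all function and variable names are hypothetical.

```python
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes; a repository may count sizes differently

# Limits as quoted in the answer above; confirm current values with each repository.
REPOSITORY_LIMITS_GB = {
    "Dryad": 300,
    "Zenodo": 50,
}
DRYAD_FEE_THRESHOLD_GB = 50  # above this size, the text says Dryad charges extra fees


def total_size_bytes(data_dir: Path) -> int:
    """Sum the sizes of all regular files under data_dir."""
    return sum(p.stat().st_size for p in data_dir.rglob("*") if p.is_file())


def repositories_that_fit(size_bytes: int) -> list[str]:
    """Return the repositories whose quoted limit is at least size_bytes."""
    return [name for name, limit in REPOSITORY_LIMITS_GB.items()
            if size_bytes <= limit * GB]


def dryad_extra_fees_apply(size_bytes: int) -> bool:
    """True if the dataset fits in Dryad but exceeds the quoted fee threshold."""
    return DRYAD_FEE_THRESHOLD_GB * GB < size_bytes <= REPOSITORY_LIMITS_GB["Dryad"] * GB


if __name__ == "__main__":
    size = total_size_bytes(Path.cwd())
    print(f"Dataset size: {size / GB:.2f} GB")
    print("Fits within quoted limits of:", repositories_that_fit(size) or "none listed")
    if dryad_extra_fees_apply(size):
        print("Dryad would charge additional storage fees at this size.")
```

If the result is "none listed," that is a good signal to contact HSLS Data Services about options for very large datasets.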
Q: My data includes protected health information. Can I share/do I still have to share it?
A: Yes, you can share it (and the NIH expects you to) if you can reasonably de-identify everything in your data and you have informed consent to do so. Most repositories state that they do not accept personally identifiable health information, and that by uploading your data you are certifying that it is appropriately scrubbed. In some cases, you may still want to restrict access to your data only to qualified or vetted researchers. Contact HSLS Data Services and we can help you navigate next steps.
Q: My data is managed by a Data Use Agreement. Who do I contact to find out more about how this works with the NIH policy?
A: Data Use Agreements (DUAs) are handled by the Pitt Office of Sponsored Programs (OSP). See their website for more information or contact them for assistance.
Q: I am concerned about whether data sharing is allowed by the informed consent forms my research participants signed. How do I find out more about what I am allowed to share?
A: Contact the IRB (Institutional Review Board) help line through the Pitt Human Research Protection Office (HRPO). They can help you understand what is and is not allowed in your current informed consent forms, and write language that aligns with the new NIH requirements for your future forms. See also the NIH "Considerations for Obtaining Informed Consent" and the NIH Office of Science Policy's "Informed Consent for Secondary Research with Data and Biospecimens" documents.
Q: I have made a sincere effort to share my data in accordance with the policy, but there simply isn't a solution that fits all of my requirements (due to size, privacy requirements, etc.). Is there any way I can share my data upon request with vetted researchers without going through a repository, and will that satisfy the NIH requirements?
A: The NIH understands that there will be cases in which data can't be completely shared. In that situation, it's important to demonstrate in your DMS Plan that you have considered all of the available options and still have specific constraints or concerns. If you would like to publicly describe your data and invite qualified researchers to apply for full or increased data access, we can help you create a metadata-only record in a data catalog like the Pitt Data Catalog. This strategy can also help you increase the findability of large datasets hosted on a cloud storage service. Contact HSLS Data Services for more information.
Have a question not answered here? The NIH also maintains its own FAQ on the data management and sharing policy with regular updates.