The National Institutes of Health's new Policy for Data Management and Sharing (DMS Policy), which goes into effect January 25, 2023, will require NIH-funded researchers to submit a plan outlining how scientific data from their research will be managed and shared within their funding application. The policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process.
To help University of Pittsburgh researchers comply with the new policy, HSLS Data Services has put together this page of guidance for the new policy along with best practices for data management and data sharing. Each section of this page is organized into three parts:
In addition, this entire libguide can be downloaded as a PDF handout: NIH Data Management & Sharing Policy (Effective 2023): A Guide for Pitt Health Sciences Researchers. You can view the playlist of all videos here: [forthcoming]
If you have a question, desire one-on-one consultation, or would like us to review your data management and sharing (DMS) plan, please don't hesitate to contact HSLS Data Services.
The following multimedia are available for this topic:
In 2020, the National Institutes of Health released a Policy for Data Management and Sharing (DMS Policy) (also called the DMSP) that updated the institutes' data sharing policy from 2003. After years of public comment and revision, the final policy will go into effect on January 25, 2023.
The policy requires three things: that researchers think about how they will manage, document, and share their scientific research data before beginning data collection; that they show the NIH their thought process in a formal Data Management and Sharing Plan in their funding application (with an accompanying budget detailing data management costs); and that they make their research data as publicly available as possible within a reasonable time frame.
All researchers who are funded in whole or in part by the NIH must comply with the policy, regardless of funding level. However, the policy only applies to activities that generate scientific research data according to a specific definition given below. If you are funded through a Training (T), Fellowship (F), Construction (C06), Conference (R13), Resource (G), or Research-Related Infrastructure Program (e.g., S06) grant, the policy will not apply to your work. See the complete list of NIH activity codes for more information.
The DMS Policy applies to scientific research data, defined as "data commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications." This covers both quantitative and qualitative data, and applies to all data types including image, audio, and video data of all file formats and sizes. It explicitly does not apply to "laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects such as laboratory specimens" (source: Research Covered Under the Data Management & Sharing Policy), as those may contain or produce data but do not constitute scientific data themselves.
Determining which data fall under the policy will depend on your own judgment and the norms of your field. If you generate a lot of calibration, null, duplicate, or "junk" data and would not normally consider it necessary for replicating findings, you do not need to address that data in your DMS plan. Conversely, you may consider some research material to be "data" that is not explicitly addressed in the policy. The policy does not address scripts, code, algorithms, or computational models as data products in and of themselves, but if you consider those to be necessary for replication of your work, you should discuss them in your DMS plan in order to make your management and sharing easier. (The policy does explicitly ask researchers to describe any software tools necessary to access or manipulate the data.)
The ultimate goal of the DMS policy is to encourage researchers to think about data management and data sharing in concrete terms before executing their research plan, because experience has shown that it's much harder to do after the conclusion of a project. We recommend erring on the side of addressing too much rather than too little.
The 2003 NIH Data Sharing Policy only applied to investigators seeking single-year direct costs of $500,000 or more, and it required researchers to either outline their plans for sharing their data or to explain why they could not. In contrast, the new policy requires all investigators seeking any level of funding to submit a plan, and it asks researchers to outline their plans for organizing, storing, and documenting their data (AKA data management) as well as for sharing it. While the new policy acknowledges that legitimate technical, legal, and privacy-related barriers to complete data sharing still exist, it more explicitly urges researchers to share their data promptly in order to increase the overall reproducibility of science.
NIH program staff will evaluate the DMS plans and recommend them for approval or denial. The plans will not be shared with peer reviewers and will not be factored into a submission's overall score; however, the budget including costs for data management will be shared with peer reviewers. Once approved, the plan will become a Term and Condition of the award and compliance will be evaluated through annual progress reports. If an investigator does not follow the approved plan, it may affect renewal of the grant and/or future funding decisions. However, the plan can be modified if necessary by working with NIH program officers.
On a local level, ensuring compliance with the plan is left to the investigator. The University of Pittsburgh has not established a person or team who is responsible for making sure researchers comply with their approved plans. Contact HSLS Data Services to discuss best practices and workflows for setting research teams up for success.
January 25, 2023 is the date on or after which all NIH applicants must plan and budget for managing and sharing data, submit a data management and sharing (DMS) plan when applying for funding, and comply with any approved DMS plan. The steps below outline a general workflow for preparing for and complying with the new policy. Since articulating a DMS plan can be a lengthy process, we recommend starting as early as possible.
Before submission:
After submission, during review:
After approval:
The Data Management and Sharing (DMS) plan should be a one- to two-page document submitted with the general funding application as a PDF attachment. If a researcher's application is also subject to the Genomic Data Sharing Policy, they should address GDS-specific topics within this general DMS plan and will no longer submit a separate GDS plan; see the call-out boxes in the NIH's Writing a Data Management & Sharing Plan page for more information on incorporating GDS.
The topics to be included in a DMS Plan are laid out in the NIH's Supplemental Policy Information: Elements of an NIH Data Management and Sharing Plan. Briefly, the elements are:
In general, more specificity is better because a comprehensive plan requires less spur-of-the-moment decision-making later. This is meant to be a helpful tool for researchers to organize and share their data, not merely a hoop to jump through! That being said, DMS Plans should be no more than two pages. This limit applies to all NIH applications, but some institutes or centers may have more specific requirements outlined in this list of NIH Institute and Center Data Sharing Policies.
If you are not sure about an element of the Plan yet because it will depend on emergent needs during the course of the study, you can address this uncertainty within the relevant section. For example, if you are developing computational tools to work with your data, you can reference these without specifically naming software that doesn't yet exist. However, you should demonstrate that you are proactively considering these questions, especially when it comes to sharing your data through a repository or other data-sharing platform. If you specify something in your Plan that you want to change later, you may do so with approval of NIH staff.
The NIH is developing a format page that aligns with the elements given above. Currently (as of 2022-10-09), a preview version of the format page is available, with a final fillable format expected later this fall.
The best way to create a DMS Plan is by using the templates available through DMPTool, a service that allows you to submit draft DMS plans to HSLS librarians for feedback before official submission. The DMPTool team has updated their NIH template to reflect the requirements of the new policy, and will keep it up to date as new guidance is announced. For one-on-one help, contact HSLS Data Services.
Managing your data and making it publicly available in an easy-to-use format with clear documentation costs money, both real and as time spent. The NIH allows investigators to request money for data management and sharing in the budget and budget justification sections of their applications. Specific information is included in the NIH's Supplemental Policy Information: Allowable Costs for Data Management and Sharing.
In general, reasonable costs for the following are allowable:
The following costs are not allowable:
All costs that are included in the budget must be incurred during the performance period regardless of how long the data will remain available. Budgets, but not the associated Data Management & Sharing Plans, will be shared with peer reviewers for assessment of their reasonableness. The NIH has not announced a separate source of money specifically for data management and sharing costs, and these costs should be treated as a part of the overall budget proposal.
The publications within "Forecasting Costs for Preserving, Archiving, and Promoting Access to Biomedical Data," a project of the National Academies, may be useful when determining costs. However, we highly recommend contacting HSLS Data Services for help anticipating the costs of storing or sharing data in a repository. Many data repositories are free, but some have storage caps (either per-file or overall) that require data processing charges not otherwise advertised. We can help you determine the most appropriate home for your data before you name it in your DMS Plan and request funds in your budget.
The NIH strongly recommends that researchers deposit their data into a public data repository for long-term storage and access. If access to the data needs to be restricted, controlled-access repositories or other data-sharing platforms are available, but sharing data via email by request or hosting it on a lab server will not meet the policy's requirements in most cases.
A data repository is a platform for hosting research datasets that enables them to be findable, accessible, interoperable, and reusable by researchers and the public all around the world. (See the FAIR Data Principles, which were developed to optimize the reuse of scientific research data.) There are many reasons to use a data repository instead of a lab website, FTP server, or cloud storage like Google Drive:
There are many, many repositories, some of which take all kinds of research material (publications, data, posters, etc.) and some of which specialize in a content type or discipline. There isn’t one “best” repository, but rather one “best suited for your particular data.”
Here are some questions to lead you to a possible repository match:
It can be difficult to choose among the large generalist repositories, although all of the options on the NIH list meet the agency's Desirable Characteristics for All Data Repositories checklist. This Generalist Repository Comparison Chart provides a quick reference for the repositories' costs, storage caps, and limitations, although it may not include all repositories currently available.
Most repositories will ask you to apply a license to your uploaded data so that users know what they may and may not do with your work. Since the purpose of sharing data is to facilitate data reuse and increase reproducibility, some repositories specifically require you to apply a Creative Commons 0 “No Rights Reserved” license. This means that other people may download, re-analyze, re-share, and otherwise re-use your data, but it does not exempt them from the standard expectations of citation and giving academic credit.
Note that choosing not to apply a license is the same thing, legally speaking, as stating “all rights reserved.” In the strictest interpretation, this means that a user wouldn’t even be allowed to email a copy of your dataset to themselves because it might be unauthorized copying! However, people may interpret a lack of license as meaning that they can do anything with your data. To prevent confusion, it is best to choose an appropriate license and clearly attach it to your files as a text file bundled with a data download or as text in a metadata record.
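As one way to bundle a license with a data download, the sketch below writes a plain-text license notice into a dataset folder and zips the folder so the notice travels with the files. It is only an illustration: the function name `bundle_with_license`, the file name `LICENSE.txt`, and the wording of the notice are assumptions, and you should substitute the license your chosen repository actually requires.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: a short notice pointing at the CC0 deed. Replace it with the
# license text or notice that your repository requires.
LICENSE_NOTICE = (
    "This dataset is released under the Creative Commons Zero (CC0 1.0) "
    "Public Domain Dedication.\n"
    "https://creativecommons.org/publicdomain/zero/1.0/\n"
    "Please cite the dataset when you reuse it.\n"
)


def bundle_with_license(data_dir: Path, archive_path: Path) -> list[str]:
    """Write LICENSE.txt into data_dir, zip the folder, and return the archived names.

    archive_path should live outside data_dir so the archive does not
    try to include itself.
    """
    (data_dir / "LICENSE.txt").write_text(LICENSE_NOTICE, encoding="utf-8")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(data_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the dataset folder.
                archive.write(path, path.relative_to(data_dir).as_posix())
        return archive.namelist()


if __name__ == "__main__":
    # Demonstration on a throwaway folder with one hypothetical data file.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "dataset"
        data_dir.mkdir()
        (data_dir / "measurements.csv").write_text("id,value\n1,0.5\n")
        print(bundle_with_license(data_dir, Path(tmp) / "dataset.zip"))
```

Many repositories also record the license in the metadata record at deposit time, so a bundled text file like this is a supplement to the repository's license field rather than a replacement for it.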
Software and code, including research analysis scripts, also benefit from clear licenses. GitHub has a simple, interactive tool that suggests and explains appropriate code licenses at ChooseALicense.com.
Q: If I submit my data to a repository, can I remove it later?
A: In general, no. If you make a mistake or want to re-upload a new version, you can resubmit your files, but the old files will remain visible as a previous version in order to preserve the scientific record. If you encounter legal or confidentiality issues, you can request that your files be withdrawn from the repository, but usually a metadata-only record will remain that describes the files (in terms of author, title, etc.) that used to be there.
Q: When I submit my data to a repository, am I giving up any rights?
A: In general, no. Anyone who uses your data that they found in a repository should acknowledge/cite/credit you appropriately. (Many repositories actually make that easy by issuing DOIs to all data submissions, which are frequently required to cite datasets.) In some cases, repositories do require you to license your submitted data with a Creative Commons Zero (CC0) license, essentially putting it in the public domain. This helps enable replication and reuse, and data is generally not protected by copyright anyway. A CC0 license does not exempt anyone from the normal expectations of scholarly credit as mentioned above.
Q: My data files are really big. Will a repository accept my data for deposit?
A: Each repository has different file size limits, both per-file and per-user. Dryad is one of the biggest, accepting up to 300GB, but they charge additional storage fees over 50GB. If you have truly massive datasets (especially common with images or video), contact HSLS Data Services and we will help you find a solution.
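Before contacting a repository, it can help to know roughly how large your dataset is. The sketch below totals the size of a folder and compares it with the two Dryad figures quoted in the answer above (50GB and 300GB). Those thresholds are taken from this page and may change, and the function names and the use of decimal gigabytes are assumptions for illustration, so check the repository's current documentation before relying on the result.

```python
from pathlib import Path

GB = 1000 ** 3  # assumption: decimal gigabytes; some services count in GiB

# Figures quoted in the answer above for Dryad; treat them as assumptions
# and confirm them against the repository's current documentation.
FEE_THRESHOLD_BYTES = 50 * GB
MAX_DEPOSIT_BYTES = 300 * GB


def total_size_bytes(data_dir: Path) -> int:
    """Sum the sizes of all regular files under data_dir."""
    return sum(p.stat().st_size for p in data_dir.rglob("*") if p.is_file())


def classify_deposit(size_bytes: int) -> str:
    """Describe where a dataset of this size falls relative to the thresholds."""
    if size_bytes > MAX_DEPOSIT_BYTES:
        return "exceeds the deposit limit"
    if size_bytes > FEE_THRESHOLD_BYTES:
        return "accepted, with additional storage fees"
    return "accepted within the base allowance"


if __name__ == "__main__":
    # Check the current working directory as an example.
    size = total_size_bytes(Path("."))
    print(f"{size / GB:.2f} GB: {classify_deposit(size)}")
```

A result near either threshold is a good prompt to ask about compression, splitting the deposit, or a different repository.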
Q: My data includes protected health information. Can I share/do I still have to share it?
A: Yes, you can share it (and the NIH expects you to) if you can reasonably de-identify everything in your data and you have informed consent to do so. Most repositories state that they do not accept personally identifiable health information, and that by uploading your data you are certifying that it is appropriately scrubbed. In some cases, you may still want to restrict access to your data only to qualified or vetted researchers. Contact HSLS Data Services and we can help you navigate next steps.
Q: My data is managed by a Data Use Agreement. Who do I contact to find out more about how this works with the NIH policy?
A: Data Use Agreements (DUAs) are handled by the Pitt Office of Sponsored Programs (OSP). See their website for more information or contact them for assistance.
Q: I am concerned about whether data sharing is allowed by the informed consent forms my research participants signed. How do I find out more about what I am allowed to share?
A: Contact the IRB (Institutional Review Board) help line through the Pitt Human Research Protection Office (HRPO). They can help you understand what is and is not allowed in your current informed consent forms, and write language that aligns with the new NIH requirements for your future forms.
Q: I have made a sincere effort to share my data in accordance with the policy, but there simply isn't a solution that fits all of my requirements (due to size, privacy requirements, etc.). Is there any way I can share my data upon request with vetted researchers without going through a repository, and will that satisfy the NIH requirements?
A: The NIH understands that there will be cases in which data can't be completely shared. In that situation, it's important to demonstrate in your DMS Plan that you have considered all of the available options and still have specific constraints or concerns. If you would like to publicly describe your data and invite qualified researchers to apply for full or increased data access, we can help you create a metadata-only record in a data catalog like the Pitt Data Catalog. This strategy can also help you increase the findability of large datasets hosted on a cloud storage service. Contact HSLS Data Services for more information.