Many public use data sets include documentation, coding information (labels, for example), and other supporting files. Don't ignore those files!
Additionally, you may need to complete special permission forms and/or pay a fee to access some data sets.
It is your responsibility to determine the requirements for the use of any of the data sets listed on this page.
Be sure to cite all datasets used in any manuscript you create. Additionally, consider using these reporting guidelines to ensure your manuscript is complete and transparent.
Observational study reporting guidelines
Produced by: US Centers for Disease Control & Prevention (CDC)
Dates of coverage: varies by dataset
Geographic coverage: varies by dataset
This site provides links to almost 700 datasets from the CDC and other federal agencies. Many, but not all, of these datasets are accessible to the public.
Produced by: US Agency for Toxic Substances and Disease Registry (ATSDR) and Centers for Disease Control & Prevention (CDC)
Dates of coverage: varies by data set
Geographic coverage: Census Tracts and County level
Years: 2000, 2010, 2014, 2016, 2018
Choose by Year, Geography, and Geography Type to download data.
Produced by: US Centers for Medicare & Medicaid Services
Dates of coverage: varies
Geographic coverage: County, State, and National level
Individual files:
Produced by: National Collaborative on Childhood Obesity Research
Dates of coverage: varies
Geographic coverage: County, State, and National level
"The Catalogue of Surveillance Systems provides one-stop access to over 100 publicly available datasets relevant to childhood obesity research.
Datasets profiled in the Catalogue include information on obesity-related:
Surveillance systems included in the Catalogue were identified by reviewing existing reports and soliciting expert input. The systems were chosen because they provide access to publicly available raw data gathered in the United States and were released in the past 10 years."
Produced by: US Census Bureau
Dates of coverage: varies
Geographic coverage: varies by dataset
Individual resources:
Produced by: IPUMS USA (uses US Census Bureau Data)
Dates of coverage: 1790 to most current available
Geographic coverage: varies by dataset
About: IPUMS USA includes over sixty integrated, high-precision samples of the American population drawn from sixteen federal censuses, the American Community Surveys of 2000-present, and the Puerto Rican Community Surveys of 2005-present. The online resources for analyzing data are complex, so be sure to take advantage of the user guides and tutorials.
Produced by: US Department of Health & Human Services Substance Abuse and Mental Health Services Administration (SAMHSA)
Geographic coverage: State level, National level
Dates of coverage: varies by data set
Produced by: US National Institutes of Health (NIH)
Geographic coverage:
Dates of coverage: 2020 to present
About: "The NIH Rapid Acceleration of Diagnostics Data Hub (RADx® Data Hub) supports researchers in accessing curated and de-identified COVID-19 data, allowing them to find, aggregate, and perform data analyses in a cloud-enabled platform....The RADx Data Hub supports efforts to understand COVID-19 and factors associated with disparities in COVID-19 morbidity and mortality in underserved and vulnerable populations."
Produced by: US NASA Oak Ridge National Laboratory
Geographic coverage: varies by dataset
Dates of coverage: varies by dataset
Produced by: US Agency for Healthcare Research & Quality
Geographic coverage: State level, National level
Dates of coverage: 1993 to 1-2 years ago
About HCUP: "Go to the online HCUP Central Distributor to submit applications for Nationwide and State Databases, request complimentary supplemental files that augment information contained in the HCUP databases, submit data re-use and data sharing requests, and download your purchased Nationwide data."
Produced by: US Environmental Protection Agency (EPA)
Dates of coverage: 1987 to most current available
Geographic coverage: Facility, Local, State, and National level
Dioxin/TEQ Data Files (dioxin data and toxic equivalency values): Dioxin mass quantity data from the TRI reporting Form R Schedule 1, along with EPA-calculated Toxic Equivalency values. These files complement the Basic and Basic Plus Data files. Note that dioxin data are already included in most other TRI tools. Recommended for users familiar with TRI data.
Produced by: US National Toxicology Program (NIH)
Produced by: US General Services Administration (GSA)
Dates of coverage: varies by dataset
Geographic coverage: varies by dataset
Labeled "The home of the U.S. Government’s open data," this site appears to serve as a single search engine for data from all federal agencies. Unfortunately, it does not have a filter for raw datasets.
Produced by: Commonwealth of Pennsylvania
Dates of coverage: varies by data set
Geographic coverage: County and State level
The datasets available through this site are produced by different Pennsylvania state agencies and cover everything from education to opioids to elections to vulnerable Pennsylvanians and much more.
Housed at: University of Michigan Institute for Social Research
About: "ICPSR is an international consortium of more than 750 academic institutions and research organizations. ICPSR (Inter-university Consortium for Political and Social Research) provides leadership and training in data access, curation, and methods of analysis for the social science research community....ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields."
Produced by: Dartmouth (US) Medical School
Geographic coverage: Hospital level
Dates of coverage: Varies, but generally 2015 to most current available
About: The Dartmouth Atlas of Health Care uses Medicare and Medicaid data to provide information and analysis about national, regional, and local markets, as well as hospitals and their affiliated physicians.
Produced by: University of Wisconsin School of Medicine and Public Health
Geographic coverage: City, state, region, and nation using Census Bureau block groups
Dates of coverage: 5-year averages based on the American Community Survey
This resource points to various sites that link to open access data.
Produced by: University of Pittsburgh
Dates of coverage: 1800s to most current available
The 360 available datasets cover a wide variety of infectious diseases, both in the US and globally.