Publicly Available Sources of Data for Health & Social Determinants of Health

What you will need to use these datasets

Many public use data sets include documentation, coding information (value labels, for example), and other supporting files. Don't ignore those files!
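As an illustration of why the coding files matter, here is a minimal Python sketch of applying value labels from a codebook to coded data. The variable name `smoker`, its codes, and the sample rows are all made up for this example; the real codes and labels must come from the documentation shipped with each data set.

```python
import csv
import io

# Hypothetical codebook excerpt: variable name -> {code: label}.
# Real labels come from the documentation for the data set you use.
CODEBOOK = {
    "smoker": {"1": "Current", "2": "Former", "3": "Never", "9": "Refused"},
}

# Hypothetical data extract; a real file would be read from disk.
RAW = "id,smoker\n101,1\n102,3\n103,9\n"


def apply_labels(rows, codebook):
    """Return copies of rows with coded values replaced by codebook labels."""
    labeled = []
    for row in rows:
        new_row = dict(row)
        for var, labels in codebook.items():
            if var in new_row:
                # Keep the raw code when the codebook has no entry for it.
                new_row[var] = labels.get(new_row[var], new_row[var])
        labeled.append(new_row)
    return labeled


rows = list(csv.DictReader(io.StringIO(RAW)))
labeled_rows = apply_labels(rows, CODEBOOK)
for row in labeled_rows:
    print(row["id"], row["smoker"])
```

Codes that are missing from the codebook are left unchanged, which makes them easy to spot and check against the documentation.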

Additionally, you may need to complete special permission forms to access some data sets and/or pay a fee. 

It is your responsibility to determine the requirements for the use of any of the data sets listed on this page. 

Reporting the use of and citing datasets in your manuscript

Health Research Reporting Guidelines: EQUATOR Network

Be sure to cite all datasets used in any manuscript you create. Additionally, consider using these reporting guidelines to ensure your manuscript is complete and transparent.

Observational study reporting guidelines

Federal Datasets

Centers for Disease Control & Prevention Datasets

Produced by: US Centers for Disease Control & Prevention (CDC)
Dates of coverage: varies by dataset
Geographic coverage: varies by dataset

CDC Dataset Catalog: Data by Topic

This site provides links to almost 700 datasets from the CDC as well as from other Federal agencies. Many, but not all, of these datasets are accessible to the public. 

National Center for Health Statistics Public Use Data Files and Documentation

CDC datasets by program


Social Vulnerability Index Data & Documentation

Produced by: US Agency for Toxic Substances and Disease Registry (ATSDR) and Centers for Disease Control & Prevention (CDC)
Dates of coverage: varies by data set
Geographic coverage: Census Tracts and County level
Years: 2000, 2010, 2014, 2016, 2018

Choose by Year, Geography, and Geography Type to download data.


Medicare & Medicaid Data

Produced by: U.S. Centers for Medicare & Medicaid Services
Dates of coverage: varies
Geographic coverage: County, State, and National level
Individual files:


Catalog of Surveillance Systems (Childhood obesity research)

Produced by: National Collaborative on Childhood Obesity Research
Dates of coverage: varies
Geographic coverage: County, State, and National level

"The Catalogue of Surveillance Systems provides one-stop access to over 100 publicly available datasets relevant to childhood obesity research.

Datasets profiled in the Catalogue include information on obesity-related:

  • Health behaviors, outcomes, and determinants
  • Policies and environmental factors

Surveillance systems included in the Catalogue were identified by reviewing existing reports and soliciting expert input. The systems were chosen because they provide access to publicly available raw data gathered in the United States and were released in the past 10 years."

Citing the Catalog

  • Catalogue of Surveillance Systems. National Collaborative on Childhood Obesity Research. https://www.nccor.org/nccor-tools/catalogue/ [Accessed on: Month Day, Year].
  • McKinnon, RA, Reedy, J, Berrigan, D, et al. The National Collaborative on Childhood Obesity Research Catalogue of Surveillance Systems and Measures Registry: New tools to spur innovation and increase productivity in childhood obesity research. Am J Prev Med. 2012 Apr;42(4):433-5. Available at https://doi.org/10.1016/j.amepre.2012.01.004

US Census Bureau Datasets 

Produced by: US Census Bureau
Dates of coverage: varies
Geographic coverage: varies by dataset
Individual resources:


IPUMS USA 

Produced by: IPUMS USA (uses US Census Bureau Data)
Dates of coverage: 1790 to most current available
Geographic coverage: varies by dataset
About: IPUMS USA includes over sixty integrated, high-precision samples of the American population drawn from sixteen federal censuses, from the American Community Surveys of 2000-present, and from the Puerto Rican Community Surveys of 2005-present. The online resources for analyzing data are complex; be sure to take advantage of the user guides and tutorials.


Substance Abuse and Mental Health Data Sets

Produced by: US Department of Health & Human Services, Substance Abuse and Mental Health Services Administration (SAMHSA)
Geographic coverage: State level, National level
Dates of coverage: varies by data set


COVID Rapid Acceleration of Diagnostics (RADx) Data Hub

Produced by: US National Institutes of Health (NIH)
Geographic coverage:  
Dates of coverage: 2020 to present
About:  "The NIH Rapid Acceleration of Diagnostics Data Hub (RADx® Data Hub) supports researchers in accessing curated and de-identified COVID-19 data, allowing them to find, aggregate, and perform data analyses in a cloud-enabled platform....The RADx Data Hub supports efforts to understand COVID-19 and factors associated with disparities in COVID-19 morbidity and mortality in underserved and vulnerable populations."


Distributed Active Archive Center

Produced by: US National Aeronautics and Space Administration (NASA), hosted at Oak Ridge National Laboratory
Geographic coverage:  varies by dataset
Dates of coverage: varies by dataset

Get Data


Healthcare Cost and Utilization Project Data (HCUP; Purchase data)

Produced by: US Agency for Healthcare Research & Quality
Geographic coverage: State level, National level
Dates of coverage: 1993 to one to two years before the present
About HCUP Dataset: "Go to the online HCUP Central Distributor to submit applications for Nationwide and State Databases, request complimentary supplemental files that augment information contained in the HCUP databases, submit data re-use and data sharing requests, and download your purchased Nationwide data."


Environmental Protection Agency (EPA)

Produced by: US Environmental Protection Agency (EPA)
Dates of coverage: 1987 to most current available
Geographic coverage: Facility, Local, State, and National level

  • TRI data and tools for advanced/customized analysis
    • Basic Data Files
      • Data for a reporting year for the entire U.S. Each .zip file is made up of 10 .txt files that collectively contain all data elements from the TRI reporting form (except Form R Schedule 1, which is available separately)
      • Recommended for users familiar with TRI data.
    • Basic Plus Data Files
      • Data for a reporting year for the entire U.S. Each .zip file is made up of 10 .txt files that collectively contain all data elements from the TRI reporting form (except Form R Schedule 1, which is available separately)
      • Recommended for users familiar with TRI data.
    • TRI-CHIP (toxicity of TRI chemicals)
      • Access and analyze toxicity information for TRI-covered chemicals through this downloadable Microsoft Access database
    • Dioxin/TEQ Data Files (dioxin data and toxic equivalency values): Dioxin mass quantity data from the TRI reporting Form R Schedule 1, along with EPA-calculated Toxic Equivalency values. These files complement the Basic and Basic Plus Data Files. Note that dioxin data are already included in most other TRI tools. Recommended for users familiar with TRI data.
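
Because some of the downloads described above arrive as a .zip archive containing several .txt files, a short Python sketch of reading every text file in such an archive may be helpful. It assumes the files are tab-delimited with a header row, which may not match the real layout; the file names and columns in the demonstration archive are invented, so check the documentation for the actual structure and encoding.

```python
import csv
import io
import zipfile


def read_zip_tables(zip_source, delimiter="\t"):
    """Read every .txt member of a zip archive into a list of row dicts.

    Returns {member_name: [row_dict, ...]}. Assumes each .txt file is
    delimited text with a header row (an assumption, not a documented fact).
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip documentation or other non-data members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text, delimiter=delimiter))
    return tables


# Build a tiny in-memory archive with made-up file names and columns.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("part_1.txt", "facility_id\tchemical\nF001\tExample A\n")
    demo.writestr("part_2.txt", "facility_id\tstate\nF001\tPA\n")
    demo.writestr("readme.pdf", "not a data file")
buffer.seek(0)

tables = read_zip_tables(buffer)
for name, rows in tables.items():
    print(name, len(rows), "rows")
```

With a real download, pass the path to the .zip file instead of the in-memory buffer, and adjust `delimiter` and the encoding to match the file documentation.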


Download National Toxicology Program Data

Produced by: US National Toxicology Program (NIH)


Data.gov

Produced by: US General Services Administration (GSA)
Dates of coverage: varies by dataset
Geographic coverage:  varies by dataset

Labeled as "The home of the U.S. Government’s open data", this site appears to be a single search engine for data from all federal agencies. Unfortunately, it does not have a filter for raw datasets.

State Government Datasets

OpenData PA: Pennsylvania Data Assets 

Produced by: Commonwealth of Pennsylvania
Dates of coverage: varies by data set
Geographic coverage: County and State level

The datasets available through this site are produced by different Pennsylvania state agencies and cover everything from education to opioids to elections to vulnerable Pennsylvanians and much more.

Datasets from other organizations

ICPSR (Inter-university Consortium for Political and Social Research)

Housed at: University of Michigan Institute for Social Research
About: "ICPSR is an international consortium of more than 750 academic institutions and research organizations. ICPSR (Inter-university Consortium for Political and Social Research) provides leadership and training in data access, curation, and methods of analysis for the social science research community....ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields."


The Dartmouth Atlas of Health Care

Produced by: Dartmouth (US) Medical School
Geographic coverage: Hospital level
Dates of coverage: Varies, but generally 2015 to most current available
About: The Dartmouth Atlas of Health Care uses Medicare and Medicaid data to provide information and analysis about national, regional, and local markets, as well as hospitals and their affiliated physicians.


Neighborhood Atlas

Produced by: University of Wisconsin School of Medicine and Public Health
Geographic coverage: City, state, region, and nation using Census Bureau block groups
Dates of coverage: 5-year averages based on the American Community Survey


HSLS Data Services LibGuide

This resource points to various sites that link to open access data.


Project Tycho Pre-compiled US data sets

Produced by: University of Pittsburgh
Dates of coverage: 1800s to most current available

The 360 available datasets cover a wide variety of infectious diseases for the US as well as globally.