The All of Us Research Program (AoURP), led by the National Institutes of Health, is a longitudinal cohort study that aims to advance precision medicine and improve human health by partnering with one million or more diverse participants across the United States. The program emphasizes reaching populations historically underrepresented in biomedical research. Its data resources include:
The All of Us Research Hub stores health data from a diverse group of participants from across the United States.
The Researcher Workbench is a cloud-based platform where researchers can access AoURP-generated data, with tools that support data analysis and collaboration. Researchers create workspaces to access, store, and analyze data for specific research projects. Researchers with R or Python experience can run queries and analyses on the AoU datasets in an integrated, cloud-based Jupyter Notebook environment.
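As a minimal sketch of that notebook workflow, the Python below assumes the Workbench exposes the workspace's curated data repository (CDR) as a BigQuery dataset name in an environment variable called `WORKSPACE_CDR`, and that the data follow the OMOP Common Data Model with a `person` table. Both are assumptions to check against the Workbench's own code snippets. The helper `build_person_count_query` is hypothetical and only builds the SQL string, so it runs anywhere.

```python
import os


def build_person_count_query(cdr: str) -> str:
    """Build a BigQuery SQL string counting participants in the CDR's person table.

    Assumption: `cdr` is a fully qualified BigQuery dataset name (project.dataset)
    and the `person` table follows the OMOP CDM layout with a `person_id` column.
    """
    if not cdr:
        raise ValueError("CDR dataset name must be non-empty")
    return (
        "SELECT COUNT(DISTINCT person_id) AS n_participants\n"
        f"FROM `{cdr}.person`"
    )


if __name__ == "__main__":
    # Assumption: WORKSPACE_CDR is set inside Workbench notebooks.
    # Outside the Workbench, fall back to a placeholder dataset name.
    cdr = os.environ.get("WORKSPACE_CDR", "my-project.my_cdr_dataset")
    print(build_person_count_query(cdr))
```

Inside a Workbench notebook, the query string could then be run with a BigQuery client, for example `google.cloud.bigquery.Client().query(query).to_dataframe()`, if that library is available in the environment.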
The University of Pittsburgh has signed a Data Use and Registration Agreement with the All of Us Research Program, allowing Pitt researchers to apply for Registered and Controlled Tier access.
AoURP offers New User Orientations that may be useful for new users.
Access to the Researcher Workbench and its data is free, but compute and storage accrue usage costs through the Google Cloud Platform (GCP). The All of Us Research Program provides $300 in free credits to each registered Researcher Workbench user to help researchers get started, and users may apply for additional free credits if they run out.
When a user’s credits run low, they should receive a message suggesting that they set up a long-term billing solution. The Credits and Billing page provides additional information about initial credits and how to set up a billing account.
Billing is controlled at the workspace level, so users must link each workspace to a billing account, either directly with GCP or through a Google billing partner. Researchers whose work is funded by the National Institutes of Health (NIH) are eligible for the STRIDES GCP pricing initiative.
Ramirez et al. estimated the compute and storage costs, in US dollars, of four example analyses at approximately $96 in total:
| Analysis | Development cost (USD) | Single-run cost (USD) | Total cost (USD) |
|---|---|---|---|
| Descriptive metrics | 28.35 | 3.48 | 31.83 |
| Medication Sequencing | 28.15 | 6.39 | 34.54 |
| Smoking exposure PheWAS | 7.00 | 4.20 | 11.20 |
| ASCVD Score calculation | 11.71 | 7.20 | 18.91 |
| Total | 75.21 | 21.27 | 96.48 |
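The table implies a simple cost model: each analysis's total is its one-time development cost plus the cost of one run. The hypothetical helpers below encode the table's figures (in US dollars) and project the cost of repeating an analysis, under the assumption, not stated by Ramirez et al., that every later run costs the same as the first. `Decimal` is used so that cent values add up exactly.

```python
from decimal import Decimal

# (development cost, single-run cost) in USD, taken from the table above.
COSTS = {
    "Descriptive metrics": (Decimal("28.35"), Decimal("3.48")),
    "Medication Sequencing": (Decimal("28.15"), Decimal("6.39")),
    "Smoking exposure PheWAS": (Decimal("7.00"), Decimal("4.20")),
    "ASCVD Score calculation": (Decimal("11.71"), Decimal("7.20")),
}


def projected_cost(analysis: str, runs: int = 1) -> Decimal:
    """Development cost plus `runs` single-run costs (assumes each run costs the same)."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    development, single_run = COSTS[analysis]
    return development + single_run * runs


def totals() -> tuple[Decimal, Decimal, Decimal]:
    """Column totals: development, single run, and overall (one run per analysis)."""
    development = sum((d for d, _ in COSTS.values()), Decimal("0"))
    single_run = sum((s for _, s in COSTS.values()), Decimal("0"))
    return development, single_run, development + single_run


if __name__ == "__main__":
    print(totals())
    print(projected_cost("Smoking exposure PheWAS", runs=5))
```

With one run each, `totals()` reproduces the table's bottom row; under the same-cost assumption, five runs of the PheWAS would come to 7.00 + 5 × 4.20 = 28.00.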
The Researcher Workbench uses Google Compute Engine (GCE) for cloud computing and Google Cloud Storage (GCS) for cloud storage. Jupyter Notebooks run on an ephemeral GCE virtual machine. During the project described by Ramirez et al., the default n1-highmem-4 machine cost $0.27 per hour, including the cost of Dataproc clusters. Notebooks not in active use are “paused,” accruing less than $0.10 per day, and are shut down at zero cost after two weeks of inactivity. The storage “bucket” associated with notebooks in a workspace costs $0.026 per GB per month.
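Using the rates quoted above, which reflect pricing at the time of that project and may differ today, a rough workspace cost is active hours × $0.27 + paused days × $0.10 (an upper bound) + stored GB × months × $0.026. The sketch below is hypothetical: the function names and parameters are illustrative, and it compares the estimate with the $300 initial credit.

```python
from decimal import Decimal

# Rates quoted by Ramirez et al.; current GCP pricing may differ.
VM_HOURLY = Decimal("0.27")          # default n1-highmem-4, incl. Dataproc
PAUSED_DAILY_MAX = Decimal("0.10")   # upper bound for a paused notebook
STORAGE_GB_MONTH = Decimal("0.026")  # workspace bucket storage
INITIAL_CREDITS = Decimal("300")     # free credits per registered user


def _dec(x) -> Decimal:
    """Convert int/float/str input to Decimal via str to avoid float artifacts."""
    return Decimal(str(x))


def estimate_cost(active_hours, paused_days, storage_gb, months) -> Decimal:
    """Upper-bound workspace cost estimate in USD, rounded to cents."""
    cost = (
        _dec(active_hours) * VM_HOURLY
        + _dec(paused_days) * PAUSED_DAILY_MAX
        + _dec(storage_gb) * _dec(months) * STORAGE_GB_MONTH
    )
    return cost.quantize(Decimal("0.01"))


def remaining_credits(cost: Decimal) -> Decimal:
    """Initial free credits left after spending `cost` (negative means overspent)."""
    return INITIAL_CREDITS - cost


if __name__ == "__main__":
    cost = estimate_cost(active_hours=100, paused_days=10, storage_gb=50, months=3)
    print(cost, remaining_credits(cost))
```

For 100 active hours, 10 paused days, and 50 GB stored for 3 months, this gives 27.00 + 1.00 + 3.90 = $31.90, leaving $268.10 of the initial credit. At $0.27 per hour, $300 alone would cover roughly 1,111 hours of the default machine.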
Source: Ramirez AH, et al. The All of Us Research Program: Data quality, utility, and diversity. Patterns (N Y). 2022;3(8):100570.
Learn how researchers have used AoURP data through the following searches. Start by typing a topic of interest in the search field.
Learn how researchers have used UK Biobank data through the following searches. Like AoURP, the UK Biobank is a large-scale longitudinal cohort study, containing in-depth genetic and health information from half a million UK participants. Start by typing a topic of interest in the search field. Researchers could try replicating UK Biobank findings using AoURP datasets.
Student Posters:
1. Ivy Baker et al., Mental Health Disparities Between Deaf and Hard of Hearing & Hearing Peers
2. Valerie DeVos et al., Comparing Healthcare Experiences Between Deaf and Hard of Hearing vs Hearing patients During the COVID-19 Pandemic
The All of Us User Support Hub provides resources to help researchers navigate the Researcher Workbench.
Zoom link: https://pitt.zoom.us/j/6990815019. All sessions are held via Zoom, and no registration is required.
Assistance on these topics is provided on a rotating basis on the 1st and 3rd Wednesday and 2nd, 4th, and 5th Friday of each month.
Zoom link: bcm.zoom.us/j/94305076343 | Meeting ID: 943 0507 6343. All sessions are held via Zoom, and no registration is required.
Contacts: Web: bcm.edu/allofuseveningswithgenetics | Email: allofuseveningswithgenetics@bcm.edu
The Help Desk can be contacted anytime at support@researchallofus.org and responds within one business day (business hours are 8 am to 5 pm). Drop-in Office Hours are typically held every other Tuesday at 1 pm CST, and data science team members are also available for additional research support. Researchers can also register in advance for support. A Google Calendar listing all of these sessions, along with other events, is also available.
Funding opportunities for All of Us research from NIH
All of Us Research Program Investigators. The "All of Us" Research Program. N Engl J Med. 2019 Aug 15;381(7):668-676. doi: 10.1056/NEJMsr1809937. PMID: 31412182; PMCID: PMC8291101.
Ramirez AH, et al.; All of Us Research Program. The All of Us Research Program: Data quality, utility, and diversity. Patterns (N Y). 2022 Aug 12;3(8):100570. doi: 10.1016/j.patter.2022.100570. PMID: 36033590; PMCID: PMC9403360.
Data structure, onboarding, cloud computing, example applications, and scope of research questions.
Lecture Video (May 9th, 2023)
Introduction to concept sets and their role in observational research with EHR data; ICD, SNOMED, LOINC, genome-related terms, etc.; the data dictionary and its relationship to other AoU clinical research tools
How to create a project using concept sets, datasets, and workbooks; how to use code snippets to analyze EHR or other data types; going beyond code snippets to accomplish more advanced tasks
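To illustrate how a concept set narrows EHR data to a cohort, the sketch below builds a query selecting participants with at least one condition record whose concept ID falls in a given set. It assumes the OMOP CDM `condition_occurrence` table with `person_id` and `condition_concept_id` columns; the helper name, dataset name, and concept IDs are hypothetical placeholders, not a validated phenotype. The Workbench's own concept-set and dataset tools generate code snippets for this; the sketch only shows the underlying idea.

```python
def build_cohort_query(cdr: str, concept_ids: list[int]) -> str:
    """Return SQL selecting distinct participants with any condition in the concept set.

    Assumption: `cdr` names a BigQuery dataset laid out per the OMOP CDM.
    """
    if not concept_ids:
        raise ValueError("concept set must contain at least one concept ID")
    # De-duplicate and sort; int() ensures only numeric IDs enter the IN (...) list.
    id_list = ", ".join(str(int(c)) for c in sorted(set(concept_ids)))
    return (
        "SELECT DISTINCT person_id\n"
        f"FROM `{cdr}.condition_occurrence`\n"
        f"WHERE condition_concept_id IN ({id_list})"
    )


if __name__ == "__main__":
    # Placeholder dataset name and concept IDs, for illustration only.
    print(build_cohort_query("my-project.my_cdr_dataset", [222, 111, 222]))
```

Sorting and de-duplicating the IDs keeps the generated SQL stable, which makes queries easier to compare across notebook runs.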