
HSLS MolBio Workshops

Information & resources for hands-on bioinformatics classes.

What is it?

All of Us Research Program

The All of Us Research Program (AoURP), led by the National Institutes of Health, is a longitudinal cohort study that aims to advance precision medicine and improve human health by partnering with one million or more diverse participants across the United States. The program emphasizes reaching populations that have been historically underrepresented in biomedical research. Its datasets include:

  • Electronic health records (OMOP CDM used to standardize all EHR data)
  • Biosamples (blood, urine, and saliva)
  • Mobile health data (from wearable devices that may track biometric data like heart rate and blood pressure)
  • Physical measurements (height, weight, BMI, waist circumference, hip circumference, blood pressure, heart rate, pregnancy status, and wheelchair use)
  • Surveys (there are three core surveys: basic demographics, lifestyle/substance use, and overall health. Additional ones on health care access and utilization, personal and family medical history, and the impact of COVID-19 are available)
  • Genomic data in the form of whole genome sequencing (WGS) and genotyping arrays

 

All of Us Research Hub

The All of Us Research Hub stores health data from a diverse group of participants from across the United States.

  • Data Access Tiers
    • Public Tier: The dataset contains only aggregate data with identifiers removed. These data are available to everyone through Data Snapshots and the Data Browser, an interactive tool on the Research Hub.
    • Registered Tier: The curated dataset contains deidentified individual-level data, available only to approved researchers on the Researcher Workbench. The Registered Tier currently includes data from electronic health records (EHRs), wearables, and surveys, as well as physical measurements taken during participant enrollment.
    • Controlled Tier: The dataset contains genomic data in the form of whole genome sequencing (WGS) and genotyping arrays, previously suppressed demographic data fields from EHRs and surveys, and unshifted dates of events.
  • Data Methods
    • To ensure the Research Hub collects the highest quality data possible, the AoURP employs a comprehensive data methodology to curate data for registered researchers. More info on data collection methods is also available. 

Where to find and analyze the data?

Researcher Workbench

The Researcher Workbench is a cloud-based platform where researchers can access AoURP-generated data, with tools that support data analysis and collaboration. Researchers create workspaces to access, store, and analyze data for specific research projects. Researchers with R or Python experience can query and analyze the AoU datasets using an integrated, cloud-based Jupyter Notebook environment.
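As a rough illustration of what a Python query in a Workbench notebook might look like, the sketch below builds a BigQuery Standard SQL string that counts participants by birth year from the OMOP CDM `person` table. It rests on several assumptions that are not stated in this guide: that the notebook exposes the curated dataset name in an environment variable called `WORKSPACE_CDR`, that the data is queried through BigQuery, and that the placeholder dataset name used as a fallback is purely hypothetical. The helper name `build_birth_year_query` is also made up for this example.

```python
import os


def build_birth_year_query(cdr_dataset: str, limit: int = 10) -> str:
    """Build a SQL string counting participants per birth year.

    `person` and `year_of_birth` come from the OMOP Common Data Model;
    `cdr_dataset` is whatever dataset name the notebook environment provides.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT year_of_birth, COUNT(*) AS n_participants\n"
        f"FROM `{cdr_dataset}.person`\n"
        "GROUP BY year_of_birth\n"
        "ORDER BY year_of_birth\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Assumption: WORKSPACE_CDR holds the dataset name inside a Workbench notebook.
    # The fallback value is a hypothetical placeholder, not a real dataset.
    cdr = os.environ.get("WORKSPACE_CDR", "my-project.my_cdr_dataset")
    print(build_birth_year_query(cdr))
```

Inside a notebook, the resulting string would then be handed to whichever BigQuery client the environment provides. The code snippets built into the Workbench are the authoritative starting point; this sketch only shows the general shape of a query.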

Video: Introduction to the Researcher Workbench (2:35 min)

Video: All of Us Researcher Workbench webinar (43:22 min)

How to access data?

The University of Pittsburgh has signed a Data Use and Registration Agreement with the All of Us Research Program, allowing Pitt researchers to apply for Registered and Controlled Tier access. 

Register for data access

  1. Sign up to use the Workbench (make an @researchallofus.org account)
  2. Complete two-step verification with your @researchallofus.org account
  3. Verify your identity with login.gov. This step can fail even when the information is entered correctly; if it does, contact support@researchallofus.org. Alternative identity verification methods are available if you have a state ID, a phone number or SSN, a US passport, or an e-passport; contact support@researchallofus.org to use one of them.
  4. Log in to the Researcher Workbench
  5. Complete the ethics training modules. Pitt users must complete the relevant training listed in their Workbench profile to access Registered and Controlled Tier data. To check your access level:
    1. Click the three-bar menu in the upper left corner, click your name, and then click “Data Access Requirements.”
    2. There you can see whether you have access to Registered Tier and Controlled Tier data. If all boxes are checked green, you have access to that type of data (top box for Registered Tier, bottom box for Controlled Tier). Complete any remaining steps to gain access to the corresponding tier.

 

Video: All of Us Researcher Workbench Onboarding (7:16 min)

How to get training?

AoURP offers New User Orientations, which may be useful for new users.

Videos:

Workshops

  • Introduction to AoU Researcher Workbench: data structure, onboarding, cloud computing, example applications, and scope of research questions (May 9, 2023)
  • Terminology and Data Model Training: introduction to concept sets and their role in observational research with EHR data; ICD, SNOMED, LOINC, genome-related terms, etc.; the data dictionary and relationship to other AoU clinical research tools (June 6, 2023)
  • EHR and Survey Data Analysis: how to create a project using concept sets, datasets, and workbooks; how to use code snippets to analyze EHR or other data types; going beyond code snippets to accomplish more advanced tasks (July 11, 2023)

Links

Training Videos

User Guide 

Data Browser User Guide

How much will it cost for data analysis?

Access to the Researcher Workbench and its data is free. Compute and storage accrue usage costs through the Google Cloud Platform (GCP). The All of Us Research Program provides $300 in free credits to each registered Researcher Workbench user to help them get started. You may also apply for more free credits if you run out.

Setting up a Billing Account for GCP

Once a user’s credits are low, they should receive a message suggesting that they set up a long-term billing solution. The Credits and Billing Page provides additional information about initial credits and how to set up a billing account.

Billing is controlled at the workspace level, so users must link each workspace to a billing account, either with GCP or with a Google billing partner. Researchers whose work is funded by the National Institutes of Health (NIH) are eligible for the STRIDES GCP pricing initiative.

Video: How to add a GCP billing account to your workspace (starts at 20 min)

Example of Data Analysis Cost

Ramirez et al. report an approximate total cost of $96 for data analysis and storage across the four example analyses listed below.

Cost for data analysis and storage, in dollars

Analysis | Development cost | Single run cost | Total cost
Descriptive metrics | 28.35 | 3.48 | 31.83
Medication Sequencing | 28.15 | 6.39 | 34.54
Smoking exposure PheWAS | 7.00 | 4.20 | 11.20
ASCVD Score calculation | 11.71 | 7.20 | 18.91
Total | 75.21 | 21.27 | 96.48

The Researcher Workbench uses Google Compute Engine (GCE) for computational resources in the cloud and Google Cloud Storage (GCS) for storage in the cloud. Jupyter Notebooks are loaded in a GCE virtual machine, which is ephemeral. During this project, the default n1-highmem-4 machine cost $0.27 per hour, including the cost of Dataproc clusters. Notebooks not in active use are “paused,” accruing a cost of under $0.10 per day, and are shut down to zero cost after two weeks of inactivity. The storage “bucket” associated with notebooks within a workspace costs $0.026 per GB per month.
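To get a feel for how these rates combine, the following minimal sketch estimates a workspace cost from the figures quoted above. It is an illustration only: the function name and its parameters are hypothetical, the paused rate uses the $0.10 per day upper bound, paused charges are capped at 14 days to reflect the two-week shutdown, and current GCP prices may differ from the rates reported for that project.

```python
# Rates quoted in the paragraph above (as reported for that project).
HOURLY_RATE_USD = 0.27            # default n1-highmem-4 machine, per hour
PAUSED_USD_PER_DAY = 0.10         # upper bound: "under $0.10 per day"
PAUSED_DAYS_BEFORE_SHUTDOWN = 14  # paused notebooks drop to zero cost after two weeks
STORAGE_USD_PER_GB_MONTH = 0.026  # workspace storage bucket


def estimate_workspace_cost(active_hours: float, paused_days: float,
                            storage_gb: float, months: float) -> float:
    """Return a rough cost estimate in US dollars, rounded to cents."""
    compute = active_hours * HOURLY_RATE_USD
    # Paused charges stop once the notebook is shut down after two weeks.
    paused = min(paused_days, PAUSED_DAYS_BEFORE_SHUTDOWN) * PAUSED_USD_PER_DAY
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH * months
    return round(compute + paused + storage, 2)


if __name__ == "__main__":
    # Example: 40 active hours, 10 paused days, 50 GB stored for 3 months.
    cost = estimate_workspace_cost(active_hours=40, paused_days=10,
                                   storage_gb=50, months=3)
    print(f"${cost:.2f}")  # prints $15.70
```

In this example the compute time contributes $10.80, the paused days at most $1.00, and storage $3.90, so the initial $300 in credits would cover many sessions of this size.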

Ramirez, A. H. et al. The All of Us Research Program: Data quality, utility, and diversity. Patterns 3, 100570 (2022)

Research Scope

Learn how researchers have used AoURP data through the following resources. Start by typing a topic of interest in the search field.

  1. Research Project Directory
  2. All of Us Researcher Convention 2023
  3. Preprints (most of these articles have not yet been peer-reviewed or published)
  4. Publications
  5. Clinical Studies (PubMed)
  6. NIH Awarded Grants (NIH Reporter)

Learn how researchers have used UK Biobank data through the following resources. Like AoURP, UK Biobank is a large-scale longitudinal cohort study; it contains in-depth genetic and health information from half a million UK participants. Start by typing a topic of interest in the search field. Researchers could try replicating UK Biobank research findings in AoURP datasets.

  1. Publications
  2. Preprints (most of these articles have not yet been peer-reviewed or published; type your topic of interest in the search box)
  3. Approved Projects Directory
  4. GeneBass - a resource of exome-based association statistics. The dataset encompasses 4,529 phenotypes with gene-based and single-variant testing across 394,841 individuals with exome sequence data from the UK Biobank. 

Student Posters:

1. Ivy Baker et al., Mental Health Disparities Between Deaf and Hard of Hearing & Hearing Peers

2. Valerie DeVos et al., Comparing Healthcare Experiences Between Deaf and Hard of Hearing vs Hearing patients During the COVID-19 Pandemic

Suggested Learning Workflow

Steps to follow:

  1. Attend the "Introduction to AoURP" workshop, a lightweight (45 min) presentation on AoURP, the Researcher Workbench, and the scope of applications
  2. Register for the Researcher Workbench and complete the ethics training: https://www.researchallofus.org/register/
  3. Attend workshops on:
    1. "Terminology and Data Models": Introduction to concept sets and their role in observational research with EHR data
    2. "EHR and Survey Data Navigation": How to create a project using concept sets, datasets, and workbooks; how to use code snippets to accomplish more advanced tasks

Support

User Support Hub

The All of Us User Support Hub provides resources to help researchers navigate the Researcher Workbench.

Q/A Sessions

  • @ Pitt: Feel free to drop by one of our meetings. Schedule: Fridays at 1 pm on April 7, 14, 21 and May 5, 12, 19, 26

Zoom link: https://pitt.zoom.us/j/6990815019. All sessions are held via Zoom. No registration is required.

Assistance on these topics is provided on a rotating basis on the 1st and 3rd Wednesday and 2nd, 4th, and 5th Friday of each month.

Zoom link: bcm.zoom.us/j/94305076343 | Meeting ID: 943 0507 6343. All sessions are held via Zoom. No registration is required.

Contacts: web: bcm.edu/allofuseveningswithgenetics; email: allofuseveningswithgenetics@bcm.edu

Help Desk

The Help Desk can be contacted at any time at support@researchallofus.org and responds within one business day (normal business hours are 8 am – 5 pm). Drop-in Office Hours are typically held every other Tuesday at 1 pm CST, and data science team members are also available for additional research support. Researchers can also register in advance for support. A Google calendar listing these and other sessions is also available.

Stay Informed

Additional Resources

Funding Opportunities

Funding opportunities for All of Us research from NIH

Key Publications

  • All of Us Research Program Investigators, The "All of Us" Research Program. N Engl J Med. 2019 Aug 15;381(7):668-676. doi: 10.1056/NEJMsr1809937. PMID: 31412182; PMCID: PMC8291101.

  • Ramirez AH, et al., All of Us Research Program. The All of Us Research Program: Data quality, utility, and diversity. Patterns (N Y). 2022 Aug 12;3(8):100570. doi: 10.1016/j.patter.2022.100570. PMID: 36033590; PMCID: PMC9403360.

HSLS-DBMI Workshops

1. Introduction to AoU Researcher Workbench:

Data structure, onboarding, cloud computing, example applications, and scope of research questions

PowerPoint slides

Lecture Video (May 9th, 2023)

 

2. Terminology and Data Model Training:

Introduction to concept sets and their role in observational research with EHR data; ICD, SNOMED, LOINC, genome-related terms, etc.; the data dictionary and relationship to other AoU clinical research tools

Feedback

3. EHR and Survey Data Analysis:

How to create a project using concept sets, datasets, and workbooks; how to use code snippets to analyze EHR or other data types; going beyond code snippets to accomplish more advanced tasks

 

Feedback