
HSLS MolBio Workshops

Information & resources for hands-on bioinformatics classes.

Data Analysis: what's involved?

Data analysis is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.

As data moves through this process, it falls into four sequential categories:

  1. raw data = original data collected from a source, without any manipulation
  2. cleaned data = modified data, after removal of incorrect, incomplete, irrelevant, duplicated, or improperly formatted data
  3. processed data = data that has been validated, sorted, classified, calculated, or otherwise transformed
  4. analyzed data = data that has been interpreted for meaning
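The four categories above can be sketched as a small pipeline. This is a minimal illustration only, assuming a hypothetical table of (sample, gene, expression) records. The functions `clean`, `process`, and `analyze`, the sample values, and the 11.0 cutoff are all invented for the example, and real cleaning and processing rules depend on the data type and the study design.

```python
from statistics import mean

# Hypothetical raw data: (sample_id, gene, expression) records as collected, unmodified.
raw_data = [
    ("s1", "TP53", "12.5"),
    ("s1", "TP53", "12.5"),   # exact duplicate
    ("s2", "TP53", ""),       # incomplete (missing value)
    ("s3", "TP53", "abc"),    # improperly formatted
    ("s2", "BRCA1", "8.0"),
    ("s3", "BRCA1", "10.0"),
    ("s4", "TP53", "14.5"),
]

def clean(records):
    """Cleaned data: drop duplicated, incomplete, and improperly formatted records."""
    seen = set()
    cleaned = []
    for record in records:
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        sample, gene, value = record
        try:
            cleaned.append((sample, gene, float(value)))
        except ValueError:
            continue  # empty strings and non-numeric text cannot be parsed
    return cleaned

def process(records):
    """Processed data: group validated values by gene and calculate a per-gene mean."""
    by_gene = {}
    for _, gene, value in records:
        by_gene.setdefault(gene, []).append(value)
    return {gene: mean(values) for gene, values in sorted(by_gene.items())}

def analyze(gene_means, threshold=11.0):
    """Analyzed data: interpret each mean as 'high' or 'low' against a hypothetical cutoff."""
    return {gene: ("high" if m >= threshold else "low") for gene, m in gene_means.items()}

cleaned_data = clean(raw_data)
processed_data = process(cleaned_data)
analyzed_data = analyze(processed_data)
print(processed_data)  # {'BRCA1': 9.0, 'TP53': 13.5}
print(analyzed_data)   # {'BRCA1': 'low', 'TP53': 'high'}
```

Keeping each stage as a separate function preserves the intermediate results, so the raw data are never overwritten and each step can be checked on its own.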

Examples of bioinformatics data analysis techniques:

  • high-throughput data analysis (microarrays, RNA-Seq, DNA-Seq)
  • DNA/protein sequence manipulation
  • SNPs, genetic variation, and genome-wide association studies (GWAS)
  • functional analysis
  • signaling, network, and pathway analysis
  • transcription factor and gene regulatory sequence analysis
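As one concrete instance of DNA/protein sequence manipulation, the sketch below computes a reverse complement and translates a coding sequence with the standard genetic code. It assumes a sequence containing only A, C, G, and T, read in the frame that starts at the first base; the helper names `reverse_complement` and `translate` are hypothetical. In practice, Biopython's `Seq` objects provide `reverse_complement()` and `translate()` methods that also support alternative codon tables.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Base-pairing rules used for complementing (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks stop codons, a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(reverse_complement(coding_dna))  # CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT
print(translate(coding_dna))           # MAIVMGR*KGAR*
```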

The Four C's

The "four C's" are the broad options for analyzing bioinformatics data: paying someone else to do it (Core Labs), working with another researcher (Collaboration), or doing it yourself, either by learning to program (Coding) or by using out-of-the-box software (Commercially Licensed Tools).

Core Labs

Core facilities are centralized technology-based laboratories that maintain and support sophisticated equipment, training, and computational services for a fee. These links are to Pitt/UPMC cores of particular relevance for bioinformatics data analysis.

Collaboration

Links to Pitt departments and centers where you can find fellow researchers to contact about potential collaborations on bioinformatics projects.

Coding

Links to resources to help you learn how to code.

Commercially Licensed Tools

Links to resources licensed by Pitt that help with bioinformatics data analysis, without the need for programming experience.

Other Considerations

Computational Needs: Sharing, Storage, & Performance

Rigor & Reproducibility