
Applied Bioinformatics

Date: October 9–14, 2014
Location: MDI Biological Laboratory

Overview

The goal of the Applied Bioinformatics Course is to provide hands-on training in major bioinformatics resources. Participants analyze an RNA-Seq data set to find differentially expressed genes, then investigate the previously described functions of those genes and the pathways they are involved in.

Figure: Fold change in activity vs. average level of activity
Maine IDEA Network
University of Maine Graduate School of Biomedical Science and Engineering
Dartmouth College Geisel School of Medicine
Dartmouth College Institute for Quantitative Biomedical Sciences

Tuition

Students/Postdocs: $905.00

Faculty/Industry: $1345.00

Tuition includes course materials, on-campus housing, and all meals (buffet style).

Financial aid is available.

Topics include web-based gene and protein resources, genome browsers, pathway and gene set enrichment analyses, and RNA-Seq data analysis. RNA-Seq data analysis will be conducted using CLC Genomics Workbench, the web-based Galaxy system, the R statistical computing environment, and Ingenuity Pathway Analysis. The course features several modules, each with written worked examples that demonstrate how to apply the major tools or resources featured in that module. Participants should have a strong background in molecular biology. Prior computer programming skills are not required, but participants should have a strong interest in learning some programming concepts.

 


Benjamin King MS

MDI Biological Laboratory

Bruce Stanton PhD

Geisel School of Medicine


Casey Greene, Geisel School of Medicine at Dartmouth
Tom Hampton, Geisel School of Medicine at Dartmouth
Gareth Howell, The Jackson Laboratory
W. Kelley Thomas, University of New Hampshire
Chris Dagdigian, BioTeam
Steven Munger, The Jackson Laboratory
Shawn Prince and Robert Mervis, CLC Genomics
Stuart Tugendreich, Ingenuity

Thursday, Oct. 09, Day 1 (Introduction)

5:00 pm - 6:00 pm – Registration and housing check in

6:00 pm - 7:00 pm – Dinner (Dining Hall)

7:00 pm - 9:00 pm – Course Introduction and Overview: Introduction to Applied Bioinformatics

(Ben King, Maren Auditorium)

• Boundaries with biology, statistics, computer science

• Contemporary biological examples

- Cell Biology

- Evolution

- Biomedical

• Statistical Challenges and Solutions

• Raw Computational Challenges and Solutions

• Problems of Data Representation and Solutions

 

Friday, Oct. 10, Day 2 (CLC Genomics Workbench)

7:00 am - 9:00 am – Continental Breakfast (Dining Hall)

9:00 am - 10:30 am – Introduction to High-Throughput Sequencing

(Kelley Thomas, Maren Auditorium)

• Technologies and Applications

• History

• Chemistry

• Instruments

• Costs

• High-level analysis workflow

10:30 am - 10:45 am – Break

10:45 am - 12:00 pm – Reference Genomes and Alignment Concepts

(CLC Team, Dahlgren Hall)

• Navigating the CLC Genomics Workbench - Screen Elements, Display setup

• Next Generation Sequencing (NGS) data import

- unaligned reads (FASTQ, .sff, etc.)

- aligned reads (SAM/BAM)

• Non-NGS data import

• Defining a reference genome

- Curating reference sequences with annotations of interest

- Working with Annotation Tracks

12:00 pm - 1:00 pm – Lunch (Dining Hall)

1:00 pm - 2:30 pm – CLC Genomics

(CLC Team, Dahlgren Hall)

• Trimming and QC

• Read mapping to reference sequence(s)

• Exome or Amplicon sequencing - target enrichment and coverage analysis

• Variant detection, Filtering and Annotation

• Differential gene expression analysis

2:30 pm - 3:00 pm – Break

3:00 pm - 5:30 pm – CLC Genomics

(CLC Team, Dahlgren Hall)

• De novo assembly

• Transcriptome assembly

• ChIP-Seq - Peak detection

• Small RNA analysis

• BLAST - Find and compare genes, protein products and place contigs

• Workflow Automation - Visually Creating and Editing Analysis Pipelines

6:00 pm - 7:00 pm – Dinner (Dining Hall)

7:00 pm - 8:00 pm – Keynote Lecture, TBA

(Kelley Thomas, Maren Auditorium)

 

Saturday, Oct. 11, Day 3 (Gene, Protein and Sequence Tools)

7:00 am - 9:00 am – Continental Breakfast (Dining Hall)

9:00 am - 10:30 am – Gene, Protein and Sequence Resources

(Gareth Howell + THH/KK/BG, Dahlgren Hall)

• NCBI Entrez system

• UniProt

• Gene Ontology

• miRNA databases

• RNA-Seq data repositories

- NCBI Gene Expression Omnibus and EBI ArrayExpress

- NCBI Sequence Read Archive and EBI European Nucleotide Archive

10:30 am - 10:45 am – Break

10:45 am - 12:00 pm – Genome Browsers & Data Retrieval

(Gareth Howell + THH/KK/BG, Dahlgren Hall)

• UCSC Genome Browser

• UCSC Table Browser

• Ensembl

• BioMart

12:00 pm - 1:00 pm – Lunch (Dining Hall)

1:00 pm - 2:30 pm – RNA-Seq Experimental Design & Workflow

(Steven Munger, Maren Auditorium)

2:30 pm - 3:00 pm – Break

3:00 pm - 5:00 pm – Analysis of High Throughput Data

(Tom Hampton, Maren Auditorium)

• Exploration

- Pairs

- Histograms

- Heatmaps

- Principal Component Analysis

• Normalization

- Express as a fraction of total

- Means, medians and quantiles

- Ranks

• Inference

- Multiple tests

- CART models

- Comparison of multidimensional distances

6:00 pm - 7:00 pm – Dinner (Dining Hall)

 

Sunday, Oct. 12, Day 4 (R)

7:00 am - 9:00 am – Continental Breakfast (Dining Hall)

9:00 am - 10:00 am – R Power Tools: Way Beyond Word & Excel

(Tom Hampton, Maren Auditorium)

• Why R

• Packages: CRAN, Bioconductor

• Reproducible, "literate" statistics

• RStudio

10:00 am - 10:45 am – R Statistical Computing Environment I

(Tom Hampton + KK/BG, Dahlgren Hall)

• Basic Math, Stats and plots

10:45 am - 11:00 am – Break

11:00 am - 12:00 pm – R Statistical Computing Environment II

(Tom Hampton + KK/BG, Dahlgren Hall)

• Variables and Functions

• Simulation

12:00 pm - 1:00 pm – Lunch (Dining Hall)

1:00 pm - 2:45 pm – edgeR and Differential Expression

(Katja Koeppen + THH/BG, Dahlgren Hall)

• Specify Design

• Normalization

• Estimating Common Dispersion

• Identify Differentially Expressed Genes

2:45 pm - 3:15 pm – Break

3:15 pm - 5:00 pm – Gene Set Enrichment in R

(Britton Goodale + THH/KK, Dahlgren Hall)

• Concepts: Hypergeometric distribution

• Paths: KEGG

• Simulation & Results

6:00 pm - 7:00 pm – Dinner (Dining Hall)

 

Monday, Oct. 13, Day 5 (Ingenuity)

7:00 am - 9:00 am – Continental Breakfast (Dining Hall)

9:00 am - 10:30 am – Pathway and Network Concepts

(Stuart Tugendreich/Ingenuity, Maren Auditorium)

10:30 am - 10:45 am – Break

10:45 am - 12:00 pm – Ingenuity I

(Ingenuity + THH/KK/BG, Dahlgren Hall)

• Load CLC data

• Load data from edgeR analysis

• Load other data

12:00 pm - 1:00 pm – Lunch (Dining Hall)

1:00 pm - 2:00 pm – Ingenuity II

(Ingenuity + THH/KK/BG, Dahlgren Hall)

• Canonical Paths & Networks

2:00 pm - 3:00 pm – Ingenuity III

(Ingenuity + THH/KK/BG, Dahlgren Hall)

• Upstream Regulation and Path Editing

3:00 pm - 3:30 pm – Break

3:30 pm - 6:00 pm – Cloud Computing Workshop Using Galaxy

(Chris Dagdigian and Ben King, Maren Auditorium)

• Analyze RNA-Seq dataset using Tuxedo suite

6:00 pm - 7:00 pm – Dinner (Dining Hall)

7:00 pm - 8:00 pm – Large-Scale Computing for Genomics

(Chris Dagdigian, Maren Auditorium)

 

Tuesday, Oct. 14, Day 6 (Machine Learning)

7:00 am - 9:00 am – Continental Breakfast (Dining Hall)

9:00 am - 10:00 am – Beyond What is Known - Machine Learning

(Casey Greene, Maren Auditorium)

10:00 am - 10:15 am – Break

10:15 am - 11:15 am – Hands-on Machine Learning

(Casey Greene + THH/KK/BG, Dahlgren Hall)

• PILGRM

• IMP

• ScanGEO

11:15 am - 12:00 pm – Course Summary & Evaluations

12:00 pm - 1:00 pm – Lunch & Departure

Tuition includes the cost of on-campus housing for course attendees. Housing units are double-occupancy dorm rooms and shared cottages. Family lodging is not available.


© 2009-2014 MDI Biological Laboratory. All rights reserved.