The goal of this course is to give students an overview of the fields of bioinformatics and computational genomics, and to provide space to acquire the basic computer literacy needed to analyze next-generation sequencing experiments. We will address fundamental concepts in the analysis and interpretation of genomics data, from bulk to single-cell technologies, and explore in depth the structural properties of key data types and their associated file formats. While we acknowledge that 6 weeks is far too short for this course to be considered comprehensive, our goal is to give students the basic skills they need to advocate for and analyze their own data.
The course takes place over 6 weeks, with 1.5-hour lectures on Tuesdays and Thursdays from 1:00 PM to 2:30 PM, for a total of 12 lectures and workshops. Most lectures are accompanied by readings, light homework, or both. Assignments must be completed on time for full credit, with no exceptions. There is no final exam.
Due to social distancing protocols, the course will take place online on MS Teams. Links will be provided by email invitation.
The course is worth 120 points. Late assignments will receive a maximum of 50% of full credit if submitted up to one lecture after the original due date. When no homework is assigned, points will be given for attendance. Readings are indicated in the syllabus. Each question set is worth 10 points and will be graded on completeness. Links to lecture slides and readings are provided in the syllabus below.
Attendance is required. Absences may be excused with permission from the graduate school, obtained ahead of time or, in extenuating circumstances, after the fact (email Emma Yates Kassler). Homework assignments are due on time (see schedule below) regardless of attendance; work submitted up to one lecture late receives half credit. Unexcused absences result in an incomplete grade.
Lecture topics:
- Introduction
- Course overview
- History of bioinformatics
- Introduction to Unix
Lecture slides:
Readings:
Lecture topics:
- The Human Genome
- More basic Unix and a tour of the Unix file system
Lecture slides:
Homework Assignment 1:
Lecture topics:
- Understanding NGS file formats
- Understanding NGS quality assessment
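To make the FASTQ format and per-base quality scores concrete, here is a minimal Python sketch (an illustration, not course material) that parses four-line FASTQ records and decodes the quality string. It assumes the common Phred+33 encoding, where a score Q corresponds to an error probability of 10^(-Q/10). The read in the example is invented, and the sketch does not handle other encodings or wrapped sequence lines.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from unwrapped 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Invented single-read example.
example = """@read1
ACGTN
+
II?#!
"""

for read_id, seq, qual in parse_fastq(example):
    scores = phred33(qual)
    print(read_id, seq, scores, sum(scores) / len(scores))
    # read1 ACGTN [40, 40, 30, 2, 0] 22.4
```

Note how the low-quality tail (`#`, `!`) lines up with the ambiguous `N` base; this is the kind of pattern quality assessment tools summarize across millions of reads.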
Lecture slides:
Homework Assignment 2:
[Due March]
Working with data on the command line: Searching NGS File Formats (Coetzee)
Lecture topics:
- Search for characters or patterns in a text file using the grep command
- Write to and append to a file using output redirection
- Use the pipe character (`|`) to chain together commands
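The shell commands themselves are covered in the lecture. As a rough analogy only, the Python sketch below mirrors what `grep PATTERN file`, `> file`, `>> file`, and `| wc -l` do, which can help when reasoning about what each step of a pipeline produces. The function names and the tab-separated example records are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Mimic `grep PATTERN`: keep only lines matching a regular expression."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def write_lines(path: Path, lines: Iterable[str], append: bool = False) -> None:
    """Mimic `> file` (overwrite, the default) or `>> file` (append=True)."""
    with path.open("a" if append else "w") as handle:
        for line in lines:
            handle.write(line.rstrip("\n") + "\n")


def count_lines(lines: Iterable[str]) -> int:
    """Mimic `wc -l` at the end of a pipe."""
    return sum(1 for _ in lines)


# Hypothetical tab-separated records: chromosome, start, end.
records = ["chr1\t100\t200", "chr2\t50\t80", "chr1\t300\t400"]

# Roughly: grep '^chr1' records.txt | wc -l
# (the \t here avoids also matching chr10, chr11, ...)
n_chr1 = count_lines(grep(r"^chr1\t", records))
print(n_chr1)  # 2

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "hits.txt"
    write_lines(out, grep(r"^chr1\t", records))               # grep ... > hits.txt
    write_lines(out, grep(r"^chr2\t", records), append=True)  # grep ... >> hits.txt
    saved = out.read_text().splitlines()

print(saved)
```

The key difference the sketch highlights is that `>` truncates the file before writing while `>>` adds to the end, so running the first command twice does not duplicate lines but running the second one twice does.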
Lecture slides:
Tuesday, March 29 Working with data on the command line: awk & bedtools (More Searching NGS File Formats)
Lecture topics:
- Search for characters or patterns in a column-specific manner using awk.
- Learn to use bedtools to perform genome arithmetic.
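As a sketch of what column-specific filtering and genome arithmetic mean, the Python below filters BED-style intervals on their chromosome and length columns (similar in spirit to an awk expression such as `awk '$1 == "chr1" && $3 - $2 >= 100' peaks.bed`) and computes pairwise overlaps, roughly analogous to the default output of `bedtools intersect -a peaks.bed -b genes.bed`. It assumes 0-based, half-open BED coordinates. The file names, intervals, and helper functions are hypothetical, and bedtools itself uses far more efficient algorithms than this naive double loop.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(lines: List[str]) -> List[Interval]:
    """Read the first three tab-separated BED columns."""
    intervals = []
    for line in lines:
        chrom, start, end = line.split("\t")[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def filter_columns(intervals: List[Interval], chrom: str, min_length: int) -> List[Interval]:
    """Keep intervals on `chrom` that are at least `min_length` bp long
    (column 1 compared to a string, columns 3 - 2 compared to a number)."""
    return [iv for iv in intervals
            if iv.chrom == chrom and iv.end - iv.start >= min_length]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """For every overlapping (a, b) pair on the same chromosome, report the
    shared bases. Naive O(len(a) * len(b)) loop, for illustration only."""
    hits = []
    for x in a:
        for y in b:
            if x.chrom == y.chrom and x.start < y.end and y.start < x.end:
                hits.append(Interval(x.chrom, max(x.start, y.start), min(x.end, y.end)))
    return hits


# Hypothetical example data.
peaks = parse_bed(["chr1\t100\t500", "chr1\t800\t850", "chr2\t0\t300"])
genes = parse_bed(["chr1\t400\t900", "chr2\t500\t600"])

long_chr1 = filter_columns(peaks, "chr1", 100)
overlaps = intersect(peaks, genes)
print(long_chr1)
print(overlaps)
```

With half-open coordinates, the overlap test is simply `start_a < end_b and start_b < end_a`, and the overlap length is `min(end) - max(start)`; intervals that merely touch (one ends where the other starts) do not overlap.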
Lecture slides:
Homework Assignment 3:
Lecture topics:
- Experimental Design
- Management of Big Data Projects
- Biological Enrichment
- Use (and Misuse) of Ontologies and Their Significance
Lecture slides:
Readings:
- All Biology is Computational Biology
- Ten Simple Rules for Large Scale Data Processing
- Urgent need for consistent standards in functional enrichment analysis
Homework Assignment 4:
Lecture topics:
- Understanding the computational context of RNA-Seq.
- Learning to judge RNA-Seq data using quality metrics.
- Quick differential expression analysis.
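For a first, quick look at differential expression, one common back-of-the-envelope approach is to normalize raw counts to counts per million (CPM) and compute a log2 fold change with a pseudocount. The Python sketch below assumes one sample per condition and uses invented counts; it performs no statistical testing and is not a substitute for a full RNA-Seq differential expression analysis with replicates.

```python
import math
from typing import Dict, List


def cpm(counts: List[int]) -> List[float]:
    """Counts per million: scale each gene's count by the sample's library size."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


def log2_fold_change(control: List[int], treated: List[int],
                     pseudocount: float = 1.0) -> List[float]:
    """Per gene: log2((treated CPM + pseudocount) / (control CPM + pseudocount)).
    The pseudocount avoids division by zero and log(0) for unexpressed genes."""
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for c, t in zip(cpm(control), cpm(treated))]


# Invented counts for three genes, one sample per condition.
genes = ["geneA", "geneB", "geneC"]
control = [100, 300, 600]
treated = [400, 300, 300]

lfc: Dict[str, float] = dict(zip(genes, log2_fold_change(control, treated)))
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.2f}")
```

Normalizing before comparing matters: if the treated library were sequenced twice as deeply, raw counts would all look roughly doubled even with no biological change, whereas CPM values would not.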
Readings and websites:
- Orchestrating Single-Cell Analysis with Bioconductor (OSCA)
- Seurat
- Scanpy - single cell analysis with python
- scTransform
- Pearson Residuals for Normalization
- GPU - probabilistic models for single-cell omics
Lecture topics:
TBD
Lecture topics:
- 16S rRNA sequence data processing and analysis
Homework Assignment: Follow and complete the dada2 tutorial prior to April 14. Please also install the following R packages: vegan, phyloseq, lmerTest, lme4, ggplot2, dplyr, ape, reshape2.
Lecture topics:
- What is project management?
- Git basics: creating and cloning a repo
- Adding, committing, and pull requests
Homework Assignment:
Due April 26: Create a private git repo and populate it with all prior homework assignments. See lecture slides.
Thursday, April 21 (Lecture 12) Automated Machine Learning in Biomedicine: AutoMLPipe-BC (Urbanowicz)
- What is machine learning?
- Biomedical data challenges
- Elements of machine learning analysis pipeline
- Automated Machine Learning
- Demonstration of AutoMLPipe-BC
Pre-Lecture: Follow the instructions on slides 109-111 to install AutoMLPipe-BC on your Google Drive.
Day | Date | Lecturer | Homework | Due |
---|---|---|---|---|
Tue | 03/15 | HAZELETT | ||
Thu | 03/17 | HAZELETT | ||
Tue | 03/22 | COETZEE | hw1 | 03/29 |
Thu | 03/24 | COETZEE | hw2 | 03/31 |
Tue | 03/29 | COETZEE | hw3 | 04/05 |
Thu | 03/31 | HAZELETT | hw4 | 04/14 |
Tue | 04/05 | COETZEE | ||
Thu | 04/07 | COETZEE | ||
Tue | 04/12 | LAWRENSON | | 04/21 |
Thu | 04/14 | VUJKOVIC-CVIJIN | dada2 tutorial | 04/14 |
Tue | 04/19 | HAZELETT | see lecture slides | 04/26 |
Thu | 04/21 | URBANOWICZ | | |