Statistical Inference in Biology
As our capacity to generate data is fast outpacing our ability to analyze and make sense of it, statistical inference and machine learning methods are becoming invaluable across the sciences. This is a course on the fundamentals of statistical inference/machine learning/data science. The emphasis will be on learning the methods from first principles rather than using them as black boxes. We will use real biological examples as much as possible. Note that this is not really a course on statistics (e.g. hypothesis testing is not covered).
Syllabus: Basics of probability: axioms, conditional probability, random variables, expectations, standard probability distributions, methods for sampling from a distribution, Introduction to Markov chains, Monte Carlo sampling, Least squares and linear regression, Constrained optimization, Bias-variance tradeoff and variable selection (ridge regression and lasso), Linear mixed effects, Maximum likelihood and expectation maximization, Bayesian inference, MCMC and Gibbs sampling, Bayesian model selection, Dimensionality reduction and clustering, Supervised learning: SVMs, neural networks, deep networks, genetic algorithms.
Course structure: Lectures + weekly programming/maths assignments. A term project involving implementing one of the methods on a biological example is to be submitted in the last month of the course.
Prerequisites: Basic programming skills in Python or R, mathematics up to class 12.
Evaluation: 6 homework assignments + 1 end-term project.
Course outcome: Familiarity with the basic concepts of statistical inference, and the ability to choose and implement methods such as those above in one's own research.