So you want to be a data scientist… If this is your first time here and you’ve come to learn a new skill, sit down and buckle up: this will be a long journey. Don’t go it alone; watch these great talks on learning.
General Coding
These are mostly Python (or language-agnostic) resources. Resources specific to non-Python languages are listed below.
Quick References
Resources
The Zen of Python and The Rules of Extreme Programming
PEP 8 - Style Guide for Python Code
RealPython Tutorials
Calmcode.io - tools for Python
Coding Interview University - A long course on programming basics (data structures, sorting algorithms, graphs, recursion & dynamic programming, sets, etc…)
Learn Python Courses
PyNative
RealPython’s Object Oriented Programming in Python 3 + video
Python Programming Exercises, Gently Explained
The Big Book of Small Python Projects
When you’re done, try to code the problems in this book yourself.
Beyond the Basic Stuff with Python
Automate the Boring Stuff with Python
Coding Practice
The almighty LeetCode
Coding Bat (Java + Python)
Project Euler
YouTube Channels & Series
CS Dojo - shorts for general programming tips
Non-Python resources
Git Documentation
Java Hypertext from Cornell’s CS 2110: OOP and Data Structures with David Gries
MIT’s 18.S191/6.S083/22.S092: Introduction to Computational Thinking (Julia)
Cornell’s ORIE 6125: Computational Methods in Operations Research (Shell+Julia) with Vasileios Charisopoulos
AI and Machine Learning (AI/ML)
Tools
Data Version Control (DVC)
Keras Core (soon to be wrapped into Keras 3.0)
Debugging PyTorch
Google Colab
Paperspace Gradient
Online References
ML Interview Bible by Chip Huyen
AI Summer
AI/ML Courses
If it’s been a while since your last math course, brush up on the math basics before going any further.
Andrew Ng’s Machine Learning Series + Coursera
Cornell’s CS 4/5780: Intro to Machine Learning with Anil Damle and Kilian Weinberger + Online lecture videos
Software Tutorials
GitHub repo for Cornell’s ChemE 6880: Industrial Big Data Analytics & Machine Learning with Fengqi You
Deep Learning Courses
FastAI’s Practical Deep Learning for Coders + textbook and resources
MIT’s Introduction to Deep Learning
Yann LeCun’s Deep Learning Course at NYU CDS (PyTorch)
Cornell’s CS 4787/5777: Principles of Large Scale Machine Learning with Christopher De Sa
Stanford’s CS 230: Deep Learning
Software Tutorials
Google’s ML Crash Course (TensorFlow) + TensorFlow Tutorials
PyTorch Beginner Course + YouTube Series
GitHub for Cornell’s ChemE 6888: Deep Learning with Fengqi You
Courses in Deep Learning Applications
Stanford’s CS 231n: Deep Learning for Computer Vision
Stanford’s CS 236: Deep Generative Models with Stefano Ermon
CMU’s CS 11-747: Neural Networks for NLP
Reinforcement Learning Courses
Cornell CS 4/5789: Introduction to Reinforcement Learning + Online lecture videos
Spinning Up in Deep RL from OpenAI + GitHub Repo
AI/ML Practice
Blog Posts
Oren Etzioni’s “How to get up to speed on Machine Learning and AI”
Faizan Shaikh’s “Simple Beginner’s Guide to Reinforcement Learning & Its Implementation”
Harsh Sikka’s “The Blunt Guide to Mathematically Rigorous Machine Learning”
Harsh Sikka’s “The Math Required for Machine Learning”
YouTube Channels & Series
3Blue1Brown - shorts for maths
Especially the series on Neural Networks
CS 4/5780 online lecture videos by Kilian Weinberger
Textbooks
Alice’s Adventures in a Differentiable Wonderland by Simone Scardapane
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Machine Learning: A Probabilistic Perspective by Kevin Patrick Murphy
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Grokking Deep Learning by Andrew W. Trask
Cheminformatics
Quick Resources
Tools
RDKit
RDChiral - Wrapper for RDKit’s RunReactants to improve stereochemistry handling (Paper)
Open Babel - Chemical Format Conversion
PyTorch Geometric - easy GNNs for PyTorch
Generative Toolkit for Scientific Discovery (GT4SD) - IBM Zurich project, lots of built-in training workflows
MolFlux - Molecular modeling toolkit from Exs
PhysicsML - Toolkit for physics-based modelling from Exs
Chainer Chemistry GitHub - A deep learning framework for Biology and Chemistry + Docs (note: Chainer is out of support as of Dec-2019)
Blogs
Practical Cheminformatics by Pat Walters
Cheminformania by Esben Jannik Bjerrum
Online References
Scientific Computing for Chemists with Python by Charles J. Weiss
Deep Learning for Molecules & Materials by Andrew White
Math Basics
Quick References
No Bullshit Guide to Linear Algebra in 4 Pages
Linear Algebra Review from Stanford’s CS 229
Probability and Statistics Review from Stanford’s CS 229 + short version
Calc III Study Guide
Online Math Courses
Gilbert Strang’s Linear Algebra - OCW
Denis Auroux’s Multivariable Calculus - OCW
Tom Leighton’s Discrete Mathematics for Computer Science - OCW