There is a well-worn path that most people take into machine learning: they find a course, they watch the videos, they run the notebooks, they finish the certificate, and then they sit in front of a real dataset and don’t know where to start.
This is not a failure of effort. It’s a failure of method. Courses optimise for completion. Learning optimises for capability. These are not the same thing.
Start with a problem, not a curriculum
The single most effective accelerant in learning machine learning is having a real question you want to answer. Not a tutorial dataset. Not a practice problem from a course. A question you actually care about — one where you don’t already know the answer.
When I built RHYTHM, my personal ML-powered investment engine, I wasn’t following a curriculum. I was trying to solve a specific problem — could I make systematic investment plans smarter using forecasting models? That question forced me to learn time series modelling properly, understand the Baum-Welch algorithm for fitting hidden Markov models, and debug TensorFlow training loops, because the alternative was the model not working.
Problem-first learning produces deep understanding. Curriculum-first learning produces surface fluency. Find your problem first.
The learning progression that actually works
Step 1: Data wrangling before modelling. Non-negotiable. Before you write your first model, spend weeks becoming very good at pandas and SQL. Load messy data. Clean it. Reshape it. Summarise it.
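What Step 1 looks like in practice: load something messy, normalise it, deduplicate it, summarise it. A minimal sketch in pandas — the CSV, its columns, and its particular messiness are all invented for illustration:

```python
import io

import pandas as pd

# Hypothetical messy CSV: stray whitespace in headers, inconsistent
# casing, a missing value, and a duplicated row.
raw = io.StringIO(
    "customer, signup_date ,spend\n"
    "Alice,2024-01-05,120.5\n"
    "BOB,2024-02-11,\n"
    "alice,2024-01-05,120.5\n"
)

df = pd.read_csv(raw)
df.columns = df.columns.str.strip()                    # tidy header whitespace
df["customer"] = df["customer"].str.strip().str.lower()  # normalise casing
df["signup_date"] = pd.to_datetime(df["signup_date"])    # real dtypes, not strings
df = df.drop_duplicates()                              # the repeated alice row goes
df["spend"] = df["spend"].fillna(0.0)                  # explicit missing-value policy

summary = df.groupby("customer")["spend"].sum()
print(summary)
```

None of this is glamorous, but every real project starts exactly here, and the to_datetime/drop_duplicates/fillna decisions are modelling decisions in disguise.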
Step 2: Classical ML before deep learning. Understand linear and logistic regression, decision trees, random forests, and gradient boosting before touching neural networks. These models solve most real business problems, and they are far easier to interpret and debug than a neural network.
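To make Step 2 concrete, here is a sketch comparing an interpretable baseline against a tree ensemble on one of scikit-learn's bundled datasets. The hyperparameters are illustrative defaults, not tuned choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Logistic regression: a strong, interpretable baseline.
logreg = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

# Random forest: usually a solid step up on tabular data.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

for name, model in [("logreg", logreg), ("forest", forest)]:
    print(name, accuracy_score(y_te, model.predict(X_te)))
```

The habit to build is exactly this shape: fit the simple model first, and make anything fancier earn its complexity against that baseline.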
Step 3: Deep learning when your problem requires it. Not because it’s interesting, but because your problem needs it — image classification, sequence modelling, unstructured text.
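As a sketch of "deep learning because the problem is sequential": a toy PyTorch model that learns to predict the next value of a sine wave from a sliding window. The sizes, learning rate, and epoch count are arbitrary illustrative choices, not a recipe:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sequence task: a noiseless sine wave.
t = torch.linspace(0, 8 * math.pi, 400)
series = torch.sin(t)

# Turn the series into (window -> next value) training pairs.
window = 20
X = torch.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]

# A small MLP is plenty for this toy problem.
model = nn.Sequential(nn.Linear(window, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.5f}")
```

The point is not the architecture; it is that the problem (predict the next step of a sequence) dictated the setup, not the other way round.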
Step 4: Deploy something. Build a model that does something in the real world — even if it’s just a script that runs daily and outputs a file. The gap between model performance in a notebook and model performance in production is where most practitioners fail.
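Step 4 really can be as small as the sketch below: a script that cron could run daily, writing a dated CSV of predictions. The output directory, the stub input, and the stand-in score() function are all hypothetical placeholders; your trained model goes where the stub is:

```python
"""A minimal 'deployment': a script cron can run once a day."""

import csv
import datetime
import pathlib

OUT_DIR = pathlib.Path("predictions")  # hypothetical output directory


def score(rows):
    # Stand-in for model.predict(); replace with your real model.
    return [{"id": r["id"], "score": float(r["value"]) * 0.5} for r in rows]


def main():
    OUT_DIR.mkdir(exist_ok=True)
    today = datetime.date.today().isoformat()
    rows = [{"id": "a", "value": "2.0"}, {"id": "b", "value": "4.0"}]  # stub input
    out_path = OUT_DIR / f"predictions_{today}.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "score"])
        writer.writeheader()
        writer.writerows(score(rows))
    return out_path


if __name__ == "__main__":
    print(main())
```

Once something this small runs on a schedule, you immediately meet the real problems — stale inputs, schema drift, silent failures — that notebooks never surface.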
On courses and resources
Python is the language. pandas and NumPy for data manipulation. scikit-learn for classical ML. PyTorch or TensorFlow for deep learning — PyTorch is easier to debug and increasingly preferred in research and production. SQL everywhere, always.
Kaggle for practice datasets. fast.ai for deep learning (Jeremy Howard’s top-down approach is arguably the best free resource available). StatQuest on YouTube for conceptual clarity.
The mindset that separates the people who make it
Version control (Git). Experiment tracking (MLflow or Weights & Biases). Writing tests for data pipelines. Understanding what your model is actually doing when it fails.
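Testing a data pipeline, for example, can start as simply as fail-fast assertions on the assumptions your model depends on. The schema and the checks below are hypothetical; adapt them to your own pipeline:

```python
import pandas as pd


def validate_events(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast if the data breaks assumptions the model relies on."""
    assert {"user_id", "amount", "ts"} <= set(df.columns), "missing columns"
    assert df["user_id"].notna().all(), "null user_id"
    assert (df["amount"] >= 0).all(), "negative amounts"
    assert df["ts"].is_monotonic_increasing, "events out of order"
    return df


events = pd.DataFrame({
    "user_id": [1, 2, 3],
    "amount": [9.99, 0.0, 42.0],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
})
checked = validate_events(events)
print(f"{len(checked)} rows passed validation")
```

A pipeline that validates its inputs loudly will save you from the worst failure mode in ML: a model quietly trained on broken data.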
These are not advanced topics. They are foundational professional practices that courses almost never teach because they don’t make good video content.
Learn them early. They will save you more time than any new algorithm.