These are the platforms, tools, datasets, and communities I would point someone to if they were building a data science foundation today. The list is curated, not exhaustive: everything here has earned its place through genuine usefulness.
Where to Learn
fast.ai — The best free deep learning curriculum available. Jeremy Howard’s top-down teaching approach is counterintuitive and highly effective. Start here for deep learning.
Kaggle Learn — Short, practical micro-courses on pandas, SQL, feature engineering, and machine learning. Better than most paid courses for building hands-on fluency quickly.
StatQuest with Josh Starmer — YouTube channel that explains statistical and ML concepts with unusual clarity. If you’re confused about how an algorithm works, check here first.
CS229 (Stanford Machine Learning) — Andrew Ng's machine learning course, available free online. The lecture notes are particularly valuable as a reference.
The Missing Semester of Your CS Education (MIT) — Covers tools that every developer and data scientist needs but rarely gets taught formally: the shell, version control, and debugging.
Tools to Learn, in Order
Python — The language. No debate.
pandas + NumPy — Data manipulation. Learn these properly before anything else.
SQL — Everywhere, always. Window functions, CTEs, complex joins.
scikit-learn — Classical ML. The standard library.
matplotlib + seaborn — Visualisation. Enough to communicate findings clearly.
Git — Version control. Non-negotiable for professional work.
PyTorch or TensorFlow — Deep learning. PyTorch is increasingly preferred.
Docker — Containerisation. Required for anything that runs in production.
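To make "window functions, CTEs" concrete, here is a minimal sketch using Python's built-in sqlite3 module so it runs without a database server. The orders table and its values are hypothetical toy data, and the sketch assumes the bundled SQLite is version 3.25 or newer (the first release with window function support). The query uses a CTE to name an intermediate result and a RANK() window function to find each customer's largest order.

```python
import sqlite3

# Hypothetical toy data, used only to illustrate the SQL features.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES
  ('alice', 30), ('alice', 50),
  ('bob', 20), ('bob', 70), ('bob', 10);
""")

query = """
WITH ranked AS (                -- CTE: a named intermediate result
    SELECT customer,
           amount,
           RANK() OVER (        -- window function: rank rows within each customer
               PARTITION BY customer
               ORDER BY amount DESC
           ) AS rnk
    FROM orders
)
SELECT customer, amount
FROM ranked
WHERE rnk = 1                   -- keep each customer's largest order
ORDER BY customer;
"""

rows = conn.execute(query).fetchall()
print(rows)  # → [('alice', 50.0), ('bob', 70.0)]
conn.close()
```

A plain GROUP BY with MAX(amount) would return the same two numbers here, but the window-function version keeps whole rows, which matters once the table carries other columns (order IDs, dates) that you want alongside the top value.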
Practice Datasets
Kaggle Datasets — Enormous variety. Good for finding domain-specific data.
UCI ML Repository — Classic datasets used across thousands of papers. Good reference.
data.gov / data.gov.in — Government open data. Messy, real, and very good practice.
Communities Worth Following
Towards Data Science — Mixed quality but high ceiling. The best articles are from practitioners solving real problems.
r/MachineLearning — Research announcements and technical discussion. Signal-to-noise is reasonable.
Hacker News — Best for staying current on tools, papers, and industry direction.