shreejitverma/Data-Scientist

Data-Scientist

An open-source Data Science repository for learning data science and applying it to real-world problems.

This is a shortcut path to start studying Data Science. Just follow the steps to answer the questions, "What is Data Science and what should I study to learn Data Science?"

Everything that you will need on Data Science

What is Data Science?

Data Science is one of the most talked-about topics in computing and on the Internet today. Applications and systems have been gathering data for years, and now is the time to analyze it. The next steps are to derive recommendations from the data and to make predictions about the future. Below are answers from several experts to the central question: what is Data Science?

Link: Preview
What is Data Science @ O'Reilly: Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”
What is Data Science @ Quora: Data Science is a combination of a number of aspects of data, such as technology, algorithm development, and data inference, used to study the data, analyse it, and find innovative solutions to difficult problems. Basically, Data Science is all about analysing data and driving business growth by finding creative ways.
The sexiest job of 21st century: Data scientists today are akin to Wall Street “quants” of the 1980s and 1990s. In those days people with backgrounds in physics and math streamed to investment banks and hedge funds, where they could devise entirely new algorithms and data strategies. Then a variety of universities developed master’s programs in financial engineering, which churned out a second generation of talent that was more accessible to mainstream firms. The pattern was repeated later in the 1990s with search engineers, whose rarefied skills soon came to be taught in computer science programs.
Wikipedia: Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, machine learning and big data.
How to Become a Data Scientist: Data scientists are big data wranglers, gathering and analyzing large sets of structured and unstructured data. A data scientist’s role combines computer science, statistics, and mathematics. They analyze, process, and model data, then interpret the results to create actionable plans for companies and other organizations.
a very short history of #datascience: The story of how data scientists became sexy is mostly the story of the coupling of the mature discipline of statistics with a very young one, computer science. The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists and others for years. The following timeline traces the evolution of the term “Data Science” and its use, attempts to define it, and related terms.

Learn Data Science

Python is currently our favorite programming language for #DataScience. Python's pandas library offers full-featured tools for collecting and analyzing data. We use Anaconda to work with data and to build applications.

Algorithms

These are some Machine Learning and Data Mining algorithms and models that help you understand your data and derive meaning from it.

Supervised Learning

  • Regression
    • Linear Regression
    • Ordinary Least Squares
    • Logistic Regression
    • Stepwise Regression
    • Multivariate Adaptive Regression Splines
    • Locally Estimated Scatterplot Smoothing
  • Classification
    • k-nearest neighbor
    • Support Vector Machines
    • Decision Trees
      • ID3 algorithm
      • C4.5 algorithm
  • Ensemble Learning
    • Boosting
    • Bagging
    • Random Forest
    • AdaBoost
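
To make one entry from the list above concrete, here is a minimal sketch of Ordinary Least Squares for a single input variable, written in plain Python. The function names `fit_ols` and `predict` and the toy data are assumptions made up for illustration; they do not come from this repository or from any library.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and the x/y cross-deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1.
model = fit_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(model, predict(model, 4.0))
```

In practice a library such as scikit-learn or statsmodels would normally be used instead; the sketch only shows the closed-form estimate behind the name.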

Unsupervised Learning

  • Clustering
    • Hierarchical clustering
    • k-means
    • Fuzzy clustering
    • Mixture models
  • Dimension Reduction
    • Principal Component Analysis (PCA)
    • t-SNE
  • Neural Networks
    • Self-organizing map
    • Adaptive resonance theory
  • Hidden Markov Models (HMM)
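
As an illustration of the clustering entries above, the following is a small sketch of k-means (Lloyd's algorithm) in plain Python. The helper names, the optional `init` parameter, and the six sample points are hypothetical and chosen only for this example; real projects would more likely rely on an existing implementation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100, seed=0, init=None):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly sampled points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster lost all its points.
                updated.append(centroids[i])
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result of k-means depends on the starting centroids, which is why the sketch exposes both a `seed` and an explicit `init`.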

Semi-Supervised Learning

  • S3VM
  • Clustering
  • Generative models
  • Low-density separation
  • Laplacian regularization
  • Heuristic approaches

Reinforcement Learning

  • Q Learning
  • SARSA (State-Action-Reward-State-Action) algorithm
  • Temporal difference learning
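
The sketch below shows what tabular Q-learning can look like on a tiny, made-up environment: a corridor of five states with a reward at the right end. The environment, the function names, and the hyperparameter values are all assumptions chosen for illustration. SARSA would differ only in the update target, which uses the action actually taken in the next state rather than the maximum over actions.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: a 1-D corridor with a reward at the right end."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3,
               max_steps=200, seed=0):
    """Learn a table q[state][action] with the one-step Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```
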

Data Mining Algorithms

  • C4.5
  • k-Means
  • SVM
  • Apriori
  • EM
  • PageRank
  • AdaBoost
  • kNN
  • Naive Bayes
  • CART
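
To give a feel for one of the algorithms listed above, here is a short sketch of PageRank computed by power iteration. The function name `pagerank`, the damping value, and the three-page link graph are assumptions for illustration only, not code from this repository.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)
```

The ranks always sum to one, so they can be read as the share of time a random surfer would spend on each page.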

Deep Learning architectures

  • Multilayer Perceptron
  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)
  • Boltzmann Machines
  • Autoencoder
  • Generative Adversarial Network (GAN)
  • Self-Organizing Maps

Contents

Career Tracks

  1. Data Scientist with Python
  2. Data Analyst with Python
  3. Data Analyst with SQL Server
  4. Data Science for Everyone
  5. Machine Learning Scientist with Python

Data Science Collected Resources

A trove of carefully curated resources and links (on the topics of software, platforms, language, techniques, etc.) related to data science, all in one place.

Please feel free to connect with me on LinkedIn if you are interested in data science.


Artificial Intelligence related

MONTRÉAL.AI ACADEMY: ARTIFICIAL INTELLIGENCE 101 FIRST WORLD-CLASS OVERVIEW OF AI FOR ALL

OpenAI blog

AI thinks like a corporation—and that’s worrying - Open Voices

AITopics

Does the Brain Store Information in Discrete or Analog Form?

Explainable Artificial Intelligence (Part 1) — The Importance of Human Interpretable Machine…

Is The Singularity Coming? – Arc Digital

Michael I. Jordan NYSE Machine Learning Presentation

Some scientists fear superintelligent machines could pose a threat to humanity | The Washington Post

The Four Waves of A.I. | LinkedIn

When algorithms go wrong we need power to fight back, say researchers - The Verge

AWS related

Amazon CloudWatch - Application and Infrastructure Monitoring

Amazon DynamoDB - Overview

Amazon Elastic Block Store (EBS) - Amazon Web Services

Amazon Elastic File System (EFS) | Cloud File Storage

AWS Concepts: Understanding AWS - YouTube

AWS Concepts: Understanding the Course Material & Features - YouTube

AWS In 10 Minutes | AWS Tutorial For Beginners | AWS Training Video | AWS Tutorial | Simplilearn - YouTube

AWS re:Invent 2017: Building production apps easily with Amazon Lightsail (CMP212) - YouTube

Classless Inter-Domain Routing - Wikipedia

Cloud Compute Products – Amazon Web Services (AWS)

Cloud Object Storage | Store & Retrieve Data Anywhere | Amazon Simple Storage Service

Elastic Load Balancing - Amazon Web Services

Getting Spark, Python, and Jupyter Notebook running on Amazon EC2

Use PuTTY to access EC2 Linux Instances via SSH from Windows

What is Cloud Computing? - Amazon Web Services

Blogs, Stack Exchanges

7-Step Guide to Become a Machine Learning Engineer in 2021

Reducing the Need for Labeled Data in Generative Adversarial Networks

Jason's Google ML 101 deck

10 Free Must-Read Books for Machine Learning and Data Science

Advice to aspiring data scientists: start a blog – Variance Explained

Brandon Rohrer Blog

Chris Albon - Data Science, Machine Learning, and Artificial Intelligence

Data Science Stack Exchange

Data Skeptic

DataTau

explained.ai - Deep explanations of machine learning and related topics

FlowingData

Here Are (Approximately) 3000 Free Data Sources You Can Use Right Now

If you want to learn Data Science, take a few of these statistics classes

Learn Data Science - Infographic (article) - DataCamp

LIGO Gravity Wave GW150914_tutorial

O.R. & Analytics Success Stories - INFORMS

OpenAI Blog

Paul Ford: What Is Code? | Bloomberg

Science Isn’t Broken | FiveThirtyEight

Scientifically Sound

AIspace

Top 28 Cheat Sheets for Machine Learning, Data Science, Probability, SQL & Big Data

GitHub Python Data Science Spotlight: AutoML, NLP, Visualization, ML Workflows

Books, Courses, Repos

Solved end-to-end Data Science projects

Dive into Deep Learning (An interactive deep learning book with code, math, and discussions)

Machine Learning Math book

Learn to code | Codecademy

Lecture Notes | Introduction to MATLAB | Electrical Engineering and Computer Science | MIT OpenCourseWare

60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more

Feature Engineering and Selection: A Practical Approach for Predictive Models

Neural Networks and Deep Learning - an online book

Git and Github

Adding an existing project to GitHub using the command line - User Documentation

An Intro to Git and GitHub for Beginners (Tutorial)

Follow these simple rules and you’ll become a Git and GitHub master

Git - Book

git - the simple guide - no deep shit!

How not to be afraid of GIT anymore – freeCodeCamp.org

joshnh/Git-Commands: A list of commonly used Git commands

The beginner’s guide to contributing to a GitHub project – Rob Allen's DevNotes

Understanding the GitHub Flow · GitHub Guides

Interesting Articles

Towards an anti-fascist AI (from opendemocracy.net)

Becoming a Level 3.0 Data Scientist

The Third-wave of Data Scientist

46 Most Intellectually Stimulating Sites That Will Spark Your Inner Genius in 10 Minutes a Day

Artificial Intelligence Learns to Learn Entirely on Its Own | Quanta Magazine

Edward Witten Ponders the Nature of Reality | Quanta Magazine

Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department | Stitch Fix Technology – Multithreaded

Foundations Built for a General Theory of Neural Networks - Quanta Magazine

General Thinking Tools: 9 Mental Models to Solve Difficult Problems

How Social Media Endangers Knowledge | WIRED

In These Small Cities, AI Advances Could Be Costly - MIT Technology Review

Machine Learning’s ‘Amazing’ Ability to Predict Chaos | Quanta Magazine

New Brain Maps With Unmatched Detail May Change Neuroscience | WIRED

Pedro Domingos on the Arms Race in Artificial Intelligence - SPIEGEL ONLINE

Quantum Leaps in Quantum Computing? - Scientific American

The Fragile State of the Midwest’s Public Universities - The Atlantic

The Future of Human Work Is Imagination, Creativity, and Strategy

The Quantum Thermodynamics Revolution | Quanta Magazine

What Is Code? | Paul Ford| Bloomberg

The Economics Of Artificial Intelligence - How Cheaper Predictions Will Change The World

OpenAI’s Dota 2 defeat is still a win for artificial intelligence  - The Verge

Machine Learning Confronts the Elephant in the Room | Quanta Magazine

MOOC related

Complete lecture notes of the Stanford/Coursera Machine Learning class by Andrew Ng

200 universities just launched 560 free online courses. Here’s the full list.

Artificial Intelligence | MIT OpenCourseWare

Dashboard | MIT Professional Education Digital Programs

Data Science A-Z™: Real-Life Data Science Exercises Included | Udemy

Data Science Essentials | edX

How to choose effective MOOCs for machine learning and data science?

I uncovered 1,150+ Coursera courses that are still completely free

Information and Entropy | MIT OpenCourseWare

Introduction to Algorithms | MIT OpenCourseWare

Introduction to Data Analysis using Excel | edX

Introduction to Python for Data Science | edX

Introduction to R for Data Science | edX

Mathematics for Computer Science | MIT OpenCourseWare

Programming with Python for Data Science!

Statistical Thinking for Data Science course

Top Data Science Online Courses in 2017 – LearnDataSci

U. Wash ML course Jupyter Home

SQL

A Visual Explanation of SQL Joins

Join (SQL) - Wikipedia

PostgreSQL: Mathematical Functions and Operators

PostgreSQL: String Functions and Operators

Psycopg2 Tutorial - PostgreSQL with Python

SQL Joins Explained

The SQL Tutorial for Data Analysis | SQL Tutorial - Mode Analytics

SQL vs NoSQL or MySQL vs MongoDB - YouTube

Thinking in SQL vs Thinking in Python

Kaggle SQL course (including BigQuery topics)

Statistics

Common statistical tests are linear models (or: how to teach stats)

Introductory statistics - OpenText Library

Background: Markov chains

OpenIntro Stats

Regression Analysis Tutorial and Examples | Minitab

The 10 Statistical Techniques Data Scientists Need to Master

The Ultimate Guide to 12 Dimensionality Reduction Techniques (with Python codes)

Thomas Bayes and the crisis in science – TheTLS

Welcome to STAT 505! | STAT 505

Introduction to Bayesian Linear Regression – Towards Data Science

Probability and Statistics Visually

Visualizations (and image processing related)

The paper describing Scikit-image from its core developers

Full-screen interactive that lets you explore the first 300 years of Data Visualization

designing-great-visualizations.pdf

Gallery of Data Visualization - Missed Opportunities and Graphical Failures

Lesson 1-4, first visualization data - Govind Acharya | Tableau Public

Mapping the 1854 Cholera Outbreak | Tableau Public

Resources | Tableau Public

10 Free Must-Read Books for Machine Learning and Data Science

60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more

Data Skeptic

GGobi data visualization system.

GitHub (Tirthajyoti Sarkar)

Here Are (Approximately) 3000 Free Data Sources You Can Use Right Now

If you want to learn Data Science, take a few of these statistics classes

Learn to code | Codecademy

Lecture Notes | Introduction to MATLAB | Electrical Engineering and Computer Science | MIT OpenCourseWare

Medium – Read, write and share stories that matter

Scientifically Sound

Top 28 Cheat Sheets for Machine Learning, Data Science, Probability, SQL & Big Data

Learn Data Science - Infographic (article) - DataCamp

Neural Network

Videos

Deep blueberry

Brandon Rohrer - Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

CS231n Lecture 10 - Recurrent Neural Networks, Image Captioning, LSTM - YouTube

Nuts and Bolts of Applying Deep Learning (Andrew Ng) - YouTube

Siraj Raval - LSTM Networks - The Math of Intelligence (Week 8) - YouTube

Siraj Raval - Recurrent Neural Networks - The Math of Intelligence (Week 5) - YouTube

Andrew Ng: Artificial Intelligence is the New Electricity - YouTube

A Neural Network Playground

But what is a Neural Network? | Deep learning, chapter 1

Convolutional Networks in Java - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

CS231n Convolutional Neural Networks for Visual Recognition

Deep Learning Fundamentals - Cognitive Class

Exploring LSTMs

Feature Visualization

Neural networks and deep learning

Understanding Hinton’s Capsule Networks. Part I: Intuition.

Understanding LSTM Networks -- colah's blog

The Unreasonable Effectiveness of Recurrent Neural Networks

Andrej Karpathy blog - Hacker's guide to Neural Networks

A Beginner's Guide to Recurrent Networks and LSTMs - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

J Alammar – Explorations in touchable pixels and intelligent androids

Keras

Guide to the Sequential model - Keras Documentation

Keras Documentation

How to Use Word Embedding Layers for Deep Learning with Keras - Machine Learning Mastery

TensorFlow

Building Input Functions with tf.estimator  |  TensorFlow

Getting Started With TensorFlow  |  TensorFlow

Installing TensorFlow on Windows  |  TensorFlow

TensorFlow

TensorFlow Linear Model Tutorial  |  TensorFlow

TensorFlow Wide & Deep Learning Tutorial  |  TensorFlow

Using TensorFlow in Windows with a GPU | Heaton Research

Installation Guide Windows :: CUDA Toolkit Documentation

7 Steps to Mastering Machine Learning With Python

A visual introduction to machine learning

Berkeley AI Materials

Deep Learning For Coders fast.ai

Lecture Collection | Machine Learning - Stanford course

Microsoft Azure ML Cheat sheet

Pedro Domingos Machine Learning lectures

The Hitchhiker’s Guide to Machine Learning in Python

Top 10 Machine Learning Projects on Github

UCI Machine Learning Repository

[ISLR class videos](https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/)

Machine Learning Zero-to-Hero: Everything you need in order to compete on Kaggle for the first…

GOOGLE - Rules of Machine Learning:  |  Machine Learning Rules  |  Google Developers

PySpark ML tutorial example

Python Generators Tutorial

R Markdown: The Definitive Guide

Understanding the GitHub Flow · GitHub Guides

How to Prepare for a Machine Learning Interview - Semantic Bits

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

AI Knowledge Map: How To Classify AI Technologies

Apache Spark

Building A Linear Regression with PySpark and MLlib

Complete Guide on DataFrame Operations in PySpark

Install_Spark_on_Windows10.pdf

Introduction · Mastering Apache Spark

MLlib: Main Guide - Spark 2.3.1 Documentation

Overview - Spark 2.3.1 Documentation

RDD Programming Guide - Spark 2.3.1 Documentation

rdflib 5.0.0-dev — rdflib 5.0.0-dev documentation

Spark SQL and DataFrames - Spark 2.3.1 Documentation

Welcome to Spark Python API Docs! — PySpark 2.3.1 documentation

Cloud computing

Why You Should Consider Google AI Platform For Your Machine Learning Projects

Cloud Computing Tutorial for Beginners | Cloud Computing Explained | Cloud Computing | Simplilearn - YouTube

Computation, Computing

A Short Guide to Hard Problems | Quanta Magazine

Data Mining

The 10 Mining Techniques Data Scientists Need for Their Toolbox

Wikipedia Data Science: Working with the World’s Largest Encyclopedia

Data wrangling related

A Brief Overview of Outlier Detection Techniques – Towards Data Science

Docker, Containers

A Beginner-Friendly Introduction to Containers, VMs and Docker

A fast and easy Docker tutorial for beginners (video series)

Docker Compose in 12 Minutes - YouTube

How to Install and Use Docker on Ubuntu 18.04 | DigitalOcean

How to Install Docker On Ubuntu 18.04 Bionic Beaver - LinuxConfig.org

Learn Docker in 12 Minutes 🐳 - YouTube

What is a Container? - YouTube

What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps Tools | Edureka - YouTube

Building Your Own Data Science Platform With Python & Docker - YouTube

Interview related

50+ Data Structure and Algorithms Interview Questions for Programmers

Web Technologies

REST, API, Microservice

GraphQL vs. REST – Apollo GraphQL

Microservices, APIs, and Swagger: How They Fit Together | Swagger

REST API concepts and examples - YouTube

Web Architecture 101 – VideoBlocks Product & Engineering

REST API & RESTful Web Services Explained - YouTube

Our Collections – Towards Data Science

JSON, XML, HTML

JSON Crash Course - YouTube

Can I use... Support tables for HTML5, CSS3, etc

HTML5 Form Validation Examples < HTML | The Art of Web

CSS

The CSS Handbook: a handy guide to CSS for developers

Creating a Simple Website with HTML and CSS - Part 1 - YouTube

CSS Introduction - W3Schools

Learn CSS in 12 Minutes - YouTube

JavaScript

Beginner JavaScript Tutorial - 1 - Introduction to JavaScript - YouTube

Eloquent JavaScript

Form Validation with JavaScript - Check for an Empty Text Field - YouTube

JavaScript Basics Part 1

JavaScript beginner tutorial 30 - form validation text boxes and passwords - YouTube

JavaScript: Simple Form Validation - YouTube

Learn JavaScript in 12 Minutes - YouTube

Machine Learning with JavaScript : Part 1 – Hacker Noon

Machine Learning with JavaScript : Part 2 – Hacker Noon

W3School - JavaScript Form Validation

W3schools - JavaScript Tutorial

ClearlyDecoded.com - Yaakov Chaikin

GoDaddy Hosting Account Getting Started Guide

How to Make A Website in 2018 - Web Hosting Guide | WHSR

jhu-ep-coursera/fullstack-course4: Example code for HTML, CSS, and Javascript for Web Developers Coursera Course

LaTeX, Markdown, reST

Art of Problem Solving - LaTeX symbols

Detexify LaTeX handwritten symbol recognition

http://quicklatex.com/

LaTeX symbol Wiki

The Comprehensive LaTeX Symbol List - symbols-a4.pdf

Pandoc - Pandoc User’s Guide

MathJax Documentation — MathJax 2.7 documentation

TeX Commands available in MathJax

Linux, OS

How to Install Ubuntu Linux on VirtualBox on Windows 10 [Step by Step Guide] | It's FOSS

Microsoft PowerShell Tutorial & Training Course – Microsoft Virtual Academy

Most Popular Linux Distributions and Why They Dominate the Market

The Dead-Simple Guide to Installing a Linux Virtual Machine on Windows - StorageCraft Technology Corporation

[Solved] Could not get lock /var/lib/dpkg/lock Error in Ubuntu | It's FOSS

Time series

Time Series Analysis in Python: An Introduction – Towards Data Science

RJT1990/pyflux: Open source time series library for Python

MaxBenChrist/awesome_time_series_in_python: This curated list contains python packages for time series analysis

Getting Started with Time Series — PyFlux 0.4.7 documentation

Introduction to ARIMA models

Complete guide to create a Time Series Forecast (with Codes in Python)

How to Create an ARIMA Model for Time Series Forecasting with Python

Time series with Siraj course by Kaggle

Interesting Articles

Debunking The Myths And Reality Of Artificial Intelligence - Forbes

Artificial Intelligence — The Revolution Hasn’t Happened Yet

Artificial Intelligence Learns to Learn Entirely on Its Own | Quanta Magazine

Can Buddhist philosophy explain what came before the Big Bang? | Aeon Essays

Coming to Grips with the Implications of Quantum Mechanics - Scientific American Blog Network

Did Toolmaking Pave the Road for Human Language? - The Atlantic

Edward Witten Ponders the Nature of Reality | Quanta Magazine

Gatekeeping and Elitism in Data Science

How Do Aliens Solve Climate Change? - The Atlantic

How I Learned to Stop Worrying About the LHC’s Missing New Physics

How Information Got Re-Invented – Limits – Medium

How Social Media Endangers Knowledge | WIRED

In These Small Cities, AI Advances Could Be Costly - MIT Technology Review

Inside Amazon’s $3.5 million competition to make Alexa chat like a human - The Verge

Let’s make private data into a public good - MIT Technology Review

On Chomsky and the Two Cultures of Statistical Learning

Quantum Leaps in Quantum Computing? - Scientific American

Strategy vs. Tactics: What's the Difference and Why Does it Matter?

The case for genetically engineering a smarter human-cyborg population to avoid the threat of existential catastrophe.

The Fragile State of the Midwest’s Public Universities - The Atlantic

The Quantum Thermodynamics Revolution | Quanta Magazine

The Way You Read Books Says A Lot About Your Intelligence, Here’s Why

To Build Truly Intelligent Machines, Teach Them Cause and Effect | Quanta Magazine

Why Is American Mass Transit So Bad? It's a Long Story. - CityLab

Yuval Noah Harari on what 2050 has in store for humankind | WIRED UK

Yuval Noah Harari on Why Technology Favors Tyranny - The Atlantic

Yuval Noah Harari: ‘The idea of free information is extremely dangerous’ | Culture | The Guardian

Beyond Weird: Decoherence, Quantum Weirdness, and Schrödinger's Cat - The Atlantic

Life Is a Braid in Spacetime – Time – Medium

Mental Models: How to Train Your Brain to Think in New Ways - James Clear - Pocket

Don’t Compete. Create! - Darius Foroux - Pocket

Tesla will live and die by the Gigafactory - The Verge

So you want to be a Research Scientist – Vincent Vanhoucke – Medium

Homeland Security Will Let Software Flag Potential Terrorists

What Happens When a World Order Ends

Kevin Slavin: How algorithms shape our world | TED Talk

The Brain's Autopilot Mechanism Steers Consciousness - Scientific American

What is Intelligence? – Towards Data Science

This Is Exactly How You Should Train Yourself To Be Smarter - Michael Simmons - Pocket

How to be More Productive and Eliminate Time Wasting Activities by Using the “Eisenhower Box” - James Clear - Pocket

The blind spot of science is the neglect of lived experience | Aeon Essays

Julia

A Complete Tutorial to Learn Data Science with Julia from Scratch

Machine Learning

Experiment tracking

ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

Fairness and bias

Evaluating machine learning models for fairness and bias

Deployment of ML

Creating data science APIs with Flask

Flask and Heroku for online Machine Learning deployment

Overview of the different approaches to putting Machine Learning (ML) models in production

[Guide] Building Data Science Web Application with React, NodeJS, and MySQL

A beginner’s guide to training and deploying machine learning models using Python

A Guide to Scaling Machine Learning Models in Production

Deploying Keras Deep Learning Models with Flask – Towards Data Science

Deploying Machine Learning at Scale - Algorithmia Blog

Deploying Machine Learning has never been so easy – Towards Data Science

Quora - How do you take a machine learning model to production?

Tutorial to deploy Machine Learning model in Production as API with Flask

From Big Data to micro-services: how to serve Spark-trained models through AWS lambdas

How to deliver on Machine Learning projects – Insight Data

Deploying a Keras Deep Learning Model as a Web Application in P

Genetic Algorithm

Genetic Algorithm Implementation in Python – Towards Data Science

Introduction to Optimization with Genetic Algorithm

A tutorial on Differential Evolution with Python · Pablo R. Mier

Keras

Guide to the Sequential model - Keras Documentation

Keras Documentation

How to Use Word Embedding Layers for Deep Learning with Keras - Machine Learning Mastery

Neural Network

Videos

Brandon Rohrer - Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

CS231n Lecture 10 - Recurrent Neural Networks, Image Captioning, LSTM - YouTube

Nuts and Bolts of Applying Deep Learning (Andrew Ng) - YouTube

Siraj Raval - LSTM Networks - The Math of Intelligence (Week 8) - YouTube

Siraj Raval - Recurrent Neural Networks - The Math of Intelligence (Week 5) - YouTube

Andrew Ng: Artificial Intelligence is the New Electricity - YouTube

A Beginner's Guide to Recurrent Networks and LSTMs - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

A Neural Network Playground

A Visual Guide to Evolution Strategies

Andrej Karpathy blog - Hacker's guide to Neural Networks

Best (and Free!!) Resources to understand Nuts and Bolts of Deep learning

But what is a Neural Network? | Deep learning, chapter 1

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

Convolutional Networks in Java - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

CS231n Convolutional Neural Networks for Visual Recognition

Deep Dive into Math Behind Deep Networks – Towards Data Science

Deep Learning Fundamentals - Cognitive Class

Exploring LSTMs

Feature Visualization

J Alammar – Explorations in touchable pixels and intelligent androids

Learning without Backpropagation: Intuition and Ideas (Part 1) – Tom Breloff

Must know Information Theory concepts in Deep Learning (AI)

Neural networks and deep learning

Neural Style Transfer: Creating Art with Deep Learning using tf.keras and eager execution

The Unreasonable Effectiveness of Recurrent Neural Networks

Understanding Hinton’s Capsule Networks. Part I: Intuition.

Understanding LSTM Networks -- colah's blog

A Neural Network in 13 lines of Python (Part 2 - Gradient Descent) - i am trask

How Do Artificial Neural Networks Learn? – Towards Data Science

The Neural Network Zoo - The Asimov Institute

A History of Deep Learning | Import.io

The Ultimate NanoBook to understand Deep Learning based Image Classifier

NLP

How to solve 90% of NLP problems: a step-by-step guide

Coding & English Lit: Natural Language Processing in Python

TextBlob: Simplified Text Processing — TextBlob 0.15.1 documentation

Python Regular Expression Tutorial (article) - DataCamp

Stanford NLP

Reinforcement Learning

Reinforcement Learning Course - Full Machine Learning Tutorial

A brief introduction to reinforcement learning – freeCodeCamp.org

An introduction to Reinforcement Learning – freeCodeCamp.org

Key Papers in Deep RL — Spinning Up documentation

Nuts & Bolts of Reinforcement Learning: Model Based Planning using Dynamic Programming

Reinforcement Learning: A Deep Dive | Toptal

Part 1: Key Concepts in RL — Spinning Up documentation

Dissecting Reinforcement Learning-Part.1

Reinforcement Q-Learning from Scratch in Python with OpenAI Gym – LearnDataSci

Google AI Blog: Curiosity and Procrastination in Reinforcement Learning

Reinforcement Learning: Monte Carlo Learning using OpenAI Gym

TensorFlow

Building Input Functions with tf.estimator  |  TensorFlow

Getting Started With TensorFlow  |  TensorFlow

Installing TensorFlow on Windows  |  TensorFlow

TensorFlow

TensorFlow Linear Model Tutorial  |  TensorFlow

TensorFlow Wide & Deep Learning Tutorial  |  TensorFlow

Using TensorFlow in Windows with a GPU | Heaton Research

Installation Guide Windows :: CUDA Toolkit Documentation

7 Steps to Mastering Machine Learning With Python

A visual introduction to machine learning

Approaching (Almost) Any Machine Learning Problem | Abhishek Thakur | No Free Hunch

Automated Machine Learning Hyperparameter Tuning in Python

Berkeley AI Materials

Deep Learning For Coders fast.ai

Essentials of Machine Learning Algorithms (with Python and R Codes)

GOOGLE - Rules of Machine Learning:  |  Machine Learning Rules  |  Google Developers

http://www.r2d3.us/visual-intro-to-machine-learning-part-2/

ISLR class videos

Lecture Collection | Machine Learning - Stanford course

Machine Learning Zero-to-Hero: Everything you need in order to compete on Kaggle for the first…

Microsoft Azure ML Cheat sheet

Open Machine Learning Course (beta) • mlcourse.ai

Pedro Domingos Machine Learning lectures

The Hitchhiker’s Guide to Machine Learning in Python

Top 10 Machine Learning Projects on Github

UCI Machine Learning Repository

Optimization and ML

Learning to Optimize with Reinforcement Learning – The Berkeley Artificial Intelligence Research Blog

Kaggle

Hello Kaggle! - A Kaggle Guide for someone who is new at Kaggle

Python

Tutorials

Everything About Python — Beginner To Advanced

Jupyter and IDE related

Interactive spreadsheets in Jupyter

PyCharm for data scientists

Built-in magic commands — IPython 6.2.1 documentation

Concrete Statistics Jupyter Notebook Peter Norvig

Economics simulation Jupyter Notebook Peter Norvig

Markdown Cheatsheet

Using Interact — Jupyter Widgets 7.0.3 documentation

Pixie - visual Python debugger for Jupyter notebook

Matplotlib, Seaborn, Visualization

color example code: colormaps_reference.py — Matplotlib 2.0.2 documentation

ggplot | Home

Matplotlib 1.5.1

Matplotlib Plotting commands summary

Matplotlib tutorial

Seaborn tutorial — seaborn 0.7.1 documentation

MOOC courses

Github/jmportilla/Complete-Python-Bootcamp: Lectures

Jupyter Notebook - Udemy Complete Python Bootcamp course

Python for Data Science and Machine Learning Bootcamp | Udemy

Computational Science and Engineering I | Mathematics | MIT OpenCourseWare

Foundations of Machine Learning (A course by Bloomberg)

NumPy and SciPy

Linear algebra (numpy.linalg) — NumPy v1.12 Manual

NumPy v1.12 Universal functions

NumPy v1.13.dev0 Manual

Random sampling (numpy.random) — NumPy v1.13 Manual

SciPy — SciPy v0.19.0 Reference Guide

From Python to Numpy

numpy-100/100 Numpy exercises with hint.md at master · rougier/numpy-100

Pandas

Pandas 0.20.3 documentation

Pandas: Python Data Analysis Library

Setup, PyPi, Creating your own packages

Home | Read the Docs

How to publish your own Python Package on PyPi – freeCodeCamp

Step-by-Step Guide to Creating R and Python Libraries (in JupyterLab)

How to submit a package to PyPI — Peter Downs

Packaging and Distributing Projects — Python Packaging User Guide

reStructuredText Primer — Sphinx 1.8.0+ documentation

Using TestPyPI — Python Packaging User Guide

How to open source your Python library | Opensource.com

Spark and AWS

Amazon Web Services (AWS) - Cloud Computing Services

Connecting to Your Linux Instance from Windows Using PuTTY - Amazon Elastic Compute Cloud

Install Spark on Windows (PySpark) – Michael Galarnyk – Medium

Projects

10 Steps to Set Up Your Python Project for Success

Tools and Utilities

itertools — Functions creating iterators for efficient looping — Python 3.6.3 documentation

Web Data Analytics

Processing XML in Python with ElementTree - Eli Bendersky's website

Using BeautifulSoup to parse HTML and extract press briefings URLs | Computational Journalism, Spring 2016

28 Jupyter Notebook tips, tricks and shortcuts

A curated list of awesome Python frameworks, libraries, software and resources

Archived Problems - Project Euler

Choosing the right estimator — scikit-learn 0.18.1 documentation

CodeSkulptor

Installing XGBoost For Anaconda on Windows (IT Best Kept Secret Is Optimization)

Pandas 0.20.3 - API Reference

Pandas 0.20.3 Cookbook

PostgreSQL + Python | Psycopg

Problems - CodeAbbey

Project Jupyter | Home

PY4E - Python for Everybody

Python 2.7.13 documentation

Python Conquers The Universe | Adventures across space and time with the Python programming language

Python Flask From Scratch - YouTube

Python Tricks 101 – Hacker Noon

Python tutorial - TutorialsPoint

Regular Expressions for Data Scientists

Simple Linear Regression Analysis - ReliaWiki

Introduction — Python 101 1.0 documentation

Documenting Python Code: A Complete Guide – Real Python

MIT AI: Python (Guido van Rossum) - YouTube

Python IDEs and Code Editors (Guide) – Real Python

Advanced Python web scraping tricks and tips

R related

A Beginner’s Guide to Neural Networks with R

A Comprehensive Guide to Data Visualisation in R for Beginners

An R Introduction to Statistics | R Tutorial

Data Manipulation with dplyr | R-bloggers

Data Science and Machine Learning Bootcamp with R | Udemy

ggplot2-cheatsheet.pdf

Machine Learning A-Z™: Download Practice Datasets - SuperDataScience - Big Data | Analytics Careers | Mentors | Success

Quick-R: Home Page

R mailing lists archive

R Tutorial Series - Statistical Tests | Saranya Anandh | Pulse | LinkedIn

R: Control for Rpart Fits

R: Recursive Partitioning and Regression Trees

Short-refcard.pdf

Theme • ggplot2

COLLEGES

Intensive Programs

MOOC's

Tutorials

Free Courses

Toolboxes - Environment

Link Description
The Data Science Lifecycle Process The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo
Data Science Lifecycle Template Repo Template repository for data science lifecycle project
RexMex A general purpose recommender metrics library for fair evaluation.
ChemicalX A PyTorch based deep learning library for drug pair scoring.
PyTorch Geometric Temporal Representation learning on dynamic graphs.
Little Ball of Fur A graph sampling library for NetworkX with a Scikit-Learn like API.
Karate Club An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API.
ML Workspace All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a Docker container and is preloaded with a variety of popular data science libraries (e.g., Tensorflow, PyTorch) and dev tools (e.g., Jupyter, VS Code)
Neptune.ai Community-friendly platform supporting data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, models comparison and reproducibility.
steppy Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.
steppy-toolkit Curated collection of the neural networks, transformers and models that make your machine learning work faster and more effective.
Datalab from Google easily explore, visualize, analyze, and transform data using familiar languages, such as Python and SQL, interactively.
Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.
R is a free software environment for statistical computing and graphics.
RStudio IDE – powerful user interface for R. It’s free and open source, works on Windows, Mac, and Linux.
Python - Pandas - Anaconda Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing
Pandas GUI Pandas GUI
Scikit-Learn Machine Learning in Python
NumPy NumPy is fundamental for scientific computing with Python. It supports large, multi-dimensional arrays and matrices and includes an assortment of high-level mathematical functions to operate on these arrays.
Vaex Vaex is a Python library that allows you to visualize large datasets and calculate statistics at high speeds.
SciPy SciPy works with NumPy arrays and provides efficient routines for numerical integration and optimization.
Data Science Toolbox Coursera Course
Data Science Toolbox Blog
Wolfram Data Science Platform Take numerical, textual, image, GIS or other data and give it the Wolfram treatment, carrying out a full spectrum of data science analysis and visualization and automatically generating rich interactive reports—all powered by the revolutionary knowledge-based Wolfram Language.
Datadog Solutions, code, and devops for high-scale data science.
Variance Build powerful data visualizations for the web without writing JavaScript
Kite Development Kit The Kite Software Development Kit (Apache License, Version 2.0) , or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
Domino Data Labs Run, scale, share, and deploy your models — without any infrastructure or setup.
Apache Flink A platform for efficient, distributed, general-purpose data processing.
Apache Hama Apache Hama is an Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce.
Weka Weka is a collection of machine learning algorithms for data mining tasks.
Octave GNU Octave is a high-level interpreted language, primarily intended for numerical computations (a free alternative to MATLAB).
Apache Spark Lightning-fast cluster computing
Hydrosphere Mist a service for exposing Apache Spark analytics jobs and machine learning models as realtime, batch or reactive web services.
Data Mechanics A data science and engineering platform making Apache Spark more developer-friendly and cost-effective.
Caffe Deep Learning Framework
Torch A scientific computing framework for LuaJIT
Nervana's Python-based deep learning framework.
Skale High performance distributed data processing in NodeJS
Aerosolve A machine learning package built for humans.
Intel framework Intel® Deep Learning Framework
Datawrapper An open source data visualization platform helping everyone to create simple, correct and embeddable charts. Also at github.com
Tensor Flow TensorFlow is an Open Source Software Library for Machine Intelligence
Natural Language Toolkit An introductory yet powerful toolkit for natural language processing and classification
nlp-toolkit for Node.js.
Julia high-level, high-performance dynamic programming language for technical computing
IJulia a Julia-language backend combined with the Jupyter interactive environment
Apache Zeppelin Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more
Featuretools An open source framework for automated feature engineering written in python
Optimus Cleansing, pre-processing, feature engineering, exploratory data analysis and easy ML with PySpark backend.
Albumentations A fast and framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. Was used to win a number of Deep Learning competitions at Kaggle, Topcoder and those that were a part of the CVPR workshops.
DVC An open-source data science version control system. It helps track, organize and make data science projects reproducible. In its very basic scenario it helps version control and share large data and model files.
Lambdo is a workflow engine which significantly simplifies data analysis by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation.
Feast A feature store for the management, discovery, and access of machine learning features. Feast provides a consistent view of feature data for both model training and model serving.
Polyaxon A platform for reproducible and scalable machine learning and deep learning.
LightTag Text Annotation Tool for teams
UBIAI Easy-to-use text annotation tool for teams with most comprehensive auto-annotation features. Supports NER, relations and document classification as well as OCR annotation for invoice labeling
Trains Auto-Magical Experiment Manager, Version Control & DevOps for AI
Hopsworks Open-source data-intensive machine learning platform with a feature store. Ingest and manage features for both online (MySQL Cluster) and offline (Apache Hive) access, train and serve models at scale.
MindsDB MindsDB is an Explainable AutoML framework for developers. With MindsDB you can build, train and use state-of-the-art ML models in as little as one line of code.
Lightwood A Pytorch based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly with an objective to build predictive models with one line of code.
AWS Data Wrangler An open-source Python package that extends the power of Pandas library to AWS connecting DataFrames and AWS data related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc).
Amazon Rekognition AWS Rekognition is a service that lets developers working with Amazon Web Services add image analysis to their applications. Catalog assets, automate workflows, and extract meaning from your media and applications.
Amazon Textract Automatically extract printed text, handwriting, and data from any document.
Amazon Lookout for Vision Spot product defects using computer vision to automate quality inspection. Identify missing product components, vehicle and structure damage, and irregularities for comprehensive quality control.
Amazon CodeGuru Automate code reviews and optimize application performance with ML-powered recommendations.
CML An open source toolkit for using continuous integration in data science projects. Automatically train and test models in production-like environments with GitHub Actions & GitLab CI, and autogenerate visual reports on pull/merge requests.
Dask An open source Python library to painlessly transition your analytics code to distributed computing systems (Big Data)
Statsmodels A Python-based inferential statistics, hypothesis testing and regression framework
Gensim An open-source library for topic modeling of natural language text
spaCy A performant natural language processing toolkit
Grid Studio Grid Studio is a web-based spreadsheet application with full integration of the Python programming language.
Python Data Science Handbook Python Data Science Handbook: full text in Jupyter Notebooks
Shapley A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
DAGsHub A platform built on open source tools for data, model and pipeline management.
Deepnote A new kind of data science notebook. Jupyter-compatible, with real-time collaboration and running in the cloud.
Valohai An MLOps platform that handles machine orchestration, automatic reproducibility and deployment.
PyMC3 A Python Library for Probabilistic Programming (Bayesian Inference and Machine Learning)
PyStan Python interface to Stan (Bayesian inference and modeling)
hmmlearn Unsupervised learning and inference of Hidden Markov Models
Chaos Genius ML powered analytics engine for outlier/anomaly detection and root cause analysis
Nimblebox A full-stack MLOps platform designed to help data scientists and machine learning practitioners around the world discover, create, and launch multi-cloud apps from their web browser.

Machine Learning in General Purpose

Deep Learning

pytorch

tensorflow

keras

Visualization Tools - Environments

Journals, Publications and Magazines

Presentations

Podcasts

Books

Socialize

Bloggers

Facebook Accounts

Twitter Accounts

Twitter Description
Big Data Combine Rapid-fire, live tryouts for data scientists seeking to monetize their models as trading strategies
Big Data Mania Data Viz Wiz , Data Journalist , Growth Hacker , Author of Data Science for Dummies (2015)
Big Data Science Big Data, Data Science, Predictive Modeling, Business Analytics, Hadoop, Decision and Operations Research.
Charlie Greenbacker Director of Data Science at @ExploreAltamira
Chris Said Data scientist at Twitter
Clare Corthell Dev, Design, Data Science @mattermark #hackerei
DADI Charles-Abner #datascientist @Ekimetrics. , #machinelearning #dataviz #DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast
Data Science Central Data Science Central is the industry's single resource for Big Data practitioners.
Data Science London Data Science. Big Data. Data Hacks. Data Junkies. Data Startups. Open Data
Data Science Renee Documenting my path from SQL Data Analyst pursuing an Engineering Master's Degree to Data Scientist
Data Science Report Mission is to help guide & advance careers in Data Science & Analytics
Data Science Tips Tips and Tricks for Data Scientists around the world! #datascience #bigdata
Data Vizzard DataViz, Security, Military
DataScienceX
deeplearning4j
DJ Patil White House Data Chief, VP @ RelateIQ.
Domino Data Lab
Drew Conway Data nerd, hacker, student of conflict.
Emilio Ferrara #Networks, #MachineLearning and #DataScience. I work on #Social Media. Postdoc at @IndianaUniv
Erin Bartolo Running with #BigData--enjoying a love/hate relationship with its hype. @iSchoolSU #DataScience Program Mgr.
Greg Reda Working @ GrubHub about data and pandas
Gregory Piatetsky KDnuggets President, Analytics/Big Data/Data Mining/Data Science expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, part-time philosopher.
Hadley Wickham Chief Scientist at RStudio, and an Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University.
Hakan Kardas Data Scientist
Hilary Mason Data Scientist in Residence at @accel.
Jeff Hammerbacher ReTweeting about data science
John Myles White Scientist at Facebook and Julia developer. Author of Machine Learning for Hackers and Bandit Algorithms for Website Optimization. Tweets reflect my views only.
Juan Miguel Lavista Principal Data Scientist @ Microsoft Data Science Team
Julia Evans Hacker - Pandas - Data Analyze
Kenneth Cukier The Economist's Data Editor and co-author of Big Data (http://www.big-data-book.com/).
Kevin Davenport Organizer of https://www.meetup.com/San-Diego-Data-Science-R-Users-Group/
Kevin Markham Data science instructor, and founder of Data School
Kim Rees Interactive data visualization and tools. Data flaneur.
Kirk Borne DataScientist, PhD Astrophysicist, Top #BigData Influencer.
Linda Regber Data story teller, visualizations.
Luis Rei PhD Student. Programming, Mobile, Web. Artificial Intelligence, Intelligent Robotics Machine Learning, Data Mining, Natural Language Processing, Data Science.
Mark Stevenson Data Analytics Recruitment Specialist at Salt (@SaltJobs) Analytics - Insight - Big Data - Datascience
Matt Harrison Opinions of full-stack Python guy, author, instructor, currently playing Data Scientist. Occasional fathering, husbanding, organic gardening.
Matthew Russell Mining the Social Web.
Mert Nuhoğlu Data Scientist at BizQualify, Developer
Monica Rogati Data @ Jawbone. Turned data into stories & products at LinkedIn. Text mining, applied machine learning, recommender systems. Ex-gamer, ex-machine coder; namer.
Noah Iliinsky Visualization & interaction designer. Practical cyclist. Author of vis books: https://www.oreilly.com/pub/au/4419
Paul Miller Cloud Computing/ Big Data/ Open Data Analyst & Consultant. Writer, Speaker & Moderator. Gigaom Research Analyst.
Peter Skomoroch Creating intelligent systems to automate tasks & improve decisions. Entrepreneur, ex Principal Data Scientist @LinkedIn. Machine Learning, ProductRei, Networks
Prash Chan Solution Architect @ IBM, Master Data Management, Data Quality & Data Governance Blogger. Data Science, Hadoop, Big Data & Cloud.
Quora Data Science Quora's data science topic
R-Bloggers Tweet blog posts from the R blogosphere, data science conferences and (!) open jobs for data scientists.
Rand Hindi
Randy Olson Computer scientist researching artificial intelligence. Data tinkerer. Community leader for @DataIsBeautiful. #OpenScience advocate.
Recep Erol Data Science geek @ UALR
Ryan Orban Data scientist, genetic origamist, hardware aficionado
Sean J. Taylor Social Scientist. Hacker. Facebook Data Science Team. Keywords: Experiments, Causal Inference, Statistics, Machine Learning, Economics.
Silvia K. Spiva #DataScience at Cisco
Harsh B. Gupta Data Scientist at BBVA Compass
Spencer Nelson Data nerd
Talha Oz Enjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile kaggler/data scientist
Tasos Skarlatidis Complex Event Processing, Big Data, Artificial Intelligence and Machine Learning. Passionate about programming and open-source.
Terry Timko InfoGov; Bigdata; Data as a Service; Data Science; Open, Social & Business Data Convergence
Tony Baer IT analyst with Ovum covering Big Data & data management with some systems engineering thrown in.
Tony Ojeda Data Scientist , Author , Entrepreneur. Co-founder @DataCommunityDC. Founder @DistrictDataLab. #DataScience #BigData #DataDC
Vamshi Ambati Data Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon alumni (Blog: https://allthingsds.wordpress.com )
Wes McKinney Pandas (Python Data Analysis library).
WileyEd Senior Manager - @Seagate Big Data Analytics @McKinsey Alum #BigData + #Analytics Evangelist #Hadoop, #Cloud, #Digital, & #R Enthusiast
WNYC Data News Team The data news crew at @WNYC. Practicing data-driven journalism, making it visual and showing our work.
Alexey Grigorev Data science author

Newsletters

Youtube Videos & Channels

Telegram Channels

  • Open Data Science – First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former.
  • Loss function porn – Beautiful posts on DS/ML themes with video or graphic visualization.
  • Machinelearning – Daily ML news.

Slack Communities

Github Groups

Competitions

Some data mining competition platforms

Fun

Infographic

Preview Description
Key differences of a data scientist vs. data engineer
A visual guide to Becoming a Data Scientist in 8 Steps by DataCamp (img)
Mindmap on required skills (img)
Swami Chandrasekaran made a Curriculum via Metro map.
by @kzawadz via twitter
By Data Science Central
Data Science Wars: R vs Python
How to select statistical or machine learning techniques
Choosing the Right Estimator
The Data Science Industry: Who Does What
Data Science Venn Euler Diagram
Different Data Science Skills and Roles from this article by Springboard
Data Fallacies To Avoid A simple and friendly way of teaching your non-data scientist/non-statistician colleagues how to avoid mistakes with data. From Geckoboard's Data Literacy Lessons.

Data Sets

Comics

Other Lists

Skill Tracks

Data Scientist with Python

Course Slides Dataset Notes Solutions
Introduction to Python - - - -
Intermediate Python - - - -
PROJECT TV, Halftime Shows, and the Big Game - - - -
Data Manipulation with pandas - - - -
PROJECT The Android App Market on Google Play - - - -
Merging DataFrames with pandas - - - -
PROJECT The GitHub History of the Scala Language - - - -
Introduction to Data Visualization with Matplotlib - - - -
Introduction to Data Visualization with Seaborn - - - -
Python Data Science Toolbox (Part 1) - - link -
Python Data Science Toolbox (Part 2) - - - -
Intermediate Data Visualization with Seaborn - - - -
PROJECT A Visual History of Nobel Prize Winners - - - -
Introduction to Importing Data in Python - - - -
Intermediate Importing Data in Python - - - -
Importing & Cleaning Data with Python - - - -
Cleaning Data in Python - - - -
Working with Dates and Times in Python - - - -
Writing Functions in Python - - - -
Exploratory Data Analysis in Python - - - -
Analyzing Police Activity with pandas - - - -
Statistical Thinking in Python (Part 1) - - - -
Statistical Thinking in Python (Part 2) - - - -
PROJECT Dr. Semmelweis and the Discovery of Handwashing - - - -
Supervised Learning with scikit-learn - - - -
PROJECT Predicting Credit Card Approvals - - - -
Unsupervised Learning in Python - - - -
Machine Learning with Tree-Based Models in Python - - - -
Case Study: School Budgeting with Machine Learning in Python - - - -
Cluster Analysis in Python - - - -

Data Analyst with Python

Course Slides Dataset Notes Solutions
Introduction to Data Science in Python - - - -
Intermediate Python - - - -
Data Manipulation with pandas - - - -
Merging DataFrames with pandas - - - -
Introduction to Data Visualization with Matplotlib - - - -
Introduction to Data Visualization with Seaborn - - - -
Introduction to Importing Data in Python - - - -
Intermediate Importing Data in Python - - - -
Cleaning Data in Python - - - -
Exploratory Data Analysis in Python - - - -
Analyzing Police Activity with pandas - - - -
Introduction to SQL - - - -
Streamlined Data Ingestion with pandas - - - -
Introduction to Relational Databases in SQL - - - -
Joining Data in SQL - - - -
Introduction to Databases in Python - - - -

Data Analyst with SQL Server

Course Solutions
Introduction to SQL Server link
Introduction to Relational Databases in SQL link
Intermediate SQL Server link
Time Series Analysis in SQL Server -
Functions for Manipulating Data in SQL Server -
Database Design link
Hierarchical and Recursive Queries in SQL Server -
Transactions and Error Handling in SQL Server -
Writing Functions and Stored Procedures in SQL Server -
Building and Optimizing Triggers in SQL Server link
Improving Query Performance in SQL Server -

Data Science for Everyone

Course Slides Dataset Notes Solutions
Introduction to Python - - - -
Intermediate Python - - - -
Python Data Science Toolbox (Part 1) - - - -
Python Data Science Toolbox (Part 2) - - - -
Introduction to Importing Data in Python - - - -
Intermediate Importing Data in Python - - - -
Cleaning Data in Python - - - -
Data Manipulation with pandas - - - -
Merging DataFrames with pandas - - - -
Analyzing Police Activity with pandas - - - -
Introduction to SQL - - - -
Introduction to Relational Databases in SQL - - - -
Introduction to Data Visualization with Matplotlib - - - -
Introduction to Data Visualization with Seaborn - - - -
Statistical Thinking in Python (Part 1) - - - -
Statistical Thinking in Python (Part 2) - - - -
Joining Data in SQL - - - -
Introduction to Shell - - - -
Conda Essentials - - - -
Supervised Learning with scikit-learn - - - -
Case Study: School Budgeting with Machine Learning in Python - - - -
Unsupervised Learning in Python - - - -
Machine Learning with Tree-Based Models in Python - - - -
Introduction to Deep Learning in Python - - - -
Introduction to Network Analysis in Python - - - -

Machine Learning Scientist with Python

Course Slides Dataset Notes Solutions
Machine Learning for Everyone - - - -
Introduction to Python - - - -
Intermediate Python - - - -
Python Data Science Toolbox (Part 1) - - - -
Python Data Science Toolbox (Part 2) - - - -
Statistical Thinking in Python (Part 1) - - - -
Supervised Learning with scikit-learn - - - -
Unsupervised Learning in Python - - - -
Linear Classifiers in Python - - - -
Machine Learning with Tree-Based Models in Python - - - -
Extreme Gradient Boosting with XGBoost - - - -
Cluster Analysis in Python - - - -
Dimensionality Reduction in Python - - - -
Preprocessing for Machine Learning in Python - - - -
Machine Learning for Time Series Data in Python - - - -
Feature Engineering for Machine Learning in Python - - - -
Model Validation in Python - - - -
Introduction to Natural Language Processing in Python - - - -
Feature Engineering for NLP in Python - - - -
Introduction to TensorFlow in Python - - - -
Introduction to Deep Learning in Python - - - -
Introduction to Deep Learning with Keras - - - -
Advanced Deep Learning with Keras - - - -
Image Processing in Python - - - -
Image Processing with Keras in Python - - - -
Hyperparameter Tuning in Python - - - -
Introduction to PySpark - - - -
Machine Learning with PySpark - - - -
Winning a Kaggle Competition in Python - - - -