Useful libraries for data visualization
Great overview / comparison of these libraries, at pbpython.com.
There are so many libraries and frameworks used in Python for data analysis that I had to take a step back and illustrate how they were laid out.
- PyDataSet
- Provides instant access to many datasets right from Python (in pandas DataFrame structure).
- NumPy
- The fundamental package for scientific computing with Python. A fairly low-level tool.
- Pandas
- Built on NumPy, and adds much more. Provides rich time series functionality, data alignment, NA-friendly statistics, groupby, merge and join methods, and lots of other conveniences.
- SciPy
- Collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.
- Scikit-learn
- I ALWAYS USE THIS
- Module for machine learning built on top of SciPy
- Simple and efficient tools for data mining and data analysis
- Built on NumPy, SciPy, and matplotlib
- Matplotlib
- You can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code.
- Seaborn
- Visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Plot.ly
- Turns your data into eye-catching and informative graphics using an open source visualization library and an online chart creation tool. Will not work in Azure Notebooks. Charts can be hosted in Plotly's cloud or rendered offline, but either way it requires a large amount of data IO, which exceeds what the notebook can handle.
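To make the list above a little more concrete, here is a minimal sketch of how a few of these libraries tend to fit together: NumPy generates some numbers, pandas organizes and summarizes them, scikit-learn fits a simple model, and Matplotlib draws the result. The data is synthetic and the column names (`hours`, `score`, `band`) are made up for illustration; nothing here comes from a real dataset.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: build a small synthetic dataset (hypothetical values, for illustration only).
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
noise = rng.normal(0, 1.0, size=50)
score = 5.0 * hours + 20.0 + noise

# pandas: wrap the arrays in a DataFrame, bucket one column, and group by the bucket.
df = pd.DataFrame({"hours": hours, "score": score})
df["band"] = pd.cut(
    df["hours"], bins=[0, 5, 10], labels=["low", "high"], include_lowest=True
)
band_means = df.groupby("band", observed=True)["score"].mean()

# scikit-learn: fit a one-feature linear regression on the same data.
model = LinearRegression()
model.fit(df[["hours"]], df["score"])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

# Matplotlib: scatter the raw points and overlay the fitted line.
fig, ax = plt.subplots()
ax.scatter(df["hours"], df["score"], label="data")
xs = np.linspace(0, 10, 100)
ax.plot(xs, slope * xs + intercept, color="red", label="fit")
ax.set_xlabel("hours")
ax.set_ylabel("score")
ax.legend()

# Render to an in-memory PNG instead of writing a file to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(band_means)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

If you run this in a Jupyter notebook, removing the `matplotlib.use("Agg")` line and the `plt.close(fig)` call should let the chart display inline.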
Useful Python Tools for ML
These will allow you to get data science and machine learning tools running on your local machine, in your browser, or even in the cloud. I use all of these in my day-to-day work.
- Azure ML Workbench (Local)
- Integrated, end-to-end data science and advanced analytics solution. It helps professional data scientists to prepare data, develop experiments, and deploy models at cloud scale.
- Enthought Canopy (Local)
- Scientific and analytic Python package distribution plus key integrated tools for iterative data analysis, data visualization, and application development.
- Azure Notebooks (Browser)
- Do all of your ML work from within a Jupyter (Python) notebook inside the browser. No setup / install. You can have public / private notebooks. Great way to share your work. Free.
- Azure Data Science Virtual Machine (VM)
- You can run this either as a Windows or Linux (Ubuntu) VM. Click some buttons and you are good to go. Just make sure you turn auto-shutdown on!
- The VM will have just about every tool you’d ever need to do data analysis or ML.
Udemy Courses
I get a lot of value from seeing how other people code, so I’ll watch these videos and code alongside them in an Azure Notebook. They are also very affordable, at around $12 per course. Typing in each line of code helps me remember. You can find all of it here.
Machine Learning
- Python for data science and machine learning bootcamp
- Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, TensorFlow, and more!
- Data science and machine learning with python hands on
- I found this one to be useful for the ML aspects, and got far more out of it after completing the Python for data science and machine learning bootcamp above first.
Deep Learning
- Zero to Deep Learning with Python and Keras
- Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano, so I use it in any Deep Learning work.
- Complete Guide to TensorFlow for Deep Learning with Python
- Also taught by Jose Portilla, who teaches the first course I listed above.
Books
Sometimes I prefer having a tangible item in my hand that I can highlight, take notes in, and read on a plane. I’ve found these books to be the most useful in my studies.
- Hands-On Machine Learning with Scikit-Learn and TensorFlow
- Now that you understand Python and the applicable libraries, you can actually use it for ML
- Python for Data Analysis
- Fantastic for learning Python and growing familiar with the libraries you’ll use in data analysis. It is from the creator of the Pandas library.
- Python Data Science Handbook
- Great overviews of Jupyter notebooks, NumPy, Pandas, Matplotlib, and Scikit-learn
- Free lite version & code from the author here.
- Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
- So helpful for explaining the business use case for ML and data science to non-technical individuals
- Also helps boil down problems into data science terms, and realize if it really is an ML problem
Azure Notebooks
I mentioned these above, but for me the real value lies in the samples and tutorials provided with them. The image below is how I progressed through them, broken down by difficulty. If you are brand new to the field of data science, it would be a great place to start.
These are all available on the Azure Notebooks landing page.
Here is a .pdf of the order I would do them in, beginner -> advanced
Additional Resources
- Machine learning cheatsheet (.pdf)
- Extremely helpful resource to learn the basics and use as a reference
- Machine learning algorithm cheatsheet
- Machine learning basics with algorithm examples
- [YouTube] Data School courses for learning ML
-----------------------
@DaveVoyles

