Ventures into Visualizations: Matplotlib

Jason Drummond
7 min read · Feb 15, 2021

Today we will be venturing into the world of data visualizations, specifically focusing on the Matplotlib library. Visualizations are one of the more important aspects that any budding Data Scientist/Analyst must master. Building a compelling visualization from our data will help us relay specific insights that we have gained from our dataset and make our “data story” more interesting to non-technical viewers. In this blog we will introduce you to basic plotting techniques utilizing Matplotlib’s sub-package pyplot and go over different types of plots and customizations that we can make.

Pyplot basics

We can create many different types of visualizations using Matplotlib, and pyplot is the most common way of doing so. Pyplot is a Matplotlib module that gives the user a MATLAB-like interface for creating graphics; MATLAB is where this open-source library draws its inspiration from. In order to use Matplotlib we must first install it using pip with the following commands:

python -m pip install -U pip
python -m pip install -U matplotlib
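
To confirm that the installation worked, one quick check (a suggestion, not a required step) is to import the library in Python and print its version:

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the current environment.
print(matplotlib.__version__)
```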

After this step has completed, we have to import the pyplot module before we can use it; a common convention is to import it aliased as plt. Because we will be using a Jupyter notebook for all of our examples, we will also add one more command after the import, which allows our graphs to be displayed in our environment. This second line is an IPython magic command, so it only works inside a notebook.

import matplotlib.pyplot as plt
%matplotlib inline

Now we can get started making awesome visualizations! First and foremost we will go over the most basic plot that pyplot has to offer, which is the line plot. All that we need to make this kind of chart is two lists of the same size (we will be using a simple list of x's and y's) and the function plt.plot(). We pass our two lists as arguments into the plot function; the first corresponds to our horizontal axis while the second corresponds to our vertical axis.

x = [1,2,3,4,5]
y = [1,2,3,4,5]
plt.plot(x,y)

After putting this into your notebook you may see a line of text output alongside (or instead of) the chart and think that you did something wrong, but we need one more command to display our chart cleanly. That command is plt.show(). When running Python as a script, Matplotlib waits for this function before displaying a chart, since we may want to add customizations first, which we will go over a bit later; in a notebook with %matplotlib inline the figure is also drawn at the end of the cell, but calling plt.show() explicitly is a good habit. Go ahead and add plt.show() into your notebook and you should see the following chart.

Simple line plot

Congratulations! You have just made your first chart with Matplotlib. Unfortunately, this chart is very basic and does not convey any useful information. We will show you how to make your plots convey more meaning later, but first we will go over a few of the different types of plots pyplot has to offer.

Types of Plots in Pyplot

Scatter Plot

Scatter plots are a very common type of plot used in Data Science. They are particularly good for checking whether there is any correlation between two variables. To create a scatter plot we simply replace plt.plot() with plt.scatter(); remember that we still need to pass our lists as arguments into this function. Below is an example of a scatter plot:

x = [1,2,3,4,5]
y = [1,2,3,4,5]
plt.scatter(x,y)
plt.show()
simple scatter plot

Bar Plot

Bar plots are a great way to visualize any categorical data that you may be investigating. To create a bar plot we use the plt.bar() function, which takes two arguments: first our categories, then the values associated with those categories. Below is an example of creating a bar plot in pyplot.

cats = ['A', 'B', 'C', 'D', 'E']
vals = [10, 20, 30, 40, 50]
plt.bar(cats, vals)
plt.show()
simple bar plot

We can also make a horizontal bar plot by simply swapping plt.bar() with plt.barh() and keeping everything else the same. This can be useful if our data lends itself to this type of orientation.
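
As a minimal sketch of that swap, reusing the same illustrative cats and vals lists from the bar plot example (the variable name bars is just a label chosen here):

```python
import matplotlib.pyplot as plt

# Same illustrative data as the vertical bar plot example.
cats = ['A', 'B', 'C', 'D', 'E']
vals = [10, 20, 30, 40, 50]

# barh() still takes the categories first and the values second,
# but the categories are placed on the vertical axis and the
# values become the lengths of the horizontal bars.
bars = plt.barh(cats, vals)
plt.show()
```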

simple horizontal bar plot

Histograms

Histograms allow us to see the approximate distribution of our numerical data; we often see this type of plot when working in probability or statistics. To construct a histogram we will use the plt.hist() function from the pyplot module. We only need to pass one argument, a list of our numerical data, and optionally we can also change the bins argument. Matplotlib sets the number of bins to 10 by default, which we may want to change depending on our data: with too few bins we tend to oversimplify the data, while too many will overcomplicate our results. Below we will go over an example of a histogram with the help of the numpy package. We will use numpy to randomly pick 50 integers in the range of 1 to 10, then plot our results first with 5 bins and then with 20 bins.

import numpy as np
vals = np.random.randint(1,11, size=50)
plt.hist(vals, bins = 5)
plt.show()
Histogram with small bin size
Histogram with larger bin size

Look at how different these charts are simply by altering the number of bins! Ultimately it will be up to you to choose the right number of bins for your data so that the chart conveys its meaning as clearly as possible.
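
For reference, here is one way the two histograms could be produced from the same data. This is a sketch rather than the exact code behind the figures: the fixed seed is an arbitrary choice added so that the run is repeatable, and plt.figure() is called before each histogram so that the two are drawn on separate figures in any environment.

```python
import matplotlib.pyplot as plt
import numpy as np

# Arbitrary seed (an assumption for this sketch) so the numbers repeat.
np.random.seed(0)
vals = np.random.randint(1, 11, size=50)  # 50 integers from 1 to 10

# First histogram: 5 bins.
plt.figure()
counts_few, edges_few, _ = plt.hist(vals, bins=5)
plt.title('Histogram with 5 bins')
plt.show()

# Second histogram: 20 bins, same data.
plt.figure()
counts_many, edges_many, _ = plt.hist(vals, bins=20)
plt.title('Histogram with 20 bins')
plt.show()
```

plt.hist() returns the count in each bin and the bin edges, so the same 50 values are simply grouped more coarsely or more finely depending on the bins argument.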

Customization

Now that we have gone over the basic types of plots that Matplotlib has to offer, we will dive into customizing our charts. We can customize our charts in a number of different ways, such as changing colors, shapes, labels, and axes, among many others. Which customizations we choose depends on the data that we have and the story that we want to tell with it.

Title and Labels

All of your graphs should include a title as well as axis labels; without these your audience will have no idea what point you are trying to convey (in other words, your charts will be meaningless). I will now introduce three pyplot functions that allow you to add a title and labels to your chart: plt.xlabel(), plt.ylabel(), and plt.title(). Each of these takes a string argument describing what you are plotting. We will provide an example below using a scatter plot with a dataset of heights and weights (stored in the variables height and weight); remember to add plt.show() after we make our customizations in order for them to be displayed.

plt.scatter(height, weight)
plt.xlabel('Height (in.)')
plt.ylabel('Weight (lbs.)')
plt.title('Scatter plot of Weight vs. Height')
plt.show()
Scatter plot with labels and title

Look how much more informative this plot is now that we know what is being compared on each axis! Labels and titles should be a standard customization on all of your charts.
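
The height and weight dataset itself is not shown in this post, so here is a self-contained sketch that you can run as-is. The numbers are made up purely for illustration, and the plt.gca() call at the end is an extra step added here to read the labels back from the current axes and confirm they were applied.

```python
import matplotlib.pyplot as plt

# Hypothetical measurements, invented for illustration only.
height = [61, 63, 65, 67, 69, 71, 73, 75]          # inches
weight = [118, 127, 139, 148, 162, 171, 186, 198]  # pounds

plt.scatter(height, weight)
plt.xlabel('Height (in.)')
plt.ylabel('Weight (lbs.)')
plt.title('Scatter plot of Weight vs. Height')

# Read the text back from the current axes to confirm it was set.
ax = plt.gca()
print(ax.get_xlabel(), '|', ax.get_ylabel(), '|', ax.get_title())

plt.show()
```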

Ticks

Say that in our chart above we did not want to display numerical values on our axes and instead wanted to be more general, using a short-to-tall and lighter-to-heavier scale. We can accomplish this with the functions plt.xticks() and plt.yticks(). Both take two arguments: first a list of numerical positions on the axis, and second a list of labels to display at those positions. We will add this customization to our chart from above by displaying different ticks on each axis.

plt.scatter(height, weight)
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Scatter plot of Weight vs. Height')
plt.xticks([62, 68, 74], ['Short', 'Medium', 'Tall'])
plt.yticks([120, 160, 200], ['Lighter', 'Medium', 'Heavier'])
plt.show()
Scatter plot with tick customizations

Colors

The last customization that we will go over is altering colors in our charts. This is a fairly simple customization but can add a lot of meaning to our charts, especially if we are plotting different categories of values. For our purposes we will simply change the color of our data points from the above examples. To achieve this we pass an optional keyword argument to plt.scatter(), much like we did with the bins argument when we were working with the histogram. The argument is called c and we set it to a color; there are a number of colors available to us, and I urge you to look into the Matplotlib documentation for all of your color options. Below we will simply change our data points from blue to green to give you an idea of what our code will look like.

plt.scatter(height, weight, c = 'g')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Scatter plot of Weight vs. Height')
plt.xticks([62, 68, 74], ['Short', 'Medium', 'Tall'])
plt.yticks([120, 160, 200], ['Lighter', 'Medium', 'Heavier'])
plt.show()
Color customization
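
Since color is most useful when it separates categories, here is a small sketch of that idea. Everything in it is hypothetical: the data, the group labels 'A' and 'B', and the color_for_group mapping are invented for illustration. It relies on the fact that the c argument also accepts a list with one color per point.

```python
import matplotlib.pyplot as plt

# Hypothetical data: each point belongs to group 'A' or group 'B'.
height = [61, 63, 65, 67, 69, 71, 73, 75]
weight = [118, 127, 139, 148, 162, 171, 186, 198]
group = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']

# Map each group to a color, then build one color per data point.
color_for_group = {'A': 'g', 'B': 'r'}
colors = [color_for_group[g] for g in group]

# Passing a list to c colors each point individually.
points = plt.scatter(height, weight, c=colors)
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Scatter plot of Weight vs. Height')
plt.show()
```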

Conclusion

Congratulations! You have successfully created your first visualizations with the Matplotlib library. Creating visualizations is a major part of your Data Science journey and will allow you to convey the information that you have discovered in your data to your audience in a clear way. This was a basic overview of what this library has to offer, and I urge you to dive deeper into it, as it can have a major impact on your ability to clearly present data and ideas to just about anyone.
