Learning Objectives
- Understand what matplotlib can be used for.
- Be able to plot data as a line chart, bar graph, pie chart, and so forth.
- Be able to create a legend and place the legend appropriately.
- Be able to scale a plot.
Reference Material
- https://matplotlib.org/3.0.3/users/index.html
- https://matplotlib.org/3.0.3/tutorials/introductory/pyplot.html
- https://matplotlib.org/3.0.3/api/_as_gen/matplotlib.pyplot.html
- https://matplotlib.org/3.0.3/api/pyplot_summary.html#colors-in-matplotlib
Introduction
Matplotlib is a library that lets you display data, such as lists or numpy arrays, as graphical plots. The package has far more options than we can cover in one lecture, so we will only look at a few. Since many data scientists and engineers use MATLAB, we will focus on one module in matplotlib called pyplot, a function-oriented interface designed to resemble MATLAB. Matplotlib also has a more powerful object-oriented interface, but since most of you use MATLAB or will use MATLAB, pyplot is the side this lecture covers.
Getting Started
The main library we will be using for this course is called pyplot, which allows us to draw nice graphs to the screen to visualize data. I recommend starting with the tutorial, which is linked under Reference Material above.
To get started quickly, I am just going to show you some code so you can get a sense of how information is given to pyplot and the methods used to display it to the screen.
We start off by importing a sub-module called pyplot from the matplotlib package. This package isn’t included in some distributions, so you might have to install it. Most Python distributions allow you to do something like python -m pip install -U matplotlib.
You will notice that most uses of pyplot abbreviate it as plt. This allows us to focus more on getting the plot rather than typing out pyplot over and over again.
```python
import matplotlib.pyplot as plt
import numpy as np
```
If you get an error after importing matplotlib, it might mean that it is not installed or not installed correctly. Also, many functions of matplotlib require numpy to function properly.
Your First Plot
Now we have the module in an object called plt, and we can start making plots. The main data we use are lists or numpy arrays. Inside plt we have plot, which takes the data we send it and draws it as a sequence of points on a graph.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # With a single list, the values are used as Y and their indices as X.
    plt.plot([-1, 0, 1])
    plt.ylabel("Sequence")
    # Open a window and display the figure.
    plt.show()


if __name__ == "__main__":
    main()
```
The code above produces the following output.
As you can see from above, plt.plot() adds the data to the figure, whereas plt.show() actually displays it in a graphical window. This might cause issues if your Python environment is unable to open graphical windows, for example in a text-only remote session.
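If no graphical display is available, one common workaround is to render the figure to an image file instead of a window. The sketch below assumes the non-interactive "Agg" backend that ships with matplotlib; the function name save_first_plot and the file name first_plot.png are made up for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt


def save_first_plot(path):
    """Draw the three-point plot from above and write it to an image file."""
    plt.figure()
    plt.plot([-1, 0, 1])
    plt.ylabel("Sequence")
    plt.savefig(path)  # takes the place of plt.show() when there is no display
    plt.close()
    return path


# "first_plot.png" is a hypothetical file name chosen for this sketch.
out_path = save_first_plot(os.path.join(tempfile.gettempdir(), "first_plot.png"))
print("Saved plot to", out_path)
```

You can then open the saved image with any image viewer.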
You can see that with the code we wrote, the list we gave plot was [-1, 0, 1]. Since we didn’t specify the X-axis, pyplot interpreted the list as our Y-values and the index of each value in the list as its X-value. That’s why the point at X=0 has the value -1: the element at index 0 of our list is -1.
We can specify two lists, one for the X-axis and one for the Y-axis, to make custom values for our plot.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # First list: X values. Second list: Y values.
    plt.plot([1000, 1500, 2000], [-1, 0, 1])
    plt.ylabel("Rise")
    plt.xlabel("Run")
    plt.show()


if __name__ == "__main__":
    main()
```
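To check the default X-values yourself, you can inspect the line object that plot returns. This is a small sketch rather than part of the lecture code; the "Agg" backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this check
import matplotlib.pyplot as plt

plt.figure()
# plot() returns a list of line objects; we unpack the single line.
(line,) = plt.plot([-1, 0, 1])

# Convert to plain floats so the values print the same way everywhere.
x_values = [float(v) for v in line.get_xdata()]
y_values = [float(v) for v in line.get_ydata()]
print("X:", x_values)
print("Y:", y_values)
plt.close()
```

The X-values are the list indices 0, 1, 2, and the Y-values are the list we passed in.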
Now that we specified the X axis, notice how the plot has changed.
Changing The Graph Type
We can style our graph by passing a third argument: a format string that tells pyplot how to draw the data. The available options are listed on the matplotlib.pyplot.plot page of the Matplotlib 3.3.4 documentation.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # "-pm": solid line, pentagon markers, magenta.
    plt.plot([1000, 1500, 2000], [-1, 0, 1], "-pm")
    plt.ylabel("Rise")
    plt.xlabel("Run")
    plt.show()


if __name__ == "__main__":
    main()
```
If you look at the documentation, our style is "-pm", which is three specifiers: - means a solid line, p means to use pentagons as data markers, and m means to color it in magenta. This gives us the following plot.
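The format string is shorthand. As a rough sketch, the same style can be spelled out with the linestyle, marker, and color keyword arguments of plot, which some people find easier to read. The variable names here are only for illustration, and the "Agg" backend is used so the comparison runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this comparison
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1000, 1500, 2000]
y = [-1, 0, 1]

plt.figure()
# Short form: one format string.
(fmt_line,) = plt.plot(x, y, "-pm")
# Long form: one keyword argument per specifier.
(kw_line,) = plt.plot(x, y, linestyle="-", marker="p", color="m")

# Both lines should end up with the same line style, marker, and color.
same_style = (
    fmt_line.get_linestyle() == kw_line.get_linestyle()
    and fmt_line.get_marker() == kw_line.get_marker()
    and to_rgba(fmt_line.get_color()) == to_rgba(kw_line.get_color())
)
print("Same style:", same_style)
plt.close()
```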
Overlaying
We can also draw multiple plots on the same graph, either by calling plot repeatedly or by passing all of the data in a single plot call.
Here is an example of specifying the plot using one plot function call and in separate calls.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    t = np.arange(0., 5., 0.2)

    # We can specify all the plots in one call. Every three arguments
    # (x, y, format string) describe one line:
    #   t, t, 'r--'               is the first line
    #   t, np.power(t, 2), 'bs'   is the second line
    #   t, np.power(t, 3), 'g^'   is the third line
    plt.plot(t, t, 'r--', t, np.power(t, 2), 'bs', t, np.power(t, 3), 'g^')

    # Or we can do this with three separate plot calls. Use one approach or
    # the other; calling both, as here, simply draws each line twice.
    plt.plot(t, t, 'r--')
    plt.plot(t, np.power(t, 2), 'bs')
    plt.plot(t, np.power(t, 3), 'g^')

    plt.show()


if __name__ == "__main__":
    main()
```
It’s up to you whether you want three plot function calls or one. To me, having three plot function calls makes it much easier to discern what you’re trying to do. If you get off even by one parameter in the single plot function call, your entire graph will be ruined. This also shows the power of pyplot. Notice that instead of using a list, I used a numpy array to store the data. Numpy is a highly optimized mathematics library, so for most data sets, numpy arrays are the preferred way to hold the data.
The code above produces the following graph. The same graph will be generated using one plot function call or three. Again, this is all style.
You can also see that 'r--' tells pyplot to generate a red dashed line, whereas 'bs' tells pyplot to draw blue square markers with no connecting line, and 'g^' tells pyplot to use green triangle markers.
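One way to confirm how pyplot read each format string is to ask the returned line objects for their line style and marker. This is a hedged sketch: the variable names are illustrative, and the "Agg" backend is assumed so that no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this check
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0., 5., 0.2)

plt.figure()
# One plot call with three (x, y, format) groups returns three line objects.
dashed, squares, triangles = plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')

for fmt, line in [("r--", dashed), ("bs", squares), ("g^", triangles)]:
    print(fmt, "-> linestyle:", line.get_linestyle(), "marker:", line.get_marker())
plt.close()
```

A format string with only a marker, such as 'bs', reports no line style, which is why the squares are not connected.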
Charts
As you can see, .plot() is used to make an XY-plot. However, what if we want a bar graph or a pie chart? These are made with their own functions. A bar graph is made by using .bar() instead of .plot(), and a pie chart can be created by using .pie().
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # Wedge sizes and the label for each wedge.
    t = np.array([10, 20, 40, 30])
    pie_labels = ["A", "B", "C", "D"]
    plt.pie(t, labels=pie_labels)
    plt.show()


if __name__ == "__main__":
    main()
```
The code above will produce a pie chart with wedges 10, 20, 40, and 30 and labels A, B, C, and D. The pie function automatically gives each wedge the fraction \(\frac{x}{\sum{x}}\) of the pie. So, the 10 will get \(\frac{10}{100}\) of the pie. Fractional values can also be used, but if they sum to less than 1 it is possible to get “holes” in your pie chart, because some Matplotlib versions and settings (for example normalize=False) treat such values directly as fractions of the whole pie. There are several other options available, such as specifying the colors of each wedge, which are described on the matplotlib.pyplot.pie page of the Matplotlib 3.3.4 documentation.
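The wedge fractions can be checked with a little arithmetic. The sketch below computes \(\frac{x}{\sum{x}}\) with numpy and compares it to the angles of the wedges that pie returns. The autopct argument, which prints a percentage on each wedge, is an addition that the lecture code does not use, and the "Agg" backend is assumed so this runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this check
import matplotlib.pyplot as plt
import numpy as np

t = np.array([10, 20, 40, 30])
pie_labels = ["A", "B", "C", "D"]

# Each wedge's share of the pie: x / sum(x).
fractions = t / t.sum()

plt.figure()
# With autopct set, pie returns the wedges, the labels, and the percent texts.
wedges, texts, autotexts = plt.pie(t, labels=pie_labels, autopct="%1.0f%%")

# A full pie spans 360 degrees, so each wedge's angle / 360 is its fraction.
wedge_fractions = [(w.theta2 - w.theta1) / 360.0 for w in wedges]
print("Computed fractions:", fractions)
print("Wedge fractions:   ", wedge_fractions)
plt.close()
```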
The code above produces the following pie chart.
Bar Graphs
We can make bar graphs by using .bar() instead of .pie() or instead of .plot().
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # X: the position of each bar
    x = np.array([1, 0, 2, 3, 4])
    # Y: the height of each bar
    y = np.array([5, 15, 15, 25, 7])
    # Colors: one color character per bar
    colors = ["b", "r", "m", "k", "g"]
    plt.bar(x, y, color=colors)
    plt.show()


if __name__ == "__main__":
    main()
```
We first specify X, the position of each bar along the axis. Bars appear in ascending order of X, not in list order. As you can see with the arrays above, the first 15 will actually come first since it is given X position 0. Then comes the 5, then the other 15, followed by 25 and 7. We can also specify the colors of each bar by using a list of color characters, where b = blue, r = red, m = magenta, k = black, and g = green. The code above produces the following bar graph.
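To see the left-to-right order without reading it off the picture, you can sort the heights by their X position. This sketch uses np.argsort for that and cross-checks against the bar objects returned by bar; the variable names are illustrative, and the "Agg" backend is assumed so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this check
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 0, 2, 3, 4])
y = np.array([5, 15, 15, 25, 7])

# Indices that would sort x in ascending order, applied to the heights.
order = np.argsort(x)
heights_by_position = y[order].tolist()

plt.figure()
bars = plt.bar(x, y)
# The same order, read from the drawn bars sorted by their left edge.
drawn_heights = [int(b.get_height()) for b in sorted(bars, key=lambda b: b.get_x())]

print("Heights left to right:", heights_by_position)
plt.close()
```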
Sub-plots
Pyplot draws into one figure at a time. However, we can add subplots to make pyplot display multiple plots within that one figure. We call subplot before .plot() to choose where the plot that follows will go. The position is a three-digit number whose digits are (in order): number of rows, number of columns, and index of the given subplot. For example, plt.subplot(221) will generate a grid with two rows and two columns and place the given subplot at index 1. Unlike Python, pyplot uses indices that start with 1.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    def f(t):
        # A damped cosine wave.
        return np.exp(-t) * np.cos(2*np.pi*t)

    t1 = np.arange(0.0, 5.0, 0.1)
    t2 = np.arange(0.0, 5.0, 0.02)

    plt.figure()
    # 2 rows, 2 columns, index 1 (upper left).
    plt.subplot(221)
    plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

    # 2 rows, 2 columns, index 4 (lower right).
    plt.subplot(224)
    plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
    plt.show()


if __name__ == "__main__":
    main()
```
As you can see, the index is in row-major order, meaning 1 is the first column of the first row, 2 is the second column of the first row, and so on. The code above produces the following output.
You can sort of see that this is a 2×2 grid. Since the blue graph is specified with index 1, it is in the upper-left corner, whereas the red graph is specified with index 4, which puts it in the lower-right corner.
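The row-major numbering can be written as a small formula: for a zero-based row and column, the pyplot index is row * ncols + col + 1. The helper subplot_index below is a hypothetical name used only for this sketch. It also uses the three-argument form plt.subplot(nrows, ncols, index), which is equivalent to the three-digit form, and the "Agg" backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this sketch
import matplotlib.pyplot as plt
import numpy as np


def subplot_index(row, col, ncols):
    """Convert a zero-based (row, col) to pyplot's one-based, row-major index."""
    return row * ncols + col + 1


nrows, ncols = 2, 2
t = np.arange(0.0, 5.0, 0.02)

plt.figure()
for row in range(nrows):
    for col in range(ncols):
        index = subplot_index(row, col, ncols)
        # Select the grid cell, then draw into it.
        plt.subplot(nrows, ncols, index)
        plt.plot(t, np.cos(2 * np.pi * t * index))
        plt.title("index " + str(index))

n_axes = len(plt.gcf().get_axes())
print("Number of subplots:", n_axes)
plt.close()
```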
Adding Text
We can add text to our plot/chart by using the .text() function. This allows us to put text at an arbitrary x,y coordinate in our chart.
```python
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
# Place text at data coordinates x=60, y=0.025; $...$ is rendered as math.
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
```
You can see that the text allows for some mathematical symbols, written in a TeX-like syntax that closely resembles what MathJax accepts. The supported expressions are listed on the Writing mathematical expressions page of the Matplotlib 3.3.4 documentation. The code above produces the following graph.
Non-linear Scales
Pyplot also allows for non-linear scales by specifying one of four scales: linear, log, symmetric log, or logit. The following code demonstrates all four types.
```python
import matplotlib.pyplot as plt
import numpy as np

# Fixing random state for reproducibility
np.random.seed(19680801)

# make up some data in the open interval (0, 1)
y = np.random.normal(loc=0.5, scale=0.4, size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

# plot with various axes scales
plt.figure()

# linear
plt.subplot(221)
plt.plot(x, y)
plt.yscale('linear')
plt.title('linear')
plt.grid(True)

# log
plt.subplot(222)
plt.plot(x, y)
plt.yscale('log')
plt.title('log')
plt.grid(True)

# symmetric log (linthresh is the keyword name in Matplotlib 3.3 and later)
plt.subplot(223)
plt.plot(x, y - y.mean())
plt.yscale('symlog', linthresh=0.01)
plt.title('symlog')
plt.grid(True)

# logit
plt.subplot(224)
plt.plot(x, y)
plt.yscale('logit')
plt.title('logit')
plt.grid(True)

# Adjust the subplot layout, because the logit one may take more space
# than usual, due to y-tick labels like "1 - 10^{-3}"
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95,
                    hspace=0.25, wspace=0.35)

plt.show()
```
The code above produces the following. Notice that the logit has some fairly small values, which pyplot will automatically put in scientific notation.
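A smaller example may make the effect of a log scale easier to see. In the sketch below the data are made up: each Y-value is ten times the previous one, so on a log axis the points fall on a straight line. The "Agg" backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 1, 10, 100, ... (each value is 10 times the previous one).
x = np.arange(0, 6)
y = 10.0 ** x

plt.figure()
plt.plot(x, y, "bo-")
plt.yscale("log")  # switch the Y-axis of the current plot to a log scale
scale_name = plt.gca().get_yscale()

# On a log axis, height follows log10(y), which rises by exactly 1 per step here.
steps = np.diff(np.log10(y))
print("Y scale:", scale_name)
print("log10 steps:", steps)
plt.close()
```

Note that a log scale can only show positive values, which is why the lecture code keeps only data between 0 and 1 before plotting.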
Chart Legend
Creating a legend for the data on your chart might be important. With pyplot, we can add a legend using legend(). This can take several parameters, and it has an extensive amount of customization.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    z = np.random.randn(10)

    # plot() returns a list of lines; the trailing comma unpacks the single line.
    red_dot, = plt.plot(z, "ro", markersize=15)
    # Put a white cross over some of the data.
    white_cross, = plt.plot(z[:5], "w+", markeredgewidth=3, markersize=15)

    plt.legend([red_dot, (red_dot, white_cross)], ["Attr A", "Attr A+B"], loc="upper right")
    plt.show()


if __name__ == "__main__":
    main()
```
The code above plots the data, and the call to plt.legend() is what actually creates the legend. The first list we pass to legend specifies the graphic shown for each legend entry. In this case, the first item on the legend will be a red dot, whereas the second will be a red dot with a white cross on top of it. The second list contains the text we want for each item in the legend. So, the red dot gets “Attr A” whereas the red dot with the white cross gets “Attr A+B”. Finally, loc stands for “location”, and it allows us to specify where we want to put the legend. The code above produces the following chart.
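For simpler charts there is a shorter route: give each plot call a label keyword and then call legend with no lists, and pyplot builds the legend from the labels. The sketch below reuses the three curves from the Overlaying section; the label texts are made up, and the "Agg" backend is assumed so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this sketch
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0., 5., 0.2)

plt.figure()
# The label keyword attaches the legend text to each line.
plt.plot(t, t, 'r--', label="linear")
plt.plot(t, np.power(t, 2), 'bs', label="quadratic")
plt.plot(t, np.power(t, 3), 'g^', label="cubic")

# With no handle or text lists, legend() uses the labels given above.
legend = plt.legend(loc="upper left")
legend_labels = [text.get_text() for text in legend.get_texts()]
print("Legend entries:", legend_labels)
plt.close()
```

The explicit two-list form shown earlier is still the one to use when a legend entry needs a combined graphic, such as the dot with a cross.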
Conclusion
Matplotlib has a TON of features, and it would take several lectures to cover them all. However, the documentation is quite good, so you can go through the tutorial or the reference documents to find what you’re looking for.