Getting Started with Data Visualization in Python

You can access our newest course on Data Visualization here: The Complete Python Data Visualization Course

Transcript 1

Hello, everybody. My name is Mohit Deshpande and thanks for joining me. And in this course we’re gonna be learning a lot about data visualization.

And what we're gonna be looking at is, we'll be able to build something like this; in fact, we'll be building this exact visualization. You can see that we can do all sorts of neat things with this kind of visualization, and it's really useful if you have a lot of data and you want to gain some insights about that data.

It's sometimes helpful to visualize data, but the question is: what kind of visualization, or what kind of plot, should I be using? Hopefully we'll answer that as we progress through this course. Before we get to the actual plotting, we need to know a little bit about statistics, so I'm going to give a very brief, scratch-the-surface introduction to statistics, and then we can get right into plotting.

We're going to start with some of the more basic plots you've probably seen, like bar charts, line plots, scatter plots, and so on. Then we'll look at some more advanced plots, such as quiver plots, which are used for vector fields, and 3D line and surface plots, like the one I just showed. We'll also talk about how we can arrange multiple plots in a single figure.

We've been making courses since 2012, and we're super excited to have you on board. Online courses are a great way to learn new skills, and I take a lot of them myself. ZENVA courses consist mainly of video lessons that you can watch at your own pace and as many times as you want. We also have downloadable source code and project files that contain everything we build in the lessons. I highly recommend that you code along with me; in my experience, coding along is the best way to learn a new skill.

Lastly, we've seen that the students who get the most out of these online courses are the ones who make a weekly plan and stick with it, depending on their own availability and learning style. Remember that you can watch and rewatch these video lessons as many times as you want, which gives you more flexibility. At Zenva, we've taught programming and game development to over 200,000 students across over 50 courses since 2012. Some of these students have used the skills they learned in these courses to advance their careers, start a company, or publish their own games and apps. So thanks for joining, and I look forward to seeing all the cool stuff you'll be building. Now, without further ado, let's get started.

Transcript 2

Hello everybody my name is Mohit Deshpande. In this video, the kind of chart that I want to cover is called a bar chart.

What I'll show you is how to use Matplotlib, the library we're going to be using, to draw a bar chart. We'll do it first with one series of data and then with two series of data, and along the way I'll show you a lot of the different things we can do with bar charts. First of all, you'll notice that I have a ton of imports here; ignore them for a second, because we're going to get to them.

At some point we're going to use all of these. Probably the most important one is matplotlib.pyplot, which I've imported as plt; it's what actually draws the plot and shows the graph.

All the plotting we're going to do uses functions from this plt module; that's how Matplotlib's pyplot interface works. Now, to actually discuss bar charts: what kind of data is a bar chart good for? We can use bar charts to show categorical data. For example, suppose we wanted to know how many people in some classroom got A's, B's, C's, D's, and F's. We can make a bar chart where the x-axis shows the grade and the y-axis shows how many students received that grade. Then we can split it up further.

For example, for a particular class, we could split it up by students who had one professor versus students who had another professor. We could split it up in many different ways, but that's an example of what bar charts are good for: discrete data that you want to plot, like counts. The first thing we need is a number of bins, and by bins I mean how many ticks we have on the x-axis. We have five, right? A, B, C, D, and F.

Then we can create our data, and for this I'm going to use NumPy. The numpy.random module has some great functions, so I can use np.random.randint. I pass in the minimum value and the maximum value, so let's go from 0 to 100 (NumPy excludes the upper bound, so the values actually range from 0 to 99). Then I pass in how many of those values I want to generate.

In this case, I want to generate one per bin. So each grade, A, B, C, D, and F, gets a number, and that's going to be bar1. I'll create bar2 a little bit later. After I have this, I need the indices, which are what Matplotlib uses to position the bars: the first index gets the first value in bar1, the second index gets the second value in bar1, and so on. This approach also works well when we have bar charts with multiple series. A quick way to get the indices is np.arange (read it as "a range," not "arrange"), passing in the number of bins; that gives you all the indices.
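
Here is a minimal sketch of the data setup described above, assuming NumPy is imported as np (the usual convention). The names n_bins, bar1, and indices follow the narration; the loop just prints which value lands at which x position.

```python
import numpy as np

n_bins = 5  # one bin per grade: A, B, C, D, F

# One random count per grade. randint excludes its upper bound,
# so these values fall between 0 and 99.
bar1 = np.random.randint(0, 100, n_bins)

# x positions 0, 1, 2, 3, 4: index i is where bar1[i] will be drawn
indices = np.arange(n_bins)

for i, value in zip(indices, bar1):
    print(f"index {i} -> count {value}")
```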

And then here comes the magic: we actually plot it. We call plt.bar and pass in the indices, and then we pass in the data, which is just bar1. There are some other options we can use, but I'm going to leave those out for now. This is the bare minimum we need, so I can plot this and we can see what happens.

Okay, so here is my bar chart. It just generates some random numbers between 0 and 99, and here's what the bar chart looks like. These bars are really thick, though, so I'm going to thin them out. I'll close this window. Then I create a variable called bar_width, set to 0.25, and pass it to plt.bar as the width parameter (the default width is 0.8). That makes the bars skinnier, and we'll need that when we go to plot multiple series in just a second.
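
A sketch of the single-series chart with thinner bars, assuming pyplot is imported as plt. width is the actual plt.bar parameter name; bar_width is just the variable we pass to it. The Agg backend line is an assumption so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

n_bins = 5
bar1 = np.random.randint(0, 100, n_bins)
indices = np.arange(n_bins)
bar_width = 0.25  # thinner than the default width of 0.8

plt.figure()
# Positions, heights, and the bar width; plt.bar returns a container of Rectangle patches
bars1 = plt.bar(indices, bar1, width=bar_width)
```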

I can rerun this, and now my bars are skinnier. This works well, but wait a minute: the labels showing on the x-axis aren't actually A, B, C, D, or F. We can change that very simply. We call plt.xticks and pass in the positions we want, which are the indices, and then a tuple of the labels we want. Since we have five, we can just say A, B, C, D, and F. That labels the ticks with letters instead of numerical values. And there we go: instead of numbers along the bottom, we have A, B, C, D, and F. Okay, great.
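
A sketch of relabeling the ticks, continuing the same single-series setup (random placeholder data, plt/np aliases). plt.xticks takes the tick positions first and the labels second, and returns the tick locations and the label Text objects.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

n_bins = 5
bar1 = np.random.randint(0, 100, n_bins)
indices = np.arange(n_bins)
bar_width = 0.25

plt.figure()
plt.bar(indices, bar1, width=bar_width)

# One label per index; the second argument is a tuple of tick labels
_, tick_labels = plt.xticks(indices, ("A", "B", "C", "D", "F"))
```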

But how well does this scale if we have multiple series? What I mean by multiple series is that when we ran this, there were just bars in one color. What if we wanted another set of bars because we want to split the data up, say, by instructor? Maybe you take the same course with a different instructor, and one instructor's students tend to do better than the other instructor's students. We can do that very easily, too. I'll just copy this line and make bar2, which is generated the same way as bar1. Now I have to put it in the bar chart, and that's a bit more complicated.

Basically, what we do here is call plt.bar again, but instead of indices we pass indices + bar_width. That shifts the bars: if we just used indices, bar2 would be drawn on top of bar1, and we don't want that. We know the width of each bar in the first series, it's bar_width, so shifting by that amount puts the two series right next to each other in the bar chart. Then I pass in bar2, and I also have to specify the width.
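
A sketch of two side-by-side series, under the same assumptions as before (random placeholder data, plt/np aliases). Passing indices + bar_width moves every bar of the second series one bar width to the right, so each bar2 bar starts exactly where its bar1 neighbor ends instead of overlapping it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

n_bins = 5
bar1 = np.random.randint(0, 100, n_bins)
bar2 = np.random.randint(0, 100, n_bins)
indices = np.arange(n_bins)
bar_width = 0.25

plt.figure()
bars1 = plt.bar(indices, bar1, width=bar_width)
# Same indices shifted right by one bar width, so bar2 does not cover bar1
bars2 = plt.bar(indices + bar_width, bar2, width=bar_width)
```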

If you want three series, for example, the next one would go at indices + 2 * bar_width. That shifts the third series over by two bar widths, so it's in the right position and isn't overlaid on anything.
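
The same idea generalizes to any number of series: series k goes at indices + k * bar_width. The loop below is a sketch with three random placeholder series; it assumes the whole group still fits in the unit gap between neighboring indices (here 3 * 0.25 = 0.75, which is less than 1).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

n_bins = 5
n_series = 3
indices = np.arange(n_bins)
bar_width = 0.25  # n_series * bar_width must stay below 1 to keep groups apart

series = [np.random.randint(0, 100, n_bins) for _ in range(n_series)]

plt.figure()
containers = []
for k, data in enumerate(series):
    # Series k is shifted k bar widths to the right of the first series
    containers.append(plt.bar(indices + k * bar_width, data, width=bar_width))
```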

That's basically how it works, so we can run this and see the result. Okay, now we have two series of bars. There's some cleanup we can do. For example, these bars are both the same color, and that's not really desirable. We can change the colors very easily: plt.bar has a parameter called color, and we can pass it a single-letter color code or a color name. So 'b' is blue, and for this one we'll do color='g' for green.

I also want to label these bars so that I can create a legend. So I can say this series represents the scores of Professor 1 with label='Professor 1', and for the other one, label='Professor 2'. Then I call plt.legend, and that automatically generates a legend based on these labels. When I run this, you'll notice that the bars are colored nicely now, and the legend shows Professor 1 and Professor 2.
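
A sketch of adding colors and legend labels to the two-series chart (random placeholder data). The color and label keyword arguments go straight into plt.bar, and plt.legend() with no arguments builds the legend from those labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

n_bins = 5
bar1 = np.random.randint(0, 100, n_bins)
bar2 = np.random.randint(0, 100, n_bins)
indices = np.arange(n_bins)
bar_width = 0.25

plt.figure()
bars1 = plt.bar(indices, bar1, width=bar_width, color="b", label="Professor 1")
bars2 = plt.bar(indices + bar_width, bar2, width=bar_width, color="g", label="Professor 2")

# Builds the legend from the label= strings given above
legend = plt.legend()
```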

I'm going to do one quick thing before we move on: I'm going to move the tick labels so that each one sits in between its pair of bars. That's also pretty simple. I'm also going to label my axes, because it's always important to label your axes and give your plots titles. plt.xlabel labels the x-axis, so this will be "Final grade," and plt.ylabel labels the y-axis. This will be "Score," or more accurately, "Score frequency," which is how many students received that grade.

Now I want to shift my x-ticks over. How much do I shift them by? I'm in the same predicament as when I added my second series. Because plt.bar centers each bar on its x position by default (align='center' in current Matplotlib), the point between the two bars is half a bar width over, so I pass indices + bar_width / 2 to plt.xticks. (Shifting by a full bar_width only lands between the bars when they are left-aligned with align='edge', which was the default in older Matplotlib versions.) So now my axes are labeled "Final grade" and "Score frequency," I have my legend for the professors, and my x-ticks sit nicely in between the two bars.
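
Putting the bar chart together, here is a sketch with random placeholder data. It assumes current Matplotlib, where plt.bar centers bars on their x positions (align='center'), so the tick labels go at indices + bar_width / 2, halfway between the two bar centers in each group.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

grades = ("A", "B", "C", "D", "F")
n_bins = len(grades)
bar1 = np.random.randint(0, 100, n_bins)
bar2 = np.random.randint(0, 100, n_bins)
indices = np.arange(n_bins)
bar_width = 0.25

plt.figure()
plt.bar(indices, bar1, width=bar_width, color="b", label="Professor 1")
plt.bar(indices + bar_width, bar2, width=bar_width, color="g", label="Professor 2")

plt.xlabel("Final grade")
plt.ylabel("Score frequency")
# Bars are centered at i and i + bar_width, so their midpoint is i + bar_width / 2
plt.xticks(indices + bar_width / 2, grades)
plt.legend()
```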

I'm going to stop right here. We can also add a title, but I'll get to that later. In this video, we quickly discussed how to make a bar chart. I covered some Matplotlib specifics: we pass in the indices and the data, and Matplotlib draws a bar at each index with the height of the corresponding value. We also set the bar width, and we can give each series a color and a label; the labels are used for the legend that shows up on the top right. We can also change the position of the legend if we want to.

If we want more than one series of bars, we have to shift each series over by a multiple of the bar width: 1 * bar_width, 2 * bar_width, and so on. We can label our x- and y-axes using plt.xlabel and plt.ylabel. And with plt.xticks, we can position the x-tick labels so that instead of numerical values we have categories, like A, B, C, D, or F. That is how we can plot bar charts in Matplotlib.

Transcript 3

Hello everybody, in this video I want to explain how we can plot a histogram in Matplotlib, and I'll also explain what a histogram is.

From now on, I'm going to take each section and comment it out. That way, if you want to know how to make a certain kind of plot, you can just uncomment its section by removing this line here and this line here, and then you'll see the plots. Anyway, that was the bar chart; in this video, we'll be plotting a histogram. So you might be asking, first of all, what is a histogram?

A histogram is very similar to a bar chart, but there are some technical differences. For example, in a bar chart the bars usually don't touch, but in a histogram they do, because each bar covers a range of values (a bin), and neighboring bins share an edge. Histograms are particularly important in probability and statistics. There's a concept called a distribution, and distributions have a certain shape. One way to see the shape of a distribution is to look at a histogram: if you take random values from a distribution, you should be able to generate a histogram that looks similar to the true shape of that distribution.
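
One way to check the "bars touch" point is to look at what plt.hist returns: the counts, the bin edges, and the bar patches. This sketch (random standard-normal data, with 10 bins chosen arbitrarily) measures the gap between neighboring bars, which should be zero up to floating-point rounding because each bin's right edge is the next bin's left edge.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

values = np.random.randn(1000)

plt.figure()
counts, edges, patches = plt.hist(values, bins=10)

# Gap between the right side of each bar and the left side of the next one
gaps = [
    patches[i + 1].get_x() - (patches[i].get_x() + patches[i].get_width())
    for i in range(len(patches) - 1)
]
print("largest gap between neighboring bars:", max(abs(g) for g in gaps))
```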

I've been saying this word a lot, so let's actually go ahead and plot a histogram. One cool thing we're going to do is look at something called the normal distribution. I want to show you what it looks like first, because if you know what it looks like, then when we draw the histogram, it'll be clearer what the final result should look like.

Before we actually plot the histogram, I want to take a second and show what the normal distribution looks like. The normal distribution is very widely used, and a lot has been written about it, so I just want to take a few seconds to draw it out visually so you have some idea of what it looks like.

There are, of course, other kinds of distributions, but probably the most famous one is the normal distribution. When I plot it, it'll basically look like this. We have some mean; suppose it's here. The peak of the normal distribution is at that mean, and the curve falls off symmetrically on both sides of it, giving the familiar bell shape. The two tails keep going out toward infinity.

Technically, these tails trail outward to infinity in both directions, never quite reaching zero. So this is basically what the normal distribution looks like: a bell-shaped curve like this.

Here is the mean of the normal distribution, and the spread, which is controlled by the variance we talked about, determines how far out the curve extends. Here's one with a fairly small sigma (the standard deviation, which is the square root of the variance). If I have a larger variance, I end up with something more like this: the values are spread out more, so the peak is lower. In other words, a small sigma gives a tall, narrow curve that drops off quickly, and a large sigma gives a lower, flatter, wider one.
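
For reference, the curve sketched above is the normal probability density function. Written with the same mu (mean) and sigma (standard deviation) used in this lesson, it peaks at x = mu with height 1/(sigma * sqrt(2 pi)), which is why a larger sigma gives a lower, wider curve; the variance is sigma squared.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
\qquad
f(\mu \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}
```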

This is the normal distribution, so when we sample from a normal distribution, we can expect to see something like this. To generate values from a normal distribution, I just need a mu and a sigma. Here I'll use mu = 0 and sigma = 1, which is the unit (standard) normal. Then we can generate all of these values: vals = mu + sigma * np.random.randn(1000). This generates 1000 points from a normal distribution whose mean is 0 and whose sigma is 1.
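
A sketch of the sampling step. np.random.randn(1000) draws 1,000 standard-normal values; scaling by sigma and shifting by mu turns them into samples with mean mu and standard deviation sigma (np.random.normal(mu, sigma, 1000) is an equivalent one-liner). The fixed seed is an addition for reproducibility; the printed sample mean and standard deviation should land close to 0 and 1.

```python
import numpy as np

np.random.seed(0)  # fixed seed so reruns give the same samples

mu, sigma = 0.0, 1.0  # unit normal: mean 0, standard deviation 1

# 1,000 draws from N(0, 1), rescaled to N(mu, sigma**2)
vals = mu + sigma * np.random.randn(1000)

print("sample mean:", vals.mean())
print("sample standard deviation:", vals.std())
```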

Now we can plot this with plt.hist, where hist stands for histogram. I pass in my values, and another parameter I can pass in is the number of bins. That's really all we need to plot our histogram.
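
A sketch of the plt.hist call. The lesson does not pin down the number of bins at this point, so bins=50 is an assumption; plt.hist returns the count in each bin, the bin edges, and the drawn bars, which lets us check where the tallest bar sits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # fixed seed so reruns give the same histogram
mu, sigma = 0.0, 1.0
vals = mu + sigma * np.random.randn(1000)

plt.figure()
# bins=50 is an assumed value; try a few to see how the shape changes
counts, edges, _ = plt.hist(vals, bins=50)

# Center of the tallest bar, which should sit near the mean
peak = counts.argmax()
print("tallest bin is centered at", (edges[peak] + edges[peak + 1]) / 2)
```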

I can run this; it only takes a second. Compare what this looks like to the curve I drew. They're not exactly the same, but the histogram fits the general shape, right? That's what happens when we sample: we're picking discrete data points from this smooth curve, and when I plot those discrete points, I get a histogram that looks like this. The more data points I pick and plot, the smoother this looks, and the closer it gets to the true normal distribution. So that is a histogram.
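
To put a number on "the more points, the closer it gets," the sketch below compares a normalized histogram (np.histogram with density=True) against the exact normal density at each bin center, for increasing sample sizes. It uses NumPy's Generator API (default_rng) with a fixed seed instead of np.random.randn, and the 30 bins over [-3, 3] are arbitrary choices. The largest gap should shrink as the sample size grows, though it never reaches exactly zero because of the finite bin width.

```python
import math

import numpy as np


def max_density_error(n_samples, n_bins=30, seed=0):
    """Largest gap between a normalized histogram of N(0, 1) samples and the true density."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(n_samples)
    density, edges = np.histogram(samples, bins=n_bins, range=(-3, 3), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    true_pdf = np.exp(-centers**2 / 2) / math.sqrt(2 * math.pi)
    return float(np.max(np.abs(density - true_pdf)))


errors = {n: max_density_error(n) for n in (100, 1_000, 100_000)}
for n, err in errors.items():
    print(f"{n:>7} samples: max density error {err:.4f}")
```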

Another thing, again for clarity: we should label the axes. The x-axis label would be something like "bins" or "values," since histograms have bins. The y-axis label is the frequency; you can think of it as something like a probability, but we'll call it frequency. We can also give the plot a title, say "Normal Distribution Sampled." By sampling, I mean that we're just picking 1000 values from the normal distribution and plotting them.

And the more points we pick from that distribution, the closer the histogram will look to it. One extra component I want to mention is a grid: plt.grid(True) draws grid lines across the plot at the tick positions, which will become clear when I run it.
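
A sketch of the labeled histogram with a title and grid, following the lesson's choices of labels and title text. bins=50 is again an assumed value.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 0.0, 1.0
vals = mu + sigma * np.random.randn(1000)

plt.figure()
plt.hist(vals, bins=50)  # bins=50 is an assumed value
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Normal Distribution Sampled")
plt.grid(True)  # grid lines at the major tick positions on both axes
```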

Now you can see that we get this nice grid. We have our normal distribution, the labels are on, and everything looks good. Looking at the frequency, you can see that the most frequent values are around zero. That makes sense, because zero is the mean. Sigma is what's called the standard deviation, which is the square root of the variance. The same principle applies: the larger my sigma value, the more spread out this is going to be. Let me increase sigma and see how that changes things. I'll change it from 1 to 10.

When I run it, you can see that it's not as pointy. It's still a little pointy, but it's a bit flatter. Let me change it again, to 100 now. Now it's way more spread out. The shape looks the same, but look at the range of values along the x-axis. If I zoomed in around zero, you'd see that it's actually quite flat locally; Matplotlib automatically rescales the axes to fit the data, which is why the overall shape looks similar. I'm going to change sigma back to 1, and we get our original distribution. Notice the scale of the x-axis here. Anyway, I'm going to stop right here.
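
A sketch of the sigma experiment from this part of the lesson: sample 1,000 points for sigma = 1, 10, and 100 and compare. The sample standard deviation should track sigma (and the variance sigma squared), while the peak of a density-normalized histogram drops as the same probability spreads over a wider range. density=True, bins=50, and the seeded Generator (used here instead of np.random.randn) are assumptions made so the three histograms are comparable and reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so reruns match
mu = 0.0
stats = {}

plt.figure()
for sigma in (1, 10, 100):
    vals = mu + sigma * rng.standard_normal(1000)
    # density=True rescales bar heights so each histogram has total area 1
    heights, _, _ = plt.hist(vals, bins=50, density=True, alpha=0.5, label=f"sigma = {sigma}")
    stats[sigma] = {"std": vals.std(), "var": vals.var(), "peak": heights.max()}
    print(f"sigma={sigma}: std={vals.std():.2f}, var={vals.var():.1f}, peak height={heights.max():.4f}")

plt.legend()
```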

In this video, we discussed histograms, which are a great way to look at distributions of data. If you have a lot of data and you want to know whether it fits a particular distribution, you can plot it as a histogram, pull up histograms of different distributions, and see how closely your data matches each one. That is the histogram in Matplotlib.
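
One way to do the "does my data fit this distribution" comparison visually is to normalize the histogram with density=True and draw the theoretical density on top of it. This sketch overlays the normal PDF (written out with NumPy rather than a statistics library) on 1,000 samples; bins=50 and the red line style are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to open a window
import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 0.0, 1.0
vals = mu + sigma * np.random.randn(1000)

plt.figure()
# density=True makes the bar areas sum to 1, matching the scale of a PDF
counts, edges, _ = plt.hist(vals, bins=50, density=True, alpha=0.6, label="samples")

# Normal PDF evaluated across the histogram's range
x = np.linspace(edges[0], edges[-1], 200)
pdf = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
plt.plot(x, pdf, "r-", label="normal PDF")

plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
```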

Interested in continuing? Check out The Complete Python Data Visualization Course, our latest data visualization course, which is part of our Data Science Mini-Degree.