How to Process Video Frames using OpenCV and Python

You can access the full course here: Create a Raspberry Pi Smart Security Camera


Hello everybody, my name is Mohit Deshpande and in this video, we’re going to start building our app.

Actually, before we really get into our app, we first have to discuss something really important. And that is, how do we actually look at video in terms of images? One way to think about it is that video is really just a sequence of still images. And you can see that if you take any video and pause it: you can increment it just a little bit and you’re kind of going frame-by-frame.

That’s what they’re called in video: they’re called frames. A frame is just a particular still image. You know, you can go through all of the frames and when you play them really fast, it appears as video because our eye can’t really detect those changes that fast. We don’t really see them as still images when we play it fast enough, we see it as one coherent video. As it turns out, when we’re dealing with OpenCV, this is exactly how OpenCV likes to think of videos: as a sequence of still images. So any of the image processing stuff that we’ve already talked about, we can apply to each frame of this video.

There are a couple of different ways that we can set up video and we’re gonna get into the Python code, actually. The first thing I should do is import some of my core things here and that is cv2, and then I’m gonna need numpy as np at some point. When you’re starting any sort of CV project, these two imports are really good as the first two lines. You just start by importing cv2 and numpy ’cause odds are, if you’re doing anything with CV, you’re gonna need these two.

Anyway, so now the question is, how do we get video? For example, how do we open a file? There’s lots of video files that we can use, .mov or .mp4. Or even better, for the purposes of our security camera, we don’t want to just open up from a video. We actually wanna open up from a camera, a live camera. So as it turns out in OpenCV, this is actually really, really easy to do, to transition between a video file and the camera itself.

We’re gonna be dealing mostly with video files because some of you may not have a webcam, for example. So we’re primarily going to be dealing with video files, but I’m gonna show you how we could extend this to your webcam. Actually, if we were to run this code on the Raspberry Pi, you would use the camera that I showed you how to install.

First things first, we actually have to tell OpenCV whether we wanna use a file or the actual live camera stream. There’s something in OpenCV that we can use. I’m just gonna call this cap for capture. We’re gonna say cv2.VideoCapture and pass in video.mp4.

Actually, what I have is in the same directory. I have this video in my Developer/security folder, the same directory we’re working in, and we’re gonna run our security camera against it. We’re gonna be kinda testing against this so that we can see if it works or not. And so this is security camera footage, and if I double click on it, it shows me breaking into my roommate’s room.

This is the video feed that we’re gonna be using and I’m gonna provide you this video so you can test it. It’s fairly short because I didn’t want to have to use a really long video file; it takes up a lot of space on your hard disk and we really don’t need that much. I tried to balance it out so that there’s at least a few seconds of just stillness here, so that we have something to compare against when we do the image comparison with video frames. Anyway, we’re gonna get to that much later.

I just wanted to introduce you to this; it’s kind of like the data set, the test video that we’ll be using. You can see, when I pause it, this is one image, and this is what OpenCV is going to be interpreting this as. What OpenCV is gonna do is look through each single frame of our video, and we can sort of iterate through those until we reach the end of the video. So let me quit out of this.

So you’ll be provided with video and you can feel free to use your own or you can feel free to use live streams from your webcam, but I’m just gonna provide this video so that everyone’s consistent there. I’ve got some other stuff here but I’ll explain that as we move along. So anyway, this is gonna be video capture, so I just called this cap for capture and now what we want to do is, we want to actually recreate a video player, basically. What I’m gonna do is load this video and just play it back frame by frame. So, how do we exactly do this?

Well, first thing is we need some kind of loop, some kind of structure, to make sure that we’re getting valid frames from our video. So to actually display this video what we want to do is, if we think about it, we just want to always be pulling frames from this video until we reach the end. What we’re gonna do, like I mentioned, is to just build a small app, kind of, that just replays this video file, and that’s a pretty good start. So to do this, I’m going to first start off an infinite loop. And you may be saying, whoa wait a minute, how do we know if we reach the end of the video? And I say, hold on a second, I’m getting to that.

The reason that we put it in a while True loop is so that we keep pulling frames from the video. If we’re at the end of the video, the read won’t pull a frame, and that frame will actually be None in Python. Once we fetch a frame that’s None, we know that we’ve reached the end of the video and that we need to stop, and thus break out of this loop. So that’s basically what we’re gonna do. Actually before I get to that, one important thing that you have to remember to do is call release() on this VideoCapture. That’s sort of like a clean up thing. It’s kinda like a cv2.destroyAllWindows sorta thing.

It just releases the resources that are allocated to this video or the webcam. In this while True loop, what I want to do is pull a frame from this capture. So I can do that really easily using cap.read(). It’s kinda like imread, except with videos. And it actually returns two things, as a tuple. The first thing is just a return value that tells us whether the read succeeded, and we don’t really care about it here. But the second thing is the actual frame.

So now that we have the frame, this is where we can do some sort of checking like, something like, if frame is None then we want to do something like break out of this loop. So when we enter this case, then we know that we’re done with the video and we just end. With webcam stuff you don’t necessarily need this because as long as the webcam is plugged in and running then we really shouldn’t ever encounter this case.

Except for maybe some weird error with OpenCV that could potentially happen and cause this to return None. So this still might be a good idea to leave in, just in case. Anyway, now that we have a frame we can treat this just like an image: any of our image processing techniques that we’ve learned about before, we can apply to this frame. It’s just a frame, it’s a single image. Anything that we know, we can just apply to this frame and that’s super awesome about OpenCV.
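The read-until-None loop described above can be sketched as a small generator. Both names here are hypothetical: iter_frames is just an illustrative helper, and FakeCapture is a stand-in for cv2.VideoCapture so that the example runs without a video file or camera.

```python
def iter_frames(cap):
    """Yield frames from `cap` until a read fails or returns None."""
    while True:
        ret, frame = cap.read()
        if not ret or frame is None:
            # End of the video (or a failed read): stop pulling frames.
            break
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture that serves a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # Mimic the (success flag, frame) pair that a capture returns.
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames_seen = list(iter_frames(FakeCapture(["frame0", "frame1", "frame2"])))
print(len(frames_seen))  # 3
```

With a real capture, each yielded frame is an ordinary image, so any per-image processing can go inside a for frame in iter_frames(cap) loop.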

The one thing that we have to keep in mind, though, is that video is a sequence of frames, and so you don’t want to do any kind of image processing stuff that’s going to take a long time with a frame. And we’re gonna see, as we get towards the end of the app, that performance becomes really important.

The speed of your code becomes pretty important when you’re dealing with video. Because if your code is slow on these image processing tasks then you’re gonna notice it because the frame rate in the video is just gonna drop way down. So we have to make sure that we’re not doing too intensive computer vision operations. It can’t be too intensive when we’re dealing with frames. We just want to get in the frame, do what we can with it, and then move on to the next one, quickly.
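One rough way to check whether a processing step is fast enough is to time it per frame and compare that against the time available at the target frame rate. This is only a sketch; the function names and the idea of passing the processing step in as process are assumptions made for illustration.

```python
import time


def average_seconds_per_frame(frames, process):
    """Run `process` on every frame and return the mean time per frame."""
    total = 0.0
    count = 0
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        total += time.perf_counter() - start
        count += 1
    # Avoid dividing by zero when there were no frames at all.
    return total / count if count else 0.0


def frame_budget(fps):
    """Seconds available per frame to keep up with a given frame rate."""
    return 1.0 / fps
```

If the measured average is larger than frame_budget(fps) for the frame rate you want, playback will fall behind and the frame rate will drop.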

Now that I have this frame though, I can just show it like an image. I’m gonna do cv2.imshow() and I’ll pass in a name for the window and the frame, and I can show the frame. There’s one other thing that we have to do, because again, we’re dealing with video: we can do something like cv2.waitKey(1) and then there’s this & 0xFF. If this is equal to ord('q'), then that’s another reason for us to break. And this q just lets us quit out of this while loop with a single key press.

And it turns out that we need this sort of thing because, when we’re fetching frames from a file or from the webcam, we have to give OpenCV just a small amount of time on every pass through the loop: cv2.waitKey(1) pauses for about a millisecond, and that pause is also when the window gets a chance to actually draw the frame. If you get rid of this line, your app’s not gonna work properly, because the window never gets that chance and the video won’t show up.

Now that I have this sort of thing going, let’s actually run this. And actually, one thing I forgot to do is cv2.destroyAllWindows(), so that we get rid of the window at the very end; a window’s gonna pop up that’s gonna show our video. And I also, I guess I forgot to put this if here. That’s also important. Okay, and this is all we need to play our video.

So let’s go ahead and run this and we should see our video play. And yeah, awesome! We can see our video playing and it should close out of this in just a second. Excellent, okay! Now we have our video playing and this is actually just where I want to stop, right here because we’ve actually covered quite a bit in this video. So lemme just do a quick recap. And actually before we stop, I just want to mention one thing.

And that’s how we’re playing video right now, but if I wanted to run a webcam, how would I do that? It’s actually really simple. I’m going to copy this line here. I’ll copy paste it. What’s really awesome with OpenCV is that to go from a video to a webcam we use this exact same line of code, except instead of the file name we put zero. And that’s it. And so we can replace line four with line five. This will actually get video from your webcam or the camera on the Raspberry Pi instead of a particular video file.
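If you would rather choose the source at run time instead of commenting lines in and out, one option is a tiny helper like the one below. This is an assumption layered on top of the video, and parse_source is a hypothetical name; the only OpenCV fact it relies on is that cv2.VideoCapture accepts either a file path or an integer camera index.

```python
def parse_source(text):
    """Turn a command-line style string into a cv2.VideoCapture argument."""
    # A string of digits such as "0" is treated as a camera index;
    # anything else is treated as a path to a video file.
    return int(text) if text.isdigit() else text


print(parse_source("0"))          # 0
print(parse_source("video.mp4"))  # video.mp4
```

The result can then be passed straight to cv2.VideoCapture, so the rest of the loop stays the same for files and webcams.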

But like I mentioned, just to keep everything consistent, we’re gonna be using video files, but I’m gonna leave this in here and I’ve commented it out so that you can easily flip between the two if you so want. This is where I’m actually gonna stop the video because we’ve actually covered a lot.

So, just to do a quick recap. We’ve covered how to load a video from a file and how to stream it from a webcam. I then mentioned that videos are just a bunch of still images, frames, and they’re just played so fast that as a human we see them as being one continuous video. So now that we’re loading up this video, how do we actually pull frames from it? That’s what this does: each call pulls one frame, starting with the first frame, and then the next call pulls the next frame. The first value is just a return value, but the second value in this tuple is what we care about and that’s the frame.

So then we can just show this frame, and we can do anything that we want to this frame that we learned about image processing, ’cause this frame is just an image. To illustrate this we just use cv2.imshow and this is what we use for images. So now we’re using this for frames. Of course, ’cause frames are images, which is why this works out. And one thing that we have to have here is cv2.waitKey(), because we have to give OpenCV a moment to actually draw each frame in the window before we move on to the next one. This also adds in some functionality that we can just quit out of our app any time we want.

And then one special case that we have to think about is, what if we’re at the end of the video, in particular if we’re loading video and not streaming from the webcam. What if we’re at the end of the video? So what happens is, if we’re at the end of the video this should return None because we’re trying to get the next frame after the last frame, but there is no frame after the last frame, which is why we return None. So all this makes sense.

So there’s two last things that you have to do any time you’re using anything with video capture. You have to make sure that you call cap.release(). This will release your app’s control of a webcam, or it’ll close up any resources that deal with this video and then we have our classic cv2.destroyAllWindows().
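One way to make sure those two cleanup calls always run, even if something goes wrong in the middle of the loop, is to wrap them in a context manager. This is a sketch rather than what the video does: released_on_exit is a hypothetical name, and the window-closing function is passed in as an argument (for example cv2.destroyAllWindows) so the helper itself doesn’t depend on a display.

```python
from contextlib import contextmanager


@contextmanager
def released_on_exit(cap, close_windows=None):
    """Hand out `cap`, then always release it and close any windows."""
    try:
        yield cap
    finally:
        # Runs on normal exit and on errors alike.
        cap.release()
        if close_windows is not None:
            close_windows()
```

With OpenCV this would be used as with released_on_exit(cv2.VideoCapture("video.mp4"), cv2.destroyAllWindows) as cap: followed by the frame loop inside the block.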

Okay, so this is what we have covered in this app and what we covered in this video. We started building our app, and if you run it this will just load the video and play it. This is a really good start, but in the next video we’re going to build on this concept a bit more. And we’re gonna actually get into thinking about how we can build a security camera and some of the different talking points with that. So we’re gonna get into that in the next video.

Interested in continuing? Check out the full Create a Raspberry Pi Smart Security Camera course, which is part of our Python Computer Vision Mini-Degree.