PyPDF2 Tutorial – Complete Guide

Let’s delve into PyPDF2, a Python library for working with PDF files: reading them, extracting text and metadata, splitting, merging, and more. This tutorial aims to keep the learning process intuitive, engaging and practical.

What is PyPDF2?

PyPDF2 is a Python library built as a PDF toolkit. It is capable of extracting document information, splitting documents, merging documents, and more, accomplishing all this with a few simple scripting commands.

PyPDF2 finds its utility in several applications, including:

  • Extracting information: PyPDF2 can read a PDF file’s metadata, such as its author and subject, along with its number of pages.
  • Merging PDFs: If you aim to combine multiple PDF files into one, PyPDF2 is your go-to library.
  • Splitting PDFs: Conversely, PyPDF2 can also be used to dissect a single PDF file into separate ones.

Why Learn PyPDF2?

PDFs are one of the most common formats for sharing documents. Adding PyPDF2 to your Python skills therefore not only expands your toolbox but also opens up opportunities for automating everyday tasks involving PDFs. Whether you’re just starting out in Python or looking to enhance your existing Python toolkit, PyPDF2 provides capabilities that can be useful in a variety of projects.

Getting Started with PyPDF2

To start using PyPDF2, you first need to install it. You can use pip, a popular Python package installer. Here’s how to install PyPDF2:

pip install pypdf2

Reading a PDF document with PyPDF2

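To read a PDF file using PyPDF2, import the module, open the file in read-binary mode, and pass the file object to PdfReader (this class was named PdfFileReader before PyPDF2 3.0, which removed the old name).

import PyPDF2

# Open in binary mode: a PDF is not a plain-text file
pdfFileObj = open('example.pdf', 'rb')
pdfReader = PyPDF2.PdfReader(pdfFileObj)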

Getting Information from a PDF File

With the document now open, you can extract metadata using PyPDF2 functions. For instance, let’s say you want to print the number of pages in the PDF:

num_pages = len(pdfReader.pages)
print(num_pages)

You can also get the document information like the author:

doc_info = pdfReader.metadata  # None if the PDF has no info dictionary
print(doc_info.author if doc_info else None)
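The two lookups above can be gathered into one helper. This is a minimal sketch, assuming the PyPDF2 3.x attribute names (pages and metadata); the function name summarize and the stand-in reader are hypothetical and exist only so the example runs without a PDF file.

```python
from types import SimpleNamespace


def summarize(reader):
    """Collect a few basic facts from a PdfReader-like object."""
    info = reader.metadata  # may be None when the PDF has no info dictionary
    return {
        "pages": len(reader.pages),
        "author": getattr(info, "author", None),
        "title": getattr(info, "title", None),
    }


# Hypothetical stand-in for a real reader, so the sketch runs without a file
fake_reader = SimpleNamespace(
    metadata=SimpleNamespace(author="Ann", title="Report"),
    pages=["page one", "page two"],
)
print(summarize(fake_reader))
```

With a real document, you would pass the pdfReader object created earlier instead of the stand-in.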

Extracting Text from a PDF Page

To extract text from a specific page of the PDF, index into pdfReader.pages and call the page’s extract_text() method (these replace getPage() and extractText(), which PyPDF2 3.0 removed).

pageObj = pdfReader.pages[0]
print(pageObj.extract_text())

This example extracts the text from the first page of the PDF file. Remember that pages in PyPDF2 are zero-indexed, so the first page is page 0, the second page is 1, and so on.
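The same idea extends to a whole document. The sketch below loops over every page and labels each chunk with a 1-based page number, which is what readers usually expect to see. It assumes the PyPDF2 3.x names pages and extract_text(); the helper name extract_all_text and the stand-in pages are hypothetical.

```python
from types import SimpleNamespace


def extract_all_text(reader):
    """Return the text of every page, labeled with 1-based page numbers."""
    chunks = []
    for index, page in enumerate(reader.pages):
        # Scanned or image-only pages may yield an empty string
        text = page.extract_text() or ""
        chunks.append(f"--- page {index + 1} ---\n{text}")
    return "\n".join(chunks)


# Hypothetical stand-in pages, so the sketch runs without a PDF file
fake_reader = SimpleNamespace(pages=[
    SimpleNamespace(extract_text=lambda: "Hello"),
    SimpleNamespace(extract_text=lambda: ""),
])
print(extract_all_text(fake_reader))
```

Text extraction depends on how the PDF was produced, so the result for a real file may need cleanup.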

Closing the PDF File

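Once we are finished with a PDF document, it’s important to close it to free resources. Do this only after all reading and writing is done: the reader loads page content from the open file on demand, so the examples in the following sections assume the file has not been closed yet.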

pdfFileObj.close()

Let’s keep the momentum going and delve into some advanced features of the PyPDF2 library!

Splitting a PDF Document

PyPDF2 provides a simple way to split a PDF document into multiple documents. This can be especially useful when you only need specific pages from a large PDF:

for page_number in range(len(pdfReader.pages)):
    # A fresh writer per page, so each output file holds exactly one page
    pdfWriter = PyPDF2.PdfWriter()
    pageObj = pdfReader.pages[page_number]
    pdfWriter.add_page(pageObj)

    with open(f'split_page_{page_number + 1}.pdf', 'wb') as pdfOutputFile:
        pdfWriter.write(pdfOutputFile)

In this example, each page is written to a separate PDF file named ‘split_page_X.pdf’, where X is the page number.
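When only specific pages are needed, it helps to translate a human-friendly page list into the zero-based indices the reader expects. The helper below is a sketch of one way to do that; parse_page_ranges and its spec format ("1-3,5") are assumptions made for illustration, not part of PyPDF2.

```python
def parse_page_ranges(spec, num_pages):
    """Turn a 1-based spec such as "1-3,5" into sorted zero-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first
        if not 1 <= first <= last <= num_pages:
            raise ValueError(f"page range {part!r} is outside 1..{num_pages}")
        # Convert the inclusive 1-based range to zero-based indices
        indices.update(range(first - 1, last))
    return sorted(indices)


print(parse_page_ranges("1-3,5", num_pages=6))  # [0, 1, 2, 4]
```

Each returned index can then be used to pick a page from the reader and add it to a single writer, producing one output file that contains only the selected pages.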

Merging PDF Documents

Merging multiple PDF documents into one is as easy as reading them with PyPDF2:

pdfMerger = PyPDF2.PdfMerger()

for pdf in ['file1.pdf', 'file2.pdf']:
    pdfMerger.append(pdf)
pdfMerger.write('merged.pdf')
pdfMerger.close()

The code example above takes a list of PDF file names, appends them to the merger object, and then writes the result into a new ‘merged.pdf’ document.
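Because pages are appended in list order, the order of the file names decides the order of the merged document. If the list is built from numbered files, a plain alphabetical sort puts file10.pdf before file2.pdf. The sketch below shows one common workaround, a "natural" sort key; natural_key and the file names are hypothetical.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so 'file2' precedes 'file10'."""
    # re.split with a capturing group keeps the digit runs in the result
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


names = ["file10.pdf", "file2.pdf", "file1.pdf"]  # hypothetical file names
print(sorted(names, key=natural_key))  # ['file1.pdf', 'file2.pdf', 'file10.pdf']
```

The sorted list can be passed to the merging loop in place of the hard-coded one.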

Rotating Pages

Another useful feature of PyPDF2 is the ability to rotate pages. Here’s how you can rotate a page 90 degrees clockwise:

pdfWriter = PyPDF2.PdfWriter()
pageObj = pdfReader.pages[0]
pageObj.rotate(90)  # positive angles rotate clockwise

pdfWriter.add_page(pageObj)
with open('rotated.pdf', 'wb') as pdfOutputFile:
    pdfWriter.write(pdfOutputFile)

The rotation is performed by the rotate() method (rotateClockwise() before PyPDF2 3.0), which takes the rotation in degrees as an argument; the angle must be a multiple of 90.
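PDF page rotation is stored in quarter turns, so only multiples of 90 degrees are meaningful. If the angle comes from user input, it can be validated and reduced to a canonical clockwise value before it is applied. The helper below is a small sketch of that check; normalize_rotation is a hypothetical name, not a PyPDF2 function.

```python
def normalize_rotation(degrees):
    """Map any multiple of 90 onto 0, 90, 180, or 270 degrees clockwise."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return degrees % 360


print(normalize_rotation(-90))  # 270, the same as a quarter turn counterclockwise
print(normalize_rotation(450))  # 90
```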

Adding a Watermark to a Page

You can also watermark a page using the merge_page() method (mergePage() before PyPDF2 3.0). This method overlays the content of one page over another:

watermarkReader = PyPDF2.PdfReader(open('watermark.pdf', 'rb'))
watermarkPage = watermarkReader.pages[0]

# Start from a fresh writer so the output holds only the watermarked page
pdfWriter = PyPDF2.PdfWriter()
pageObj = pdfReader.pages[0]
pageObj.merge_page(watermarkPage)

pdfWriter.add_page(pageObj)
with open('watermarked.pdf', 'wb') as pdfOutputFile:
    pdfWriter.write(pdfOutputFile)

In this example, the watermark – which is a semi-transparent image or text saved as a PDF – is applied to the first page of the original document. The result is saved to a new PDF document called ‘watermarked.pdf’.
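To watermark every page rather than just the first, the same overlay step can run in a loop. The sketch below assumes the PyPDF2 3.x method names merge_page() and add_page(); watermark_all and the stand-in classes are hypothetical and are there only so the example runs without PDF files.

```python
from types import SimpleNamespace


def watermark_all(reader, watermark_page, writer):
    """Overlay watermark_page on every page of reader and queue each page in writer."""
    for page in reader.pages:
        page.merge_page(watermark_page)  # modifies the page in place
        writer.add_page(page)
    return writer


class StandInPage:
    """Hypothetical stand-in that records overlays instead of drawing them."""

    def __init__(self):
        self.overlays = []

    def merge_page(self, other):
        self.overlays.append(other)


class StandInWriter:
    """Hypothetical stand-in that collects pages in a list."""

    def __init__(self):
        self.pages = []

    def add_page(self, page):
        self.pages.append(page)


reader = SimpleNamespace(pages=[StandInPage(), StandInPage()])
writer = watermark_all(reader, "WATERMARK", StandInWriter())
print(len(writer.pages))  # 2
```

With real objects, the writer returned by the function would then be written to an output file in the usual way.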

As you can see, the possibilities with PyPDF2 are endless, and we have merely scratched the surface!

Where to Go Next with Your Python Journey?

Now that you have a good understanding of PyPDF2 and its potential, the question that begs to be asked is – “What’s next?”

One of our highly recommended programs is our Python Mini-Degree. This comprehensive collection of courses covers all facets of Python programming – from the basics to advanced concepts like algorithms and object-oriented programming. You’ll even venture into realms of game and app development!

What makes the Python Mini-Degree so unique and effective?

  • It’s hands-on: You will learn Python by creating games, algorithms, and real-world apps.
  • It’s flexible: Our learning options can accommodate everyone from beginners to experienced programmers.
  • It’s supportive: Access to mentors boosts your learning and eliminates stumbling blocks.
  • It’s relevant: With regular updates to keep up with industry trends, our content ensures you stay competitive.

Many students have launched successful careers or started their own businesses after completing our courses.

Looking for more than just a Mini-Degree? We have a broad collection of Python courses dedicated to providing the deepest understanding on a wide range of topics.

Conclusion

The power of Python libraries like PyPDF2 truly exemplifies how versatile and practical this programming language can be. Whether you’re a curious beginner or a seasoned coder, Python fosters a landscape brimming with endless learning paths and exciting opportunities.

Here at Zenva Academy, we’re more than eager to accompany you on this captivating journey. From comprehensive courses like our Python Mini-Degree to an enthralling array of individual Python courses, your next Python adventure is only a click away. Join us today and redefine your coding potential.
