PDF File Generation using Python
Generating a PDF is an easy task today, since most editors offer this option: you can directly export a Word document, a Photoshop file, or the output of another drawing or reporting tool as a PDF. Because this document type plays such a large role in our daily lives, in this article I want to show how very different kinds of PDF content can be created with Python. Learning how to do that was a nice journey for me, and I hope you can reuse parts of the code or the approach of this article, or even be inspired to create better PDF files yourself.
Focus: Working with PDF files can be seen as two different processes: 1) before creation, composing a file from the source materials, and 2) after creation, editing an existing PDF file. After briefly introducing some existing PDF libraries for generation and editing, this article focuses only on the first process, using the FPDF2 (FPDF) library.
Available Python PDF Libraries
Libraries for PDF Generation
There are already a number of PDF generation libraries, such as reportlab, fpdf/fpdf2, pdfkit, and weasyprint. The first, reportlab, is the most mature one and targets advanced use cases. The next two, fpdf and fpdf2, are inspired by the PHP library FPDF and make PDF creation easy; fpdf2 is the more up-to-date version and offers more features. pdfkit and weasyprint are HTML-based solutions: they can create PDF files from a URL, an HTML file, or a string. Since they also support CSS, much more pleasant-looking PDF files can be generated. reportlab, in turn, combines Python code with HTML-like markup and styles to generate PDF files.
The examples for the aforementioned libraries are listed below:
These libraries are also compared for a single use case in this article by Dinesh Kumar.
Libraries for PDF Editing, Merging, Cropping, Transforming and Splitting
The second group of Python libraries concentrates on splitting, merging, cropping, and transforming PDF pages. They also offer features such as adding custom data, different view options, password protection, extracting text along with metadata, and merging whole files together. The best-known libraries are PyPDF2 and pikepdf. pikepdf is based on the existing C++ library QPDF, while PyPDF2 is a pure-Python library. PyPDF2 was no longer maintained at the time of writing, with no active development for two years; the project has since been continued under the name pypdf. pikepdf differs substantially from PyPDF2 and is under active development.
How should we evaluate a PDF file while developing it?
Creating a PDF starts with understanding how the page is structured, or how it should look, while you place objects on it. If you have a Photoshop background, positioning objects on a page and placing different items on different layers will be familiar. If not, just look at the following figure: as in the left figure, every object has initial x,y coordinates on the page, and, as shown in the right figure, objects can be added on top of each other. Keep this model in mind during development and write your code based on it.
What you can put on a PDF page is limited only by your imagination; at least, that is what I experienced while implementing these examples. Using it as a replacement for Photoshop is not recommended, since Photoshop is an elaborate tool; however, with programming you can sometimes do much more, if you know how to implement it.
The rest of the article presents various PDF generation use cases, all developed with the FPDF/FPDF2 libraries.
FPDF2 Essential Functionalities
The FPDF2 library inherits nearly all features of its predecessor, FPDF, and provides extra ones. When writing your own class, you can either inherit from the FPDF class directly, which eases development, or pass an FPDF object in as a parameter. For example, the code below inherits from FPDF and overrides the header and footer methods according to your needs. Thanks to inheritance, these methods never need to be called explicitly while generating the PDF: FPDF calls them automatically for every page.
Text is the core of a PDF, so its font, size, and Unicode support for your local language are of great importance. If you do not want to use the built-in fonts, you can add your own as below; once a font has been registered, it can be selected with set_font.
# register the Roboto Bold TrueType font (TrueType fonts support Unicode in fpdf2)
fpdf.add_font('roboto-bold', '', 'Roboto/Roboto-Bold.ttf')
# select it at 15 pt; the file already holds the bold glyphs, so no 'B' style is needed
fpdf.set_font('roboto-bold', '', 15)
Text is mostly added to a PDF through cells: cell() writes a single line of text, whereas multi_cell() wraps text over multiple lines (fpdf2 also offers lower-level methods such as text() and write()). Each cell starts at the current (x, y) position and has width and height parameters; the alignment of the text inside the cell can be adjusted as well. If the text color, border color, cell background color, or frame thickness need to change, they must be set before the text is added.
# colors of the frame, background and text
fpdf.set_draw_color(34, 139, 34)
fpdf.set_fill_color(0, 0, 0)
fpdf.set_text_color(255, 255, 255)  # or a grey level, e.g. fpdf.set_text_color(128)
# thickness of the frame (1 mm)
fpdf.set_line_width(1)
# add text: 30 mm wide, 9 mm high, framed, then move to the next line; centered and filled
fpdf.cell(30, 9, 'text', 1, 1, 'C', 1)
# add a centered multi cell without frame or fill (6 mm per line)
fpdf.multi_cell(30, 6, 'text', 0, 'C', 0)
A line break can also be added between texts:
# line break (moves down by the height of the last cell)
fpdf.ln()
The library applies default page margins, which can be adjusted as below:
# set margins (left, top, right)
fpdf.set_margins(0, 0, 0)
Sometimes the current x,y position must be set explicitly, since it is the starting point of the objects that follow. The coordinates can be set together or separately:
# set x and y together
fpdf.set_xy(2, 5)
# position 1.5 cm from the bottom (negative values count from the bottom edge)
fpdf.set_y(-15)
Rectangles, circles, ellipses, polygons, and lines are supported as well:
# draw a rectangle outline (style: 'D' outline, 'F' fill, 'DF' both)
fpdf.rect(x=10, y=10, w=50, h=30, style='D')
# add a filled ellipse (equal w and h give a circle)
fpdf.ellipse(x=20, y=30, w=40, h=40, style='F')
# add a filled triangle; other polygons work the same way with more points
coords = ((10, 80), (40, 80), (25, 55))
fpdf.polygon(coords, fill=True)
# add a line from (100, 0) to (0, 100)
fpdf.line(100, 0, 0, 100)
An image can be added to the PDF through the following code:
# add an image (x and y default to the current position)
fpdf.image(file_path_name, x=None, y=None, w=100, h=100, type='', link='')
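Most importantly, a page must be added before any of the calls above, since fpdf2 raises an error when drawing without an open page. Portrait and landscape orientations and many other page formats are supported.
# add a page ('P' portrait, 'L' landscape, '' keeps the document default)
fpdf.add_page(orientation='')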
Automatic page breaks work as in MS Word: when the content reaches the bottom margin, a new page is started. This can be deactivated as below:
# disable automatic page breaks
fpdf.set_auto_page_break(auto=False)
For more details, it is highly recommended to look at the fpdf and fpdf2 web pages.
Generating PDF reports from Excel Files
Excel reports are used in almost every business, and sooner or later most software developers deal with them. For this reason, I used an Excel file as the data source for this example. Excel files in a folder can easily be read with the well-known pandas library, and the following PDF file is created from that data. The distances from the top, bottom, left, and right can be adjusted. Every row of the Excel table is drawn as a sequence of cells, and the spacing between the cells is calculated as well. In this example, the number of cells and their spacing were known in advance; if they are not, the solution has to be redesigned so that the column sizes adapt to the content. The content of a cell can also be longer than its width, which may require the multi_cell approach. You can also add a header and footer to the document; this is nearly a standard code block unless your requirements vary a lot. The header and body of the table are created in separate steps, first the header and then the body, so that colors, fonts, or anything else can be edited separately for each.
The drawback of this approach is that it does not scale when the content is too long or the columns need more space. If your structure is static and does not change much, this library is more than enough. Note that generating PDF reports from Excel files this way only makes sense when Excel itself does not offer the needed feature.
The function used for this example is add_tables(self, orientation, report_name) in pdf_generator.py.
Placing Images into PDF Page
Adding images to a PDF file is straightforward: besides the position, you give the file path and the dimensions. Differing image sizes can lead to layout issues, which is unfortunately hard to avoid, especially when the pages hold images of different sizes and you plan to add them all with a single code block. The following attempt reflects this approach: a single code block reads all images from a folder and places them directly into the PDF file. This is mostly appropriate if your images have similar sizes.
In this implementation, previously created images are fetched from a folder; however, the images could also be produced directly with Python plotting libraries, so that the final step is just to place all the material correctly on the PDF page.
The related function for this example is add_image(self, orientation) in pdf_generator.py, where orientation specifies whether the page should be portrait or landscape.
Adding Two Column Texts
Reading text from files, or passing it in directly as a string parameter, is also quite easy with Python. As seen below, each half of the page displays text read from a file, and a line between the two halves separates the content. All background colors can be changed dynamically. When adding the text, the font type and its character support for your language are essential. If the available fonts do not meet your requirements, it is highly recommended to download fonts from their web pages and save them in a folder, from which they are loaded before the text is added. There is no limit on font types or on how often the font changes; you could even render every character in a different font. The add_all_fonts() function in the pdf_generator.py module shows how the fonts can be loaded.
The related function for this use case is add_from_file() in pdf_generator.py
Adding Texts on Custom Designed Pages
How you use the FPDF2 features is limited only by your imagination: besides displaying text, you can also create custom designs programmatically. For instance, the first two of the following PDFs have randomly generated backgrounds made of lines, and on the first one (left) the contents of a text file are additionally placed on top. Each time these PDF files are generated, all colors, line lengths, and positions change randomly. The last one (right) is composed of texts and circles and is just there to broaden your vision of how the FPDF2 library can be used to create custom-designed PDF pages.
Instead of text, images could also be placed in the same circles if desired.
The related functions for these examples are 1. add_text_to_styled_page(), 2. add_lines() and 3. add_custom_page() in pdf_generator.py
Creating Pure Art-Oriented Pages
Unlike the previous PDF files, these PDFs are purely artistic: pages that are not easy to create in editors. The randomness used in the previous section, for instance, cannot be provided by editors. The three following images are composed of circles, squares, and triangles. The number of items on each page can be adjusted, and the size of the items changes accordingly. Since all colors are generated randomly at each execution, every call of the related function produces a new page. I plan to print one of the following images, frame it, and hang it on the wall soon.
The related functions for these use cases are add_box(self, count, shape) for the first two PDFs and add_triangle(self, count), both in pdf_generator.py. The count parameter specifies how many items (squares, ellipses, or triangles) should be drawn, whereas the shape parameter decides whether ellipses or squares are drawn.
Creating Book Chapters
A typical use case for PDF generation is creating book chapters. In a separate Python module, the following page is generated from Albert Einstein's words. The files are read, and PDF pages are generated for their content. All headers, titles, and chapter titles can be added with a few lines of code, while font sizes and text alignment have to be set manually. The main issue with this usage is that existing editing tools such as MS Word make it much easier; nevertheless, a custom design can play a decisive role.
The related module for generation of the book chapters is pdf_with_chapter.py.
Creating Custom Calendar
Calendars and their placement on PDF files are another useful PDF generation case. I intended to implement one, but good examples already exist, which made my effort unnecessary. The yearly calendar on the left is generated using FPDF, whereas the one on the right is created with the pyearcal library, which is built on ReportLab. You can easily adjust them according to your requirements and create new calendars with your own pictures.
The intention of this article is to show how useful and flexible the FPDF2 library is across many use cases. Further use cases are limited only by your imagination and needs. Except for the calendars, all source code is available at the following GitHub link.
Source Code: https://github.com/cemakpolat/python-pdf-generator