Python - What is the process to create pdf reports with charts from a DB?

A slightly heretical answer: RMarkdown (in RStudio), with Python code chunks, via reticulate (the default way now) that gives you a long-lived Python "session" just like in a Jypiter notebook. The RMarkdown document can then be "knit" into PDF, html, Word, html-slides, and even PowerPoint.

Seriously, the R world is streets ahead in this area.

There are a lot of options for creating a pdf in python. Some of these options are ReportLab, pydf2, pdfdocument and FPDF.

The FPDF library is fairly stragihtforward to use and is what I've used in this example. FPDF Documentation can be found here.

It's perhaps also good to think about what python modules you might want to use to create graphs and tables. In my example, I use matplotlib (link to docs) and I also use Pandas to create a dataframe using pandas.dataframe().

I've posted a rather lengthy but fully reproducible example below, using pandas, matplotlib and fpdf. The data are a subset of what the OP provided in the question. I loop through the dataframe in my example to create the table, but there are alternative and perhaps more efficient ways to do this.

import pandas as pd
import matplotlib
from pylab import title, figure, xlabel, ylabel, xticks, bar, legend, axis, savefig
from fpdf import FPDF

df = pd.DataFrame()
df['Question'] = ["Q1", "Q2", "Q3", "Q4"]
df['Charles'] = [3, 4, 5, 3]
df['Mike'] = [3, 3, 4, 4]

title("Professor Criss's Ratings by Users")
xlabel('Question Number')

c = [2.0, 4.0, 6.0, 8.0]
m = [x - 0.5 for x in c]

xticks(c, df['Question'])

bar(m, df['Mike'], width=0.5, color="#91eb87", label="Mike")
bar(c, df['Charles'], width=0.5, color="#eb879c", label="Charles")

axis([0, 10, 0, 8])

pdf = FPDF()
pdf.set_xy(0, 0)
pdf.set_font('arial', 'B', 12)
pdf.cell(75, 10, "A Tabular and Graphical Report of Professor Criss's Ratings by Users Charles and Mike", 0, 2, 'C')
pdf.cell(90, 10, " ", 0, 2, 'C')
pdf.cell(50, 10, 'Question', 1, 0, 'C')
pdf.cell(40, 10, 'Charles', 1, 0, 'C')
pdf.cell(40, 10, 'Mike', 1, 2, 'C')
pdf.set_font('arial', '', 12)
for i in range(0, len(df)):
    pdf.cell(50, 10, '%s' % (df['Question'].iloc[i]), 1, 0, 'C')
    pdf.cell(40, 10, '%s' % (str(df.Mike.iloc[i])), 1, 0, 'C')
    pdf.cell(40, 10, '%s' % (str(df.Charles.iloc[i])), 1, 2, 'C')
pdf.cell(90, 10, " ", 0, 2, 'C')
pdf.image('barchart.png', x = None, y = None, w = 0, h = 0, type = '', link = '')
pdf.output('test.pdf', 'F')

Expected test.pdf:

Expected test.pdf

Update (April 2020): I made an edit to the original answer in April 2020 to replace use of pandas.DataFrame.ix() since this is deprecated. In my example I was able to replace it's use with pandas.DataFrame.iloc and the output is the same as before.

I agree with @drz about RMarkdown for creating such a report. An academic work should clearly use this. Anyway, there is also stitch, which is really simple to use, and may be sufficient in many cases. Many advantages from fpf :

  • pagination is managed
  • markup syntax is available
  • matplotlib and pandas graph directly outputs
  • can generate html or pdf

Here is @patrickjlong1 example in stitch :

# Stich is simple and great

## Usefull markup language

You can use markdown syntax, such as **bold**, _italic_, ~~Strikethrough~~

## display dataframes

Direct output from python will be nicelly output.

```{python, echo=False}

import pandas as pd

df = pd.DataFrame()
df['Question'] = ["Q1", "Q2", "Q3", "Q4"]
df['Charles'] = [3, 4, 5, 3]
df['Mike'] = [3, 3, 4, 4]
df = df.set_index('Question')


## display graphics

Direct matplotlib output, without rendering to file.

```{python, echo=False}
#%matplotlib inline"Professor Criss's Ratings by Users")

## Symbolic expressions

You may also want to work with sympy :

```{python, echo=False}

import sympy


Once installed, the PDF is created with :

stitch test2.stich -o output.pdf

The output will look like :

stich output in PDF