How can I automatically convert all source code files in a folder (recursively) to a single PDF with syntax highlighting?

I was intrigued by your question and got kinda carried away. This solution will generate a nice PDF file with a clickable index and color highlighted code. It will find all files in the current directory and subdirectories and create a section in the PDF file for each of them (see the notes below for how to make your find command more specific).

It requires that you have the following installed (the install instructions are for Debian-based systems but these should be available in your distribution's repositories):

  • pdflatex, color and listings

    sudo apt-get install texlive-latex-extra latex-xcolor texlive-latex-recommended
    

    This should also install a basic LaTeX system if you don't have one installed.

Once these are installed, use this script to create a LaTeX document with your source code. The trick is using the listings (part of texlive-latex-recommended) and color (installed by latex-xcolor) LaTeX packages. The \usepackage[..]{hyperref} is what makes the listings in the table of contents clickable links.

#!/usr/bin/env bash

tex_file=$(mktemp) ## Random temp file name

cat <<EOF > "$tex_file"   ## Print the tex file header
\documentclass{article}
\usepackage{listings}
\usepackage[usenames,dvipsnames]{color}  %% Allow color names
\lstdefinestyle{customasm}{
  belowcaptionskip=1\baselineskip,
  xleftmargin=\parindent,
  language=C++,   %% Change this to whatever you write in
  breaklines=true, %% Wrap long lines
  basicstyle=\footnotesize\ttfamily,
  commentstyle=\itshape\color{Gray},
  stringstyle=\color{Black},
  keywordstyle=\bfseries\color{OliveGreen},
  identifierstyle=\color{blue},
}
\usepackage[colorlinks=true,linkcolor=blue]{hyperref} 
\begin{document}
\tableofcontents

EOF

find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" ! -name 'src2pdf' |
sed 's|^\./||' |                 ## Change ./foo/bar.src to foo/bar.src
while read -r i; do              ## Loop through each file
    name=${i//_/\\_}             ## Escape underscores for LaTeX
    echo "\newpage" >> "$tex_file"         ## Start each section on a new page
    echo "\section{$name}" >> "$tex_file"  ## Create a section for each filename

    ## This command will include the file in the PDF
    echo "\lstinputlisting[style=customasm]{$i}" >> "$tex_file"
done &&
echo "\end{document}" >> "$tex_file" &&
pdflatex -output-directory . -jobname all "$tex_file" &&
pdflatex -output-directory . -jobname all "$tex_file"  ## Run twice so the
                                                       ## table of contents is built

Run the script in the directory that contains the source files:

bash src2pdf

That will create a file called all.pdf in the current directory. I tried this with a couple of random source files I found on my system (specifically, two files from the source of vlc-2.0.0) and the resulting PDF opens with a clickable table of contents followed by the color-highlighted listings.


A couple of comments:

  • The script will not work if your source code file names contain spaces. Since we are talking about source code, I will assume they don't.
  • I added ! -name "*~" to avoid backup files.
  • I recommend you use a more specific find command, though, otherwise any random file will be included in the PDF. If your files all have specific extensions (.c and .h for example), replace the find in the script with something like this:

    find . -name "*.c" -o -name "*.h" | sed 's|^\./||' |
    
  • Play around with the listings options, you can tweak this to be exactly as you want it.
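If you do need filenames with spaces (see the first note above), a NUL-delimited pipeline is one way around the limitation. This is a minimal sketch, assuming GNU find's -print0 and bash's read -d ''; the demo directory and file name are invented for illustration:

```shell
#!/usr/bin/env bash
## Space-safe variant of the find | while pipeline: file names are
## delimited with NUL bytes instead of newlines, so embedded spaces survive.
demo=$(mktemp -d)                          ## throwaway directory for the demo
printf 'int main(void){return 0;}\n' > "$demo/my file.c"

find "$demo" -type f -name "*.c" -print0 |
while IFS= read -r -d '' f; do
    ## In the real script you would append the \lstinputlisting line here
    printf 'would add listing for: %s\n' "$f"
done
```

The same -print0/read -d '' pair can be dropped into the script above in place of the plain find | sed | while pipeline.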

(from StackOverflow)

for i in *.src; do echo "$i"; echo "---"; cat "$i"; echo ; done > result.txt

This will produce a result.txt containing:

  • Filename
  • separator (---)
  • Content of .src file
  • Repeat from the top until all *.src files are done

If your source files have a different extension, just change the glob as needed. You can also edit the echo bits to add information (maybe echo "filename: $i"), change the separator, or add an end-of-file separator.
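For example, the loop could be tweaked like this. The header wording and separators are just illustrations, and the two .src files are created only so the sketch is self-contained:

```shell
## Demo setup: two tiny .src files in a scratch directory
cd "$(mktemp -d)"
printf 'alpha\n' > one.src
printf 'beta\n'  > two.src

## Variant of the loop: labelled header plus an end-of-file separator
for i in *.src; do
    echo "filename: $i"
    echo "---"
    cat "$i"
    echo "=== end of $i ==="
done > result.txt
```

result.txt then contains each file's name, a separator, its content, and an explicit end marker, repeated for every .src file.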

The linked thread has other methods, so use whichever you like best. I find this one the most flexible, although it does come with a slight learning curve.

The code will run as-is from a bash terminal (just tested on a VirtualBox Ubuntu).

If you don't care about filename and just care about content of files merged together:

cat *.src > result.txt

will work perfectly fine.

Another method suggested was:

grep "" *.src > result.txt

This prefixes every single line with the filename, which can be good for some people; personally I find it too much information, hence my first suggestion of the for loop above.
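To see the difference concretely, here is a quick sketch with two throwaway files (names and contents invented for the demo):

```shell
cd "$(mktemp -d)"                 ## scratch directory for the demo
printf 'alpha\n' > one.src
printf 'beta\n'  > two.src

grep "" *.src
## prints:
##   one.src:alpha
##   two.src:beta
```

With more than one input file, grep prefixes every matching line with its filename, and the empty pattern "" matches every line.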

Credit to the people in the StackOverflow thread.

EDIT: I just realized that you are specifically after HTML or PDF as the end result. One solution I've seen is to print the text file to PostScript and then convert the PostScript to PDF:

groff -Tps result.txt > res.ps

then

ps2pdf res.ps res.pdf 

(This requires Ghostscript.)

Hope this helps.


I know I'm waaaay too late, but somebody looking for a solution might find this useful.

Based on @terdon's answer, I have created a Bash script that does the job: https://github.com/eljuanchosf/source-code-to-pdf