Check if PDF files are corrupted using command line on Linux

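You can try pdfinfo (on Fedora it ships in the poppler-utils package). pdfinfo reads metadata from the PDF's Info dictionary and exits with a non-zero status when it cannot open or parse the file, so if it succeeds the file is probably OK. This is a quick check rather than a full validation: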

for f in *.pdf; do
  if ! pdfinfo "$f" &> /dev/null; then
    echo "$f" is broken
  fi
done

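Another option is to run pdftotext on every PDF found under the current directory and rename the ones it cannot read by appending .broken:

find . -type f -iname '*.pdf' | while IFS= read -r f; do
  if pdftotext "$f" - &> /dev/null; then
    echo "$f" was ok
  else
    mv "$f" "$f.broken"
    echo "$f" is broken
  fi
done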
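
Both loops above hard-code the checker, and the second one renames files as it goes. A minimal report-only sketch follows, in case you want to see what would be flagged before touching anything. `scan_pdfs` is a hypothetical helper name, not part of poppler or any other package, and it assumes the checker takes the file path as its last argument and signals failure through its exit status.

```shell
#!/bin/sh
# scan_pdfs DIR CHECKER [ARGS...]
# Hypothetical helper: runs CHECKER (e.g. pdfinfo) on every PDF under DIR
# and prints the path of each file for which CHECKER exits non-zero.
# Nothing is renamed or deleted.
scan_pdfs() {
  dir=$1
  shift
  # find hands each path to the inner shell as a separate argument,
  # so spaces or newlines in filenames are passed through intact.
  find "$dir" -type f -iname '*.pdf' -exec sh -c '
    file=$1
    shift
    "$@" "$file" > /dev/null 2>&1 || printf "%s\n" "$file"
  ' sh {} "$@" \;
}
```

For example, `scan_pdfs . pdfinfo` would list the files pdfinfo rejects. pdftotext does not fit this shape directly, because it needs the extra `-` output argument after the file name.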

My tool of choice for checking PDFs is qpdf. Its --check option is good at finding problems in PDFs.

Check a single PDF with qpdf:

qpdf --check test_file.pdf
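
qpdf also reports the result through its exit status, which is handy in scripts. The sketch below assumes the convention described in the qpdf manual (0 for no problems, 2 for errors, 3 for warnings only); `describe_qpdf_status` and `check_pdf` are hypothetical helper names, so verify the codes against your installed version.

```shell
#!/bin/sh
# Map a qpdf exit status to a short label.
# Assumed convention: 0 = no problems, 2 = errors, 3 = warnings only.
describe_qpdf_status() {
  case "$1" in
    0) echo "OK" ;;
    3) echo "WARNINGS" ;;
    2) echo "ERRORS" ;;
    *) echo "UNKNOWN ($1)" ;;
  esac
}

# Run qpdf --check quietly on one file and print the label for its status.
check_pdf() {
  qpdf --check "$1" > /dev/null 2>&1
  describe_qpdf_status "$?"
}
```

With these defined, `check_pdf test_file.pdf` prints a single word instead of the full qpdf report.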

Check all PDFs in a directory with qpdf:

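find ./directory_to_scan/ -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "$1" > /dev/null 2>&1 && echo "$1: OK"' sh {} \; -o -exec echo "{}": FAILED \; \)

Command Explanation:

  • find ./directory_to_scan/ -type f -iname '*.pdf' Find all regular files whose name ends in '.pdf' (case-insensitive)

  • -exec sh -c 'qpdf --check "$1" > /dev/null 2>&1 && echo "$1: OK"' sh {} \; Run qpdf on each file found and redirect all of its output (stdout and stderr) to /dev/null. The filename is passed as $1 rather than pasted into the script, so unusual characters in a name cannot be interpreted by the shell. Print the filename followed by ': OK' if the exit status of qpdf is 0 (i.e. no errors or warnings)

  • -o -exec echo "{}": FAILED \; \) This runs only when the first -exec fails, i.e. qpdf exited with a non-zero status: print the filename followed by ': FAILED'. qpdf exits with 3 for warnings and 2 for errors, so a file that only produced warnings is also reported as FAILED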
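
When scanning many files, a tally at the end can be easier to read than the raw list. The sketch below assumes the 'path: OK' and 'path: FAILED' line format printed by the find command above; `summarize` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "path: OK" / "path: FAILED" lines on stdin and print a tally.
# Returns non-zero if any file failed, so it can gate a script or CI job.
summarize() {
  ok=0
  failed=0
  while IFS= read -r line; do
    case "$line" in
      *": OK") ok=$((ok + 1)) ;;
      *": FAILED") failed=$((failed + 1)) ;;
    esac
  done
  echo "ok=$ok failed=$failed"
  [ "$failed" -eq 0 ]
}
```

Appending `| summarize` to the find command replaces the per-file lines with a single count; insert `tee report.txt` before it if you also want to keep the full list.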


Where to get qpdf:

qpdf has both Linux and Windows binaries available at: https://github.com/qpdf/qpdf/releases. You could also use your package manager of choice to get it. For example on Ubuntu you can install qpdf using apt with the command:

sudo apt install qpdf
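
Before running a long scan, it can help to confirm the tool is actually on your PATH. This is a minimal sketch using the standard `command -v` builtin; `require_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the named program is on PATH, otherwise print a hint and return 1.
require_tool() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "$1 not found; install it first" >&2
    return 1
  fi
}
```

A script could start with `require_tool qpdf || exit 1` so that a missing install is reported once instead of every file being marked as failed.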
