How to search in PDFs using regular expressions?

Several options:

  • Agent Ransack (the top answer to "Best way to *confidently* search files and contents in Windows without using an indexing service?")
  • dnGrep, which is free and open-source software. Unfortunately, it is currently available only on Windows (a feature request for other platforms has been opened here).

  1. Agent Ransack has a free (Lite) edition and supports PDF, as its release notes confirm.
  2. PowerGREP is a commercial product.

As you said, the obvious alternative is to convert the PDFs to text and search the result. A programmer can set that up for bulk processing with the Python package PDFMiner. Agent Ransack itself uses "pdftotext" from the Xpdf project, and you can too.
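The convert-then-search approach could look roughly like the sketch below. It assumes the "pdftotext" command-line tool is installed and on your PATH; the function names (pdf_to_text, grep_text, grep_pdfs) are made up for illustration and are not part of any library. It matches the regex line by line, so a pattern that spans several lines will not be found.

```python
import re
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of one PDF by running pdftotext (assumed to be on PATH)."""
    # "-" as the output file name asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def grep_text(text, pattern, flags=0):
    """Return (line_number, line) for every line of text the regex matches."""
    regex = re.compile(pattern, flags)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def grep_pdfs(folder, pattern, flags=0):
    """Yield (path, line_number, line) for matches in every PDF under folder."""
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        try:
            text = pdf_to_text(pdf_path)
        except (OSError, subprocess.CalledProcessError):
            # pdftotext is missing or could not read this file; skip it.
            continue
        for number, line in grep_text(text, pattern, flags):
            yield pdf_path, number, line
```

You would then loop over something like grep_pdfs("docs", r"invoice\s+\d{4}", re.IGNORECASE) and print each hit, where "docs" is a placeholder folder name. The line numbers refer to the extracted text, not to positions in the PDF. PDFMiner could replace the pdftotext call inside pdf_to_text if you prefer a pure-Python setup.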

Tags: PDF, Regex, Search