Can't copy-paste from my PDF. Any idea why?

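It depends on the fonts that you are using. The pdffonts tool (part of Xpdf and Poppler) lists the fonts in a PDF file together with their types: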

pdffonts test.pdf

Type 3 fonts

Try to avoid them, because in TeX-generated PDFs they are usually bitmap fonts that do not scale well. Their characters also lack the glyph names that text extraction tools rely on.

If you are using \usepackage[T1]{fontenc} with the standard fonts, then you get the EC fonts. Install cm-super to get their Type 1 versions. Alternatively, use their successor, the Latin Modern fonts (\usepackage{lmodern}).
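As a minimal sketch of the Latin Modern route (the article class and the sample text are only assumptions for illustration), a preamble along these lines should make pdffonts report Type 1 rather than Type 3 fonts:

```latex
\documentclass{article}     % assumption: any standard class works the same way
\usepackage[T1]{fontenc}    % T1 encoding; without further action this selects the EC fonts
\usepackage{lmodern}        % replace the EC fonts with the Latin Modern outline fonts
\begin{document}
A short test with accents and ligatures: na\"{\i}ve, caf\'e, office, fjord.
\end{document}
```

Running pdffonts on the resulting PDF before and after adding lmodern is a quick way to check which font type ended up in the file.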

Package cmap

It is based on LaTeX's font encodings and adds map entries from the slot positions of an encoding to Unicode code points.

The package should be loaded at the very beginning:

\RequirePackage{cmap}
\documentclass{...}

The package does not depend on correct glyph names in the font or on the font type. On the other hand, encodings it does not know (for symbol fonts, …) are not well supported.
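Put together, a complete test file for the cmap approach might look like the following sketch. The article class, the T1 encoding, and the sample text are assumptions chosen for illustration, not part of the original recommendation:

```latex
\RequirePackage{cmap}       % loaded before \documentclass, as recommended above
\documentclass{article}     % assumption: stands in for whatever class you use
\usepackage[T1]{fontenc}    % a standard LaTeX encoding, so a slot-to-Unicode map applies
\begin{document}
% Try to copy this line from the PDF viewer and paste it into a text editor.
Stra\ss e, \"Arger, fi, fl, ffi
\end{document}
```

If the pasted text matches what is displayed, the mapping for the text fonts is in place. Symbols from math or symbol fonts may still come out wrong, for the reason given above.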

\pdfgentounicode

This primitive of pdfTeX adds the Unicode mapping based on font glyph names. It does not work for Type 3 fonts. The support is better if the fonts contain the correct glyph names and a mapping is provided.

Usage:

\pdfgentounicode=1 %
\input glyphtounicode.tex %

The file glyphtounicode.tex contains predefined mappings from many glyph names to Unicode.
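In a LaTeX document compiled with pdfLaTeX, the two lines from the usage above go into the preamble. The following is a minimal sketch; the class, the font packages, and the sample text are assumptions for illustration:

```latex
\documentclass{article}     % assumption: any class; compile with pdfLaTeX
\usepackage[T1]{fontenc}
\usepackage{lmodern}        % Type 1 fonts with proper glyph names (not Type 3)
\input{glyphtounicode}      % load the predefined glyph-name-to-Unicode table
\pdfgentounicode=1          % ask pdfTeX to generate the Unicode mappings
\begin{document}
% Ligatures are a good test: they are single glyphs that should paste as several letters.
office, affine, shuffle, fjord
\end{document}
```

Note that cmap is deliberately not loaded here, so only one mechanism writes the Unicode mapping.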

Package accsupp

The package uses the /ActualText feature of PDF, which lets you specify the text that should be used in place of the displayed glyphs. This gives support for symbols, for example, and finer control over how they are extracted.
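As a small sketch of how this can be used, the following wraps a symbol so that copying it should yield a chosen Unicode character. The helper name \withactualtext is hypothetical and introduced only for this example; \BeginAccSupp and \EndAccSupp with the method=hex and unicode options are the package's own interface:

```latex
\documentclass{article}
\usepackage{accsupp}
% Hypothetical helper, not part of accsupp:
%   #1 = replacement text as UTF-16BE hex digits, #2 = material to typeset
\newcommand*{\withactualtext}[2]{%
  \BeginAccSupp{method=hex,unicode,ActualText=#1}%
  #2%
  \EndAccSupp{}%
}
\begin{document}
% 03C0 is GREEK SMALL LETTER PI; a supporting viewer should copy that character.
The constant \withactualtext{03C0}{$\pi$} appears here.
\end{document}
```

Whether the replacement text is actually used on copy and paste depends on the viewer, as discussed in the next section.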

PDF viewer/text extractor

It also depends on the PDF viewer or text extraction tool and on which features it supports for identifying the characters. Some work only with glyph names and slot positions, others support the Unicode mappings (as they should), and the more advanced ones support the /ActualText feature.

Addition: The cmap package and the \pdfgentounicode method should not be used together, because they add the same data structures to the fonts in the PDF file. If the entries were exactly identical, this would not be much of a problem, but they may differ. That violates the PDF structure and causes unpredictable behavior, because PDF reader applications are free to choose which value they use for the same key.
