How is a mathematical formula represented in PDF?

Essentially, in PDF every letter (or run of letters) is positioned by coordinates, so even a normal word might be encoded as individual letters positioned to "look" like text, in order to take account of inter-letter kerns and the like.

Math is no different: the characters are just normal font characters positioned on the page at locations that TeX has determined.
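
To make the kerning point concrete, here is a minimal Python sketch (not taken from any TeX or PDF tool) that builds the operand of PDF's TJ operator, which shows an array of strings interleaved with numeric adjustments. The function name tj_array and the kern values are made up for illustration; the only PDF fact relied on is that TJ subtracts each number, in thousandths of a text-space unit, from the current position.

```python
def tj_array(text, kerns):
    """Return a PDF 'TJ' operation for text.

    kerns maps a letter pair to a kern in 1/1000 of a text-space unit
    (negative = letters move closer together). Hypothetical data only.
    Parentheses and backslashes in text are not escaped in this sketch.
    """
    if not text:
        return "[] TJ"
    parts, run = [], text[0]
    for prev, cur in zip(text, text[1:]):
        k = kerns.get(prev + cur, 0)
        if k:
            parts.append(f"({run})")
            parts.append(str(-k))  # TJ subtracts the number, so flip the sign
            run = cur              # a kern ends the current run of letters
        else:
            run += cur
    parts.append(f"({run})")
    return "[" + " ".join(parts) + "] TJ"


# Illustrative kern table, not measured from a real font.
KERNS = {"AV": -80, "VA": -80}
print(tj_array("AVA", KERNS))   # [(A) 80 (V) 80 (A)] TJ
print(tj_array("word", KERNS))  # [(word)] TJ
```

A word without kerns stays one string, while every kern splits the run, which is why a single word can end up as several separately positioned pieces.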

PostScript uses the same rendering model as PDF but is a bit easier to read by eye. Taking Henri's example and processing it with latex and dvips:

\documentclass{article}

\pagestyle{empty}

\begin{document}
$\int_0^2 x^2 dx$
\end{document}

produces the following PostScript:

%%Page: 1 1
TeXDict begin 1 0 bop 639 457 a Fc(R)695 477 y Fb(2)678
553 y(0)746 524 y Fa(x)793 494 y Fb(2)830 524 y Fa(dx)p
eop end
%%Trailer

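Here you can see the structure: strings are written in parentheses, for example (dx) for dx. Apart from that two-letter run, every other character run is a single character, with the font (Fa, Fb, Fc) and the coordinates specified separately for each one. The (R) is the integral sign, which in the Computer Modern math extension font occupies the slot of the letter R.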
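
If you want to see the positions spelled out, the following Python sketch walks through that page body and lists each string with its font and the point at which it is shown. It assumes the usual definitions in the dvips prologue (x y a is a moveto, (s) p shows s, and (s) x y y shows s and then moves to x y); the function name trace and the tokenizer are mine, and it handles only the operators that occur in this fragment.

```python
import re

# Page body copied from the dvips output above.
PS = """TeXDict begin 1 0 bop 639 457 a Fc(R)695 477 y Fb(2)678
553 y(0)746 524 y Fa(x)793 494 y Fb(2)830 524 y Fa(dx)p
eop end"""

# A token is a (string), an integer, or a name.
TOKEN = re.compile(r"\((?P<str>[^()]*)\)|(?P<num>-?\d+)|(?P<name>[A-Za-z]+)")


def trace(ps):
    """Return a list of (font, (x, y), string) in drawing order."""
    stack, pos, font, shown = [], None, None, []
    for m in TOKEN.finditer(ps):
        if m.lastgroup == "str":
            stack.append(m.group("str"))
        elif m.lastgroup == "num":
            stack.append(int(m.group("num")))
        else:
            name = m.group("name")
            if name == "a":      # assumed: x y a -> moveto
                y, x = stack.pop(), stack.pop()
                pos = (x, y)
            elif name == "y":    # assumed: (s) x y y -> show s, then moveto x y
                y, x, s = stack.pop(), stack.pop(), stack.pop()
                shown.append((font, pos, s))
                pos = (x, y)
            elif name == "p":    # assumed: (s) p -> show s
                shown.append((font, pos, stack.pop()))
            elif re.fullmatch(r"F[a-z]", name):
                font = name      # font selection such as Fa, Fb, Fc
            # Anything else (TeXDict, begin, bop, eop, end) is ignored.
    return shown


for font, pos, s in trace(PS):
    print(font, pos, s)
```

On the fragment above this reports R at (639, 457) in Fc, the limits 2 and 0 at (695, 477) and (678, 553) in Fb, x at (746, 524) in Fa, the exponent 2 at (793, 494) in Fb, and dx at (830, 524) in Fa. The exponent has a smaller y value than the baseline of x, so in these coordinates y evidently grows downward.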


In my opinion (I hope I have understood the question correctly), you could use a dedicated tool called MaxTract. It can be found at http://www.cs.bham.ac.uk/research/groupings/reasoning/sdag/maxtract.php.

MaxTract is a tool for converting PDF into formats such as LaTeX, MathML and text.

ADDENDUM: Could this program, PDF to LaTeX converter, also be useful?



If you use a Unicode math font (via unicode-math, which requires XeLaTeX or LuaLaTeX), then all of the glyphs are just Unicode symbols in the resulting PDF.

\documentclass{article}
\pagestyle{empty}
\usepackage{unicode-math}
\begin{document}
$\int_0^2 x^2 dx$
\end{document}

Extracting the text from the resulting file (here test.pdf) with pdftotext then gives:

$ pdftotext test.pdf -
2

∫ 2 
0