Extract text from TeX, remove LaTeX tags

This may not be exactly what the OP requested, but it might be of some help.

The poppler-utils package includes pdftotext, which can convert a PDF file (for example, the one compiled from your .tex source) to a TXT file via

pdftotext yourPDF.pdf

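Of course this incurs the overhead of installing this package, but I think it's negligible. If I remember correctly, Poppler is the standard library for rendering PDF on Linux, so if you have a PDF viewer installed (think Evince or Okular), the library itself is already there. The command-line tools in poppler-utils may still be packaged separately, so check whether pdftotext is actually on your system.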
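As a minimal sketch of how this could be wrapped in a script, the function below converts one PDF and prints the name of the text file it wrote. The helper name pdf_to_txt and the output naming are assumptions made for illustration; -layout is a pdftotext option that asks it to keep the original physical layout of the page.

```shell
#!/bin/sh
# pdf_to_txt is a hypothetical helper name, not part of poppler-utils.
# It converts FILE.pdf to FILE.txt in the same directory and prints the output path.
pdf_to_txt() {
    pdf=$1
    out="${pdf%.pdf}.txt"
    # -layout asks pdftotext to maintain the original physical layout.
    pdftotext -layout "$pdf" "$out" || return 1
    printf '%s\n' "$out"
}
```

Calling pdf_to_txt yourPDF.pdf would then write yourPDF.txt next to the PDF, assuming pdftotext is on your PATH.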

Find here some more instructions.


OpenDetex is available for both Windows and Linux.

Download OpenDetex from here:
http://opendetex.googlecode.com/files/opendetex-2.8.1.tar.bz2
http://code.google.com/p/opendetex/downloads/list

Usage: http://code.google.com/p/opendetex/wiki/Usage

Extract it to any directory of your choice, say your Downloads directory. If you downloaded the source tarball, you will probably need to build it there first (typically by running make).

Optionally, create another directory inside it (any name will do, but it keeps things tidy), say “my_paper”, and put your paper in it. Say your paper is named project.tex.

Change into the OpenDetex directory:

cd ~/Downloads/opendetex

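Run the command (write ./detex instead of detex if the OpenDetex directory is not on your PATH):

detex -n my_paper/project.tex > out.txt

Generic form (the -n option tells detex not to follow \input and \include commands):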

detex -n full_path_to_tex_file.tex > output_text_file.txt
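If you have several .tex files, the same command can be run in a loop. The following is only a sketch under the assumption that detex is on your PATH; the helper name detex_dir and the choice to write each output next to its source file are illustrative, not something OpenDetex provides.

```shell
#!/bin/sh
# detex_dir is a hypothetical helper name, not part of OpenDetex.
# It runs "detex -n" on every .tex file in a directory (default: current directory),
# writes NAME.txt next to NAME.tex, and prints each output path.
detex_dir() {
    dir=${1:-.}
    for tex in "$dir"/*.tex; do
        # If the glob matched nothing, the pattern itself is left over; skip it.
        [ -e "$tex" ] || continue
        out="${tex%.tex}.txt"
        detex -n "$tex" > "$out" || return 1
        printf '%s\n' "$out"
    done
}
```

For example, detex_dir my_paper would process every .tex file in the my_paper directory.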

detex(1):

Please see the OpenDetex GitHub page for the latest version of OpenDetex. It is a more modern, derivative version of my original DeTeX.

My legacy DeTeX home page is available here.

If you just want the legacy detex-2.8.tar source, you can get it here.
