Can LaTeX be persuaded to produce text output?

The underlying solution is of course the same for ConTeXt and LaTeX: you need a way of changing what macros do so that they write the correct output rather than typesetting it. This is much the same as what tex4ht does. The advantage ConTeXt has is that its macros are provided mainly by one focussed group, and they include the necessary 'back end' to make that conversion easy. To do the same for LaTeX, you need to handle all of the macros that might be present, which is a problem given the number and variety of LaTeX packages. So while in principle it's possible, the implementation is a severe challenge.
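As a crude sketch of that idea (a single macro and a hypothetical output file name only; a real converter has to cover far more of LaTeX than this), one can redefine a command to write mark-up to a file instead of typesetting its argument:

    \documentclass{article}
    \newwrite\textout
    \immediate\openout\textout=output.txt
    %% redefine \emph to emit Markdown-style emphasis instead of italics;
    %% \unexpanded (an e-TeX primitive) stops \write from expanding macros
    %% inside the argument before it reaches the file
    \renewcommand{\emph}[1]{\immediate\write\textout{*\unexpanded{#1}*}}
    \begin{document}
    \emph{Hello}
    \immediate\closeout\textout
    \end{document}

Scale that up to every macro a document might use and you have the severe challenge in question.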

(With my 'LaTeX3 hat' on, this is an obvious area to bear in mind when defining an updated format. To do that, you need a much more 'regular' syntax and input than is often the case with LaTeX files at present. Again, I think ConTeXt shows how this can be done, as it is already good at keeping the input within its own structures.)


It is possible to achieve what you want, provided you do not want TeX itself to act as the parser. In my opinion, part of the success of TeX is that it has managed to transform itself over the years into a language transformation tool. First it was TeX->PostScript and now it is TeX->PDF. Tralics has been fairly successful at producing TeX->XML.

But I think one needs to look at the problem from a different angle. With today's available technologies, one needs a "Universal Mark-up Language". Markdown and YAML are scaled-down tools and can never be full document description languages, so going that route will limit one's efforts.

Some time back, I designed a CMS based on text files. All mark-up was plain text, with fragments borrowed from Wikipedia's markup language. I would load the text file via PHP, filter the input, and produce the HTML page.

<!--
{{feature-image: http://localhost/images/sample102.jpg }}
{{feature: A collection is like a puzzle...}}
-->

The feature-image became a div and the feature text its caption. I had commands for image credits and the like.

Now this is not so difficult to produce with TeX. So my proposal is to actually use TeX to write an intermediate mark-up to a text file and then parse that with your language of choice to achieve what you wish.

Workflow depending on targets can be one of the following:

   TeX->Intermediate MarkUp->HTML
   TeX->pdf
   TeX->plain text
   Intermediate MarkUp->Translator (javascript, perl, python, 
                        ruby, php, your language) ->TeX

In a nutshell, retain TeX and output into a new mark-up language. Markdown and other technologies can be a subset of this.

\documentclass{article}
\usepackage[demo]{graphicx}
\usepackage{verbdef}
\begin{document}
\makeatletter
%% create file and open it to write
\newwrite\file
\immediate\openout\file=wikimark.wiki
\newif\if@wikimark
\newif\if@html
\@wikimarktrue

\def\image#1#2{%
  \if@wikimark
   \image@@{#1}{#2}
 \else
   \includegraphics{dummy.png}
 \fi
}

\def\Section#1{%
  \if@wikimark
   \section@@{#1}\relax
  \else
   \section{#1}
  \fi
}


\def\image@@#1#2{%
  \immediate\write\file{\string{\string{img:#1\string}\string}}
  \immediate\write\file{\string{\string{img-caption:#2\string}\string}}
}

%% two catcode-12 hash characters; a parameter character must be
%% doubled inside a definition, hence \string## rather than \string#
\edef\hash@@{\string##\string##}

\def\section@@#1{%
  \immediate\write\file{\hash@@ #1}
} 

\makeatother

\Section{Test Section}

\image{http://tex.stackexchange.com/questions/15440/parsing-files-through-lua-tex}{This is the caption}

\immediate\closeout\file
\end{document}

This minimal example is just a proof of concept. The main idea here is not to redefine the LaTeX commands but rather to add new ones with switches for other mark-up.
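To make the target concrete, the intermediate file wikimark.wiki aimed at by this scheme would look roughly like this (with \hash@@ expanding to two hash characters):

    ## Test Section
    {{img:http://tex.stackexchange.com/questions/15440/parsing-files-through-lua-tex}}
    {{img-caption:This is the caption}}

Any scripting language can then split on the {{...}} and ## markers to produce HTML, plain text, or whatever the target is.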


In the interests of completeness, I feel I should record my current solution (my gut instinct is that this is the best method, but the exact implementation could probably do with improvement). That is to take a leaf out of the ConTeXt book and use LuaTeX. LuaTeX provides me with some hooks to get at the processed output of TeX just before it is packed into boxes and shipped out.

Specifically, I used the hook pre_linebreak_filter to dig out the contents of each paragraph just before it is broken into lines. It's actually not far from the idea of StrondBad's answer, just without all the unnecessary stuff and with a bit more control over things like grouping.
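Here is a toy sketch of the idea, not my actual implementation (which does rather more); it assumes a LuaLaTeX format recent enough to provide luatexbase, and just dumps each paragraph's characters to the terminal before line breaking:

    \documentclass{article}
    %% register a callback that collects the glyph nodes of each
    %% paragraph and prints their characters just before TeX breaks
    %% the paragraph into lines; returning true keeps the list intact
    \directlua{
      luatexbase.add_to_callback("pre_linebreak_filter",
        function(head)
          local t = {}
          for n in node.traverse_id(node.id("glyph"), head) do
            table.insert(t, unicode.utf8.char(n.char))
          end
          texio.write_nl("PARA: " .. table.concat(t))
          return true
        end, "dump.paragraph")
    }
    \begin{document}
    Hello world.
    \end{document}

Note that spaces arrive as glue nodes rather than glyphs, so a real implementation has to handle those (and boxed material) as well.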

My implementation can be found at my github repository for my project. I don't think it can just be cut-and-pasted into something else, as it is integrated with some other parts of my project, so anyone wanting to use the idea would need to untangle it a bit. The crucial files are the Lua file textoutput.lua, mainly the function list_elements, and the TeX file internettext.code.tex, specifically the "true" branch of the conditional \@ifundefined{directlua} on line 53 (at time of writing).

Also, as I said at the outset, although I think this is the right strategy, it is probably not the best implementation.