Is there a minimal converter from well-defined LaTeX to XHTML?

I know this has probably been written many times, and far better than what I can manage in two days with a modest Perl background. But because I could not find a good starter that is readable and does the basic job, I wrote my own attempt:

http://ivo-welch.info/computers/iawltx2html/

I would have posted it here, but at 400 lines of Perl (plus a long entities database and a documentation sample) it is too long. At 400 lines, the program should be reasonably maintainable, although it could be refactored better.

Think of it as a good starter and proof of concept. The Perl program does not attempt to produce documents that match the LaTeX look. Instead, it tries to convert documents written in a reasonable subset of LaTeX to HTML documents, to be styled with CSS and MathJax later. It tells the user which tags have not been converted (and have been passed through for subsequent processing) and what should be styled.

Alas, I am having problems with tabular environments, especially \hline (or \toprule, etc.), as well as with formatting columns. I tried

 <table style="tr td:nth-child(2) { text-align:right }">

and about 20 variations thereon including various versions of table, tr, td, but nothing seemed to work.
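A note on why the attempt above cannot work: an inline style attribute accepts only declarations (such as text-align:right), never selectors, so a rule like td:nth-child(2) has to live in a <style> element (in the <head> for XHTML) or in an external stylesheet. A minimal sketch of one way to generate such rules from the tabular column spec follows. The helper name colspec_to_css, the table id, and the hline class are assumptions made for illustration, not part of the posted program, and only l, r, c and p{...} columns are recognized.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a LaTeX column spec such as "l|r|c" into CSS rules
# scoped to one table id. Only l, r, c and p{...} are recognized in this sketch.
sub colspec_to_css {
    my ($id, $spec) = @_;
    my %align = (l => 'left', r => 'right', c => 'center', p => 'left');
    $spec =~ s/[@!]\{[^{}]*\}//g;    # drop simple @{...} and !{...} separators
    my $col = 0;
    my @rules;
    while ($spec =~ /([lrcp])(?:\{[^{}]*\})?/g) {
        $col++;
        push @rules, "#$id td:nth-child($col) { text-align: $align{$1}; }";
    }
    # Assumption: the converter marks the row that follows \hline with class "hline".
    push @rules, "#$id tr.hline td { border-top: 1px solid; }";
    return join("\n", @rules) . "\n";
}

# The rules go into a <style> element, not into the table's style attribute.
print "<style>\n", colspec_to_css('t1', 'l|r|c'), "</style>\n";
print qq{<table id="t1"> ... </table>\n};
```

Scoping the rules by table id keeps several tables with different column specs from interfering with each other.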

A minor nuisance is that Regexp::Common balanced matches include their delimiters, so I have to strip them. I also could not get it to handle two consecutive arguments, in which case I fall back on {.*?} for matching instead. This is primarily an issue for commands with optional arguments, which are not handled well either.
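One possible way around both nuisances, swapping Regexp::Common for a recursive pattern from core Perl (5.10 or later), is sketched here. The helper name parse_command and its return convention are assumptions for illustration. The \G anchor with /gc lets the code consume an optional [..] argument and then any number of consecutive {..} arguments, and substr strips the outer braces.

```perl
use strict;
use warnings;

# One balanced {...} group, nesting allowed. (?-1) recurses into the enclosing
# capture group, so the pattern can be embedded in larger regexes (Perl 5.10+).
my $BRACED = qr/(\{(?:[^{}]++|(?-1))*+\})/;

# Hypothetical helper: parse "\name[opt]{arg}{arg}..." at the start of $text.
# Returns (name, optional argument or undef, arguments without outer braces).
sub parse_command {
    my ($text) = @_;
    $text =~ /\G\\([A-Za-z]+)\s*/gc or return;
    my $name = $1;
    my $opt;
    $opt = $1 if $text =~ /\G\[([^\]]*)\]\s*/gc;
    my @args;
    while ($text =~ /\G$BRACED\s*/gc) {
        push @args, substr($1, 1, -1);    # strip the outer braces
    }
    return ($name, $opt, @args);
}

my ($name, $opt, @args) = parse_command('\frac{a_{1}}{b}');
print "$name: @args\n";    # prints "frac: a_{1} b"
```

The sketch does not handle escaped braces (\{ and \}) or brackets nested inside the optional argument.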

Another nuisance is that <li> and <p> remain unbalanced. Thus, they need to be cleaned up by tidy or pandoc if XHTML is needed. Someone who is more clever than I am and who knows more HTML (e.g., what other tags can end a paragraph tag?) could probably fix this fairly easily.
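On the question of what ends a paragraph: HTML allows </p> to be omitted when the next tag is another <p> or a block-level element such as <ul>, <ol>, <table>, <pre>, <div>, <blockquote>, <hr> or a heading, or when the parent element ends. A minimal sketch of a cleanup pass built on that rule follows. The function name balance_p_li is an assumption, the block-tag list is deliberately partial, and the tokenizer is a plain regex split that will not cope with comments or attributes containing ">".

```perl
use strict;
use warnings;

# Tags whose start or end closes an open <p> (a partial list for this sketch).
my %BLOCK = map { $_ => 1 }
    qw(p ul ol li dl table tr td th div pre blockquote hr h1 h2 h3 h4 h5 h6);

# Hypothetical cleanup pass: insert the missing </p> and </li> end tags.
sub balance_p_li {
    my ($html) = @_;
    my $out  = '';
    my $in_p = 0;
    my @li;    # one flag per open list: is an <li> currently open?
    for my $tok (split /(<[^>]+>)/, $html) {
        if ($tok =~ m{^<(/?)([A-Za-z][A-Za-z0-9]*)}) {
            my ($closing, $tag) = ($1, lc $2);
            if ($in_p && $BLOCK{$tag}) {    # block boundary ends the paragraph
                $out .= '</p>' unless $closing && $tag eq 'p';
                $in_p = 0;
            }
            if (!$closing) {
                if ($tag eq 'p') { $in_p = 1 }
                elsif ($tag eq 'ul' || $tag eq 'ol') { push @li, 0 }
                elsif ($tag eq 'li' && @li) {
                    $out .= '</li>' if $li[-1];    # close the previous item
                    $li[-1] = 1;
                }
            }
            elsif ($tag eq 'li' && @li) { $li[-1] = 0 }
            elsif (($tag eq 'ul' || $tag eq 'ol') && @li) {
                $out .= '</li>' if pop @li;        # close the last item
            }
        }
        $out .= $tok;
    }
    $out .= '</p>' if $in_p;
    return $out;
}

print balance_p_li('<ul><li>one<li><p>two<p>three</ul><p>tail'), "\n";
```

A stack of flags rather than a single flag is used for <li> so that nested lists close their own items.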

I would love to improve this Perl program further until it converges to something useful. For me, this quasi-parser will also help keep my LaTeX files reasonably sane: I will know what passes and is convertible or recognizable and what is not. It's an itch I have had for a long time.

It's pretty useful already, though.

Tags:

Html

Perl