Is there a program similar to detex for Windows?

If you look at the source code of detex (written in C), you will see that the main job is done by lex (a lexical analyzer generator) with the help of a small sed script. I checked, and detex has unfortunately not been ported to Cygwin, but my feeling is that you should be able to compile it there (Cygwin provides flex (the free lex), gcc, GNU sed, etc.).

A less sophisticated option is to write your own sed (or Perl) script, which you would obviously need to run in Cygwin. I am at work right now, but I am sure I have seen sed one-liners that do a decent job of detex-ing. I will try to find or write such a script and post it here. I will also try to post a 100-point bounty for such a sed one-liner. If you Google, you should also be able to find a Perl script that does this.
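
To give an idea of what such a home-grown script might look like, here is a minimal sketch using only POSIX sed. The function name naive_detex is made up for illustration, and this is not a port of detex: it assumes one command per match, ignores math mode, macros with optional arguments, and comments inside verbatim environments, and it keeps the text of every braced argument (so labels and citation keys survive).

```shell
#!/bin/sh
# naive_detex: hypothetical helper, a rough stand-in for detex (not a port).
# Reads LaTeX on stdin, writes approximate plain text on stdout.
naive_detex() {
  sed \
    -e '/^[[:space:]]*\\documentclass/d' \
    -e '/^[[:space:]]*\\usepackage/d' \
    -e 's/^%.*$//' \
    -e 's/\([^\\]\)%.*$/\1/' \
    -e 's/\\\([%&$#_]\)/\1/g' \
    -e 's/\\begin{[^}]*}//g' \
    -e 's/\\end{[^}]*}//g' \
    -e 's/\\[A-Za-z][A-Za-z]*\*\{0,1\}//g' \
    -e 's/[{}]//g' \
    -e 's/[[:space:]]*$//'
}
# Order matters: comments are stripped before \% is unescaped, and
# \begin{...}/\end{...} are dropped before generic command names are removed.
```

In a Cygwin shell this could be used as naive_detex < file.tex > file.txt. Anything beyond simple prose documents will need more rules, which is exactly why detex uses a real lexer.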

Edit: Try this script, which uses DVI as an intermediate format and the catdvi tool to strip the LaTeX tags.

$ latex file.tex
$ catdvi -e 1 -U file.dvi | sed -re "s/\[U\+2022\]/*/g" \
  | sed -re "s/([^^[:space:]])\s+/\1 /g" > file.txt

For people who want to go the DVI route, I also checked dvi2tty: it does a marvelous job of converting a DVI file into a plain text file. No additional processing is needed.

There is also a well-known sed script, tex2xml, written by Tilmann Bitterberg, which converts nested tag{value} markup to XML. I will try to fix it to do the conversion to plain ASCII.

#! /bin/sed -f

# Try of a nested tag{value} parser:
# - handles multiline tags
# - can deal with quoted \{ and \}
# - handles nested tags
# Limitations:
# - tags are not allowed to have [{}<>| ] in the name.
# - doesn't detect unbalanced brackets
#
# b{foo} -> <b>foo</b>
# b{foo em{bar}} -> <b>foo <em>bar</em></b>

# Tue Nov 27 17:28:32 UTC 2001

# \{1{2{3{4{5{6{7{8{9{a{b{c{d{e{f{g{h{i{\{text0\}}}}}}}}}}}}}}}}}}}text1\}

# How it works
# We build a stack of unclosed tags in holdspace
# by appending always at the end (``H'').
# when a closing bracket is found, fetch tag
# from holdspace.
# Main focus is small memory usage

# escape Quoted and generate entities
s,&,&amp;,g
s,<,&lt;,g
s,>,&gt;,g
s,\\{,&obrace;,g
s,\\},&cbrace;,g

# uninteresting line, jump to end
/[{}]/!b unescape

:open  

/{/{   
  s,\( *\)\([^|<>}{ ]*\){,\1\
\2\
,;           # Isolate tag
  # Patternspace: text \n newtag \n text
  H;         # append to holdspace
  s,\n\([^\n]*\)\n,<\1>,; # generate XML tag

  # Holdspace: ..\tagN \n text \n newtag \n text
  # We only want oldtags + newtag
  x
  s,\(.*\n\)[^\n]*\n\([^\n]*\)\n[^\n]*$,\1\2,
  x

  /^[^{]*}/b close
  /{/b open
}

:close

/}/{
  s,},\
\
\
,
  # text1 \n\n\n text2 \n\n tag0 \n tag1 text2 may be empty
  G;
  s,\n\n\n\([^\n]*\)\n.*\n\([^\n]*\)$,</\2>\1,
  x
  s,\n[^\n]*$,,;   # delete tag from holdspace
  x

  /^[^}]*{/b open;   # if next bracket is an open one
  /}/b close;        # another one?
}

:unescape
s,&obrace;,{,g
s,&cbrace;,},g
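
The script above stops at XML. One possible way to get from there to plain text, offered as a sketch and not as the promised fix, is a second sed pass that drops the generated tags and turns the entities back into characters. The function name xml_to_text is made up for illustration, and it assumes every tag opens and closes on the same line.

```shell
#!/bin/sh
# xml_to_text: hypothetical post-processing step for the XML produced above.
# Strips <tag> and </tag>, then restores the entities the script generated.
xml_to_text() {
  sed \
    -e 's/<[^>]*>//g' \
    -e 's/&lt;/</g' \
    -e 's/&gt;/>/g' \
    -e 's/&amp;/\&/g'
}
# &amp; is restored last so that an escaped "&lt;" in the source text
# (stored as "&amp;lt;") is not unescaped twice.
```

Assuming the script above is saved as tex2xml.sed, the two steps could be chained as sed -f tex2xml.sed input.txt | xml_to_text.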

LuaTeX users may want to have a look at the spelling package. It writes out a pure text file after the LaTeX run that can be checked by your favourite spell-checker.


Have a look at opendetex. There is also a note about someone who runs Windows and detex.