`sed` command to remove highlights from text

Not sed but perl. We need recursive regular expressions to do that:

$ echo 'Hello \hl{my math $\frac{1}{2}$} world' | perl -e '
undef $/;
$_ = <>;
s/ \\hl \s* ({((?: \\. | [^{}] | (?-2) )*)}) /$2/gsx;
print;'

Line 4 of the command (the substitution) means:

s/                      # replace
  \\hl                  # any \hl control sequence
  \s*                   # and optional whitespace
  (                     # and a TeX group (capture group #1)
    {                   # which consists of an opening brace
      (                 # enclosing (capture group #2)
        (?: \\.         # any escaped character
        | [^{}]         # or anything but braces
        | (?-2)         # or an embedded TeX group (recursion into #1)
        )*              # zero or more times
      )
    }                   # and a closing brace
  )
/$2/gsx                 # with group #2, globally

This approach assumes that your code parses correctly, and that braces in comments are either escaped or balanced.


One way would be to use vim. I understand that this approach probably isn't very accessible to a non-vim-user, but it works. return, left and so on stand for the corresponding keys.

  1. open vim: vim myfile.tex
  2. search for the pattern to be replaced: /\\hl{return
  3. now the interesting part: define a macro that will perform the replacement: qan4rightvleft%leftdv4leftpq
  4. execute the new macro once: @a
  5. now, you can execute the macro again by using @@ (just hold @) or execute it many times using something like 999@@
  6. when you're done, exit using :wqreturn

I see it as an added bonus that you can inspect every instance of the replacement.

Of course, you could also \renewcommand*{\hl}{}, but that's not quite the same: it only switches off the highlighting in the output and leaves the \hl{...} markup in the source.


This works for me:

C:\Users\Name>echo Hello \hl{my math $\frac{1}{2}$} world | sed "s-\\\hl{\(.*\)}-\\1-"
Hello my math $\frac{1}{2}$ world

This is some version of sed on Windows, more specifically:

C:\Users\Name>sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-gnu-utils@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.

But in general, this will not do: the .* is greedy, so the pattern runs up to the last closing brace on the line. Several \hl{...} on one line (or any other command with braces after the highlight) will break it. So your example expression, for which my code works, does not represent all the use cases you may want to cover.
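
To make the failure concrete, here is a small sketch. It assumes a POSIX shell rather than the Windows prompt used above, so the quoting and the `/` delimiter differ from my original command, and the test strings are made up. It also shows that the obvious repair, excluding closing braces, only moves the problem.

```shell
# Two highlights on one line: the greedy .* runs to the LAST closing brace.
printf '%s\n' '\hl{a} and \hl{b}' | sed 's/\\hl{\(.*\)}/\1/'
# -> a} and \hl{b

# Excluding closing braces copes with several flat highlights ...
printf '%s\n' '\hl{a} and \hl{b}' | sed 's/\\hl{\([^}]*\)}/\1/g'
# -> a and b

# ... but now the match stops at the FIRST closing brace, so nested groups break it.
printf '%s\n' '\hl{my math $\frac{1}{2}$} world' | sed 's/\\hl{\([^}]*\)}/\1/g'
# -> my math $\frac{1{2}$} world
```

Neither variant knows which closing brace belongs to which opening brace, and that is exactly the part that needs counting or recursion.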

This reminds me a lot of this question. What you want to do is find the matching closing brace for \hl{. But even assuming that your code parses correctly, meaning that you never have an extra opening or closing brace anywhere, inside or outside of \hl{...}, regular expressions cannot do this without recursion, and as far as I know sed does not support recursive patterns.
