Censor text with regex

You can do it with sed too:

sed '/^[[:blank:]]*-[[:blank:]]/{
h
s///
s/./X/g
x
s/\([[:blank:]]*-[[:blank:]]\).*/\1/
G
s/\n//
}' infile

This copies the line to the hold space, removes the leading part matched by ^[[:blank:]]*-[[:blank:]] (the empty pattern in s/// reuses the last regex, i.e. the address), and replaces each remaining character with an X. It then exchanges the pattern and hold spaces (x), so the censored string is now in the hold space and the original line is back in the pattern space. Everything after the prefix is removed with s/\(...\).*/\1/, the string in the hold space is appended to the pattern space (G), and the newline that G inserts is removed. So with a file like:

- line here
not - to be modified
  - a b c d e
 - another line-here

the output is:

- XXXXXXXXX
not - to be modified
  - XXXXXXXXX
 - XXXXXXXXXXXXXXXXX
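For reference, here is the same script with each command annotated. This is only a sketch: the wrapper name censor_sed is made up for illustration, and it reads standard input instead of infile.

```shell
# censor_sed is a hypothetical wrapper name; it filters stdin to stdout.
censor_sed() {
  sed '/^[[:blank:]]*-[[:blank:]]/{
    # copy the line to the hold space
    h
    # empty regex = reuse the address regex: strip the "- " prefix
    s///
    # one X per remaining character
    s/./X/g
    # swap: original line back in pattern space, Xs in hold space
    x
    # keep only the prefix of the original line
    s/\([[:blank:]]*-[[:blank:]]\).*/\1/
    # append the hold space (newline + Xs), then drop that newline
    G
    s/\n//
  }'
}

censor_sed <<'EOM'
- line here
not - to be modified
  - a b c d e
 - another line-here
EOM
```

Run on the sample file above, it should print the same four lines as the listed output.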

If you want to remove blank chars and replace only the non-blank ones with X:

sed '/^[[:blank:]]*-[[:blank:]]/{
h
s///
s/[[:blank:]]//g
s/./X/g
x
s/\([[:blank:]]*-[[:blank:]]\).*/\1/
G
s/\n//
}' infile

output:

- XXXXXXXX
not - to be modified
  - XXXXX
 - XXXXXXXXXXXXXXXX

Or, in one line with GNU sed:

sed -E '/^[ \t]*-[ \t]/{h;s///;s/[ \t]//g;s/./X/g;x;s/([ \t]*-[ \t]).*/\1/;G;s/\n//}' infile

Adjust the regex (i.e. ^[[:blank:]]*-[[:blank:]]) as per your needs.


A Perl solution:

perl -pe 's/^( *- )(.+)/$1."X"x length($2)/e'

The /e modifier makes Perl evaluate the replacement as code, so $1 keeps the prefix and "X" x length($2) yields the correct number of Xs for the censored part.

Test input:

- Hello World
  - Earth
This is not - censored

output:

- XXXXXXXXXXX
  - XXXXX
This is not - censored

An awk solution:

$ awk '/^[ ]*- /{gsub(/[^ -]/,"X",$0)}1' <<EOM
- Hello
  - World 2015
This is not - censored
EOM

- XXXXX
  - XXXXX XXXX
This is not - censored

The awk expression matches any line that begins with a - character followed by a space, after optional leading spaces. For matching lines, gsub() replaces every character except spaces and the - character with X. The final 1 is shorthand for {print $0}, i.e. it prints every line, modified or not.
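Written out in long form, the same program might look like the sketch below (the wrapper name censor_words is made up for illustration). Because - is excluded from the substitution, a hyphen inside the text survives too, which the extra line-here sample shows.

```shell
# censor_words is a hypothetical wrapper name; it filters stdin to stdout.
censor_words() {
  awk '
    # lines starting with optional spaces, a dash and a space
    /^[ ]*- / {
      gsub(/[^ -]/, "X", $0)   # X out everything except spaces and dashes
    }
    { print $0 }               # long form of the trailing 1
  '
}

censor_words <<'EOM'
- Hello
  - World 2015
- line-here
This is not - censored
EOM
```

The third line comes out as "- XXXX-XXXX", so this form suits input where the censored words contain no hyphens.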

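Edit: since you also want the whitespace inside the censored text replaced with X, I can't really think of a more elegant solution than an additional replacement that fills a single space between two Xs. Note that it misses runs of several spaces and a one-character word between two others (e.g. "a b c" becomes "XXX X"):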

$ awk '/^[ ]*- /{gsub(/[^ -]/,"X",$0);gsub(/X X/,"XXX",$0)}1' <<EOM
- Hello World
  - Earth
This is not - censored
EOM

- XXXXXXXXXXX
  - XXXXX
This is not - censored
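If the text may contain single-letter words, repeated spaces, or hyphens, a different approach is to censor everything after the prefix in one go. The following is a sketch only, using the standard awk built-ins match() and substr(); the wrapper name censor_awk is made up for illustration.

```shell
# censor_awk is a hypothetical wrapper name; it filters stdin to stdout.
censor_awk() {
  awk '
    # match() sets RLENGTH to the length of the "- " prefix (with leading spaces)
    match($0, /^ *- /) {
      rest = substr($0, RLENGTH + 1)      # text after the prefix
      gsub(/./, "X", rest)                # one X per character, spaces included
      $0 = substr($0, 1, RLENGTH) rest    # prefix + censored text
    }
    1                                     # print every line
  '
}

censor_awk <<'EOM'
- Hello World
  - a b c
This is not - censored
EOM
```

Here "a b c" becomes XXXXX, since every character after the prefix is replaced regardless of what it is.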