How to convert HTML entities to readable text?

With Free recode (formerly known as GNU recode):

recode html < file

If you don't have recode or HTML::Entities and only need to decode &#x<hex>; entities, you could do it by hand with:

perl -Mopen=locale -pe 's/&#x([\da-f]+);/chr hex $1/gie'
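
For comparison, here is a minimal Python sketch of the same hand-rolled substitution. The names `HEX_ENTITY` and `decode_hex_entities` are made up for illustration. Like the perl one-liner, it only handles `&#x<hex>;` references and leaves named and decimal entities alone.

```python
import re

# Matches hexadecimal character references such as &#x142; (case-insensitive,
# like the /i flag in the perl one-liner).
HEX_ENTITY = re.compile(r"&#x([0-9a-f]+);", re.IGNORECASE)


def decode_hex_entities(text: str) -> str:
    """Replace each &#x<hex>; reference with the character it names.

    Named (&amp;) and decimal (&#322;) references are left untouched.
    Note: chr() raises ValueError for code points above 0x10FFFF.
    """
    return HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_hex_entities("chcia&#x142;abym zapyta&#x107;"))
# prints: chciałabym zapytać
```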

Following "How can I decode HTML entities?" on Stack Overflow, you may be able to use a simple Perl solution based on HTML::Entities, such as:

perl -Mopen=locale -MHTML::Entities -pe '$_ = decode_entities($_)' email.txt

For example, using your sample text:

$ perl -Mopen=locale -MHTML::Entities -pe '$_ = decode_entities($_)' email.txt
chciałabym zapytać, czy rozważa Pan takze udział w nowych projektach w Warszawie ? Obecnie poszukujemy specjalisty javascript/architekta z bardzo dobrą znajomością Angular.js do projektu, który dotyczy systemu, służącego do monitorowania i zarządzania flotą pojazdów. Zespół, do którego poszukujemy

With -Mopen=locale, I/O is done in the locale's character set, and that includes input from email.txt. It looks like email.txt contains only ASCII characters (presumably the whole point of encoding the other characters in the &#x<hex>; notation). If it does not, and its charset differs from the locale's, you may need to adapt the above to decode the file with the right charset instead of using open=locale.
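
To illustrate the charset point, here is a rough Python sketch that decodes the raw bytes with an explicitly chosen charset before unescaping the entities. The function name `unescape_bytes` and the ISO-8859-2 default are assumptions picked for the example (it is a charset that covers Polish), not something taken from the question.

```python
import html


def unescape_bytes(raw: bytes, encoding: str = "iso-8859-2") -> str:
    """Decode raw bytes with an explicit charset, then decode HTML entities.

    The default encoding is only an assumption for this example; pass
    whatever charset the input file actually uses.
    """
    text = raw.decode(encoding)
    return html.unescape(text)


# Simulated file content: one non-ASCII byte (ą in ISO-8859-2) plus an entity.
raw = "flot\u0105 pojazd&#xF3;w".encode("iso-8859-2")
print(unescape_bytes(raw))
# prints: flotą pojazdów
```

The result is a Unicode string, so it is written out in whatever encoding the output stream uses, independently of the input charset.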


A Python 3.4+ version (html.unescape was added in 3.4), which can be used in a pipe:

python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]' < file
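
As a quick check of what `html.unescape` covers, the short sketch below runs it on a few sample strings chosen for illustration: a named entity, a decimal reference, a hexadecimal reference, and a doubly escaped ampersand.

```python
import html

# Sample inputs picked for illustration only.
samples = ["&lt;b&gt;", "&#322;", "&#x142;", "&amp;amp;"]

for s in samples:
    print(s, "->", html.unescape(s))

# prints:
# &lt;b&gt; -> <b>
# &#322; -> ł
# &#x142; -> ł
# &amp;amp; -> &amp;
```

The last line shows that decoding is a single pass: text that was escaped twice needs to be unescaped twice.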