bash command to convert html page to a text file

The easiest way is to use a text-mode browser such as lynx or links with its dump option, which prints the rendered page (in short, the text version of the viewable HTML).

Remote file:

lynx --dump www.google.com > file.txt
links -dump www.google.com

Local file:

lynx --dump ./1.html > file.txt
links -dump ./1.htm

With charset conversion to UTF-8:

lynx -dump -display_charset UTF-8 ./1.htm
links -dump -codepage UTF-8 ./1.htm
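
To convert several local files in one go, the commands above can be wrapped in a small loop. This is only a sketch: the function names (txt_name, dump_html, convert_dir) are made up for illustration, and it assumes lynx or links is installed and on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of lynx or links.

# Derive the output name: ./page.html -> ./page.txt
txt_name() {
    printf '%s.txt\n' "${1%.*}"
}

# Dump one page to stdout with whichever text browser is available.
dump_html() {
    if command -v lynx >/dev/null 2>&1; then
        lynx -dump "$1"
    elif command -v links >/dev/null 2>&1; then
        links -dump "$1"
    else
        echo "neither lynx nor links found" >&2
        return 1
    fi
}

# Convert every .html/.htm file in a directory (default: current directory).
convert_dir() {
    local dir="${1:-.}" f
    for f in "$dir"/*.html "$dir"/*.htm; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        dump_html "$f" > "$(txt_name "$f")" || return 1
    done
}
```

After sourcing the file, running convert_dir ./pages would write ./pages/1.txt next to ./pages/1.html, and so on for each page found.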

Alternatively, html2text.py can be run from the command line:

Usage: html2text.py [(filename|url) [encoding]]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --ignore-links        don't include any formatting for links
  --ignore-images       don't include any formatting for images
  -g, --google-doc      convert an html-exported Google Document
  -d, --dash-unordered-list
                        use a dash rather than a star for unordered list items
  -b BODY_WIDTH, --body-width=BODY_WIDTH
                        number of characters per output line, 0 for no wrap
  -i LIST_INDENT, --google-list-indent=LIST_INDENT
                        number of pixels Google indents nested lists
  -s, --hide-strikethrough
                        hide strike-through text. only relevent when -g is
                        specified as well
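
Going by the usage line and options listed above, a call that drops link and image markup and turns off line wrapping might look like the following. The wrapper name html_to_plain is hypothetical, and the sketch assumes html2text.py is executable and on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: html_to_plain is an illustrative wrapper, not part of html2text.
# Usage: html_to_plain (filename|url) [encoding]
html_to_plain() {
    # --ignore-links / --ignore-images: omit link and image formatting
    # --body-width=0: no line wrapping
    # The encoding argument defaults to utf-8 here (an assumption of this sketch).
    html2text.py --ignore-links --ignore-images --body-width=0 "$1" "${2:-utf-8}"
}
```

For example, html_to_plain ./1.html > file.txt would write the converted text to file.txt.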
