Compress (minimize) HTML from Python

You can use htmlmin to minify your HTML:

import htmlmin

html = """
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Bootstrap Case</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body> 
<div class="container">
  <h2>Well</h2>
  <div class="well">Basic Well</div>
</div>
</body>
</html>
"""

# Python 3: str has no .decode(); the Python 2 form was html.decode("utf-8")
minified = htmlmin.minify(html, remove_empty_space=True)
print(minified)

I suppose that on GAE (Google App Engine) there is no real need to minify your HTML, as GAE already gzips it; see Caching & GZip on GAE (Community Wiki).

I did not test it, but once both versions are gzipped, the minified HTML will probably be only about 1% smaller, because minification only removes whitespace and gzip already compresses repeated whitespace well.
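To check that estimate on your own pages, a rough sketch like the following compares raw and gzipped sizes using only the standard library. `naive_minify` is a hypothetical, deliberately crude stand-in for htmlmin (it just collapses whitespace between tags), so real htmlmin numbers will differ somewhat.

```python
import gzip
import re


def naive_minify(html: str) -> str:
    """Crude whitespace collapser standing in for htmlmin (hypothetical helper).

    Not safe for <pre>, <textarea> or inline scripts; for size estimates only.
    """
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between tags
    return re.sub(r"\s{2,}", " ", html).strip()  # squeeze remaining runs


def gzipped_size(text: str) -> int:
    """Size in bytes of the UTF-8 text after gzip at the default level."""
    return len(gzip.compress(text.encode("utf-8")))


def compare(html: str) -> dict:
    """Byte sizes of the page before/after minifying, with and without gzip."""
    minified = naive_minify(html)
    return {
        "raw": len(html.encode("utf-8")),
        "minified": len(minified.encode("utf-8")),
        "raw_gz": gzipped_size(html),
        "minified_gz": gzipped_size(minified),
    }


if __name__ == "__main__":
    page = (
        "<html>\n  <body>\n"
        + '    <div class="well">Basic Well</div>\n' * 200
        + "  </body>\n</html>\n"
    )
    for name, size in compare(page).items():
        print(f"{name:12} {size} bytes")
```

The toy page above is very repetitive, so run `compare` on your real output before deciding whether minification is worth it on top of gzip.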

If you want to save storage, for example when putting pages in memcache, you will gain more by gzipping them (even at a low compression level) than by removing whitespace: the result will probably be smaller, and likely faster to produce too, since Python's gzip/zlib compression runs in C while htmlmin is pure Python.
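A minimal sketch of the gzip-before-caching idea, assuming you cache the rendered page as bytes. The `_cache` dict, `cache_html` and `get_html` are hypothetical stand-ins for a memcache client's set/get calls; only `gzip.compress`/`gzip.decompress` are real standard-library functions.

```python
import gzip
from typing import Dict, Optional

# Hypothetical in-process stand-in for memcache: swap these dict operations
# for your memcache client's set/get calls.
_cache: Dict[str, bytes] = {}


def cache_html(key: str, html: str, level: int = 1) -> int:
    """Gzip the page (level 1 = fastest) and store the bytes; return stored size."""
    blob = gzip.compress(html.encode("utf-8"), compresslevel=level)
    _cache[key] = blob
    return len(blob)


def get_html(key: str) -> Optional[str]:
    """Fetch and decompress a cached page, or return None on a miss."""
    blob = _cache.get(key)
    if blob is None:
        return None
    return gzip.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    page = '<div class="well">Basic Well</div>\n' * 500
    stored = cache_html("home", page)
    assert get_html("home") == page
    print(len(page.encode("utf-8")), "->", stored, "bytes")
```

Level 1 trades a little size for speed; raising `level` toward 9 shrinks the blob further at more CPU cost per write.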


htmlmin and html_slimmer are some simple HTML minifying tools for Python. I have millions of HTML pages stored in my database, and running htmlmin reduces the page size by between 5 and 50%. Neither of them does an optimal job at complete HTML minification (e.g. the font color #000000 could be reduced to #000), but it's a good start. I use a try/except block that runs htmlmin and, if that fails, falls back to html_slimmer: htmlmin seems to provide better compression, but it does not support non-ASCII characters.

Example Code:

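import htmlmin
from slimmer import html_slimmer  # or xhtml_slimmer, css_slimmer

# html holds the page source as a str
try:
    # htmlmin gives better compression, so try it first
    html = htmlmin.minify(html, remove_comments=True, remove_empty_space=True)
except Exception:
    # Fall back to html_slimmer, turning newlines, tabs and carriage returns into spaces first
    html = html_slimmer(html.strip().replace('\n', ' ').replace('\t', ' ').replace('\r', ' '))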
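Neither tool shortens colors such as #000000 to #000. If you want that extra step, a post-processing pass could run after minification. This is a hedged sketch, not part of htmlmin or slimmer: `shorten_hex_colors` is a hypothetical helper, and a plain regex cannot tell a color from other text that looks like one (for example a URL fragment such as `page.html#aabbcc`), so check the output on your own pages. It does skip numeric character references like `&#112233;`.

```python
import re

# A 6-digit hex color whose three channels are each a doubled digit
# (e.g. #000000, #ffcc00). The (?<!&) lookbehind skips numeric character
# references such as &#112233;, and \b stops matches inside longer hex runs.
_HEX6 = re.compile(r"(?<!&)#([0-9a-fA-F])\1([0-9a-fA-F])\2([0-9a-fA-F])\3\b")


def shorten_hex_colors(text: str) -> str:
    """Rewrite #rrggbb as #rgb when every channel is a repeated digit."""
    return _HEX6.sub(r"#\1\2\3", text)


if __name__ == "__main__":
    print(shorten_hex_colors('<font color="#000000">Hi</font>'))
```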

Good Luck!