ASCII Tux (Linux Penguin)

Many script languages, 6 UTF-8 bytes

It doesn't print ASCII art, but...

"🐧"

The character in the quotes is the Unicode penguin character (U+1F427). If you can't see the penguin above, install the Symbola font. It looks like this (but a lot smaller):

Penguin character shown with huge font size
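
The byte count in the title can be checked with a few lines of Python 3. This is only a sketch; the variable names are mine and not part of the answer.

```python
# Python 3 sketch: count the UTF-8 bytes of the one-character "program".
penguin = "\U0001F427"            # U+1F427 PENGUIN
program = '"' + penguin + '"'     # the quoted character, as in the answer

penguin_bytes = len(penguin.encode("utf-8"))
program_bytes = len(program.encode("utf-8"))
print(penguin_bytes, program_bytes)   # 4 6
```

The penguin lies outside the Basic Multilingual Plane, so UTF-8 needs four bytes for it; the two quotes account for the rest.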


Instead of attempting to print a given penguin, I hacked together a program to reduce ASCII art to a minimum production string in Python. I could do some crazy hacks with metaprogramming the environment to define variables for re-used strings, but this is good enough for now. It's Sunday, what can I say.

ASCII compiler - 882 - Python


#!/usr/bin/env python

import sys, re

d = "".join(sys.stdin.readlines()) # get all text (WARNING... BIG INPUTS POSSIBLE)
d = re.sub(r' +(?=[\r\n])','',d)   # strip trailing spaces, keep the line break
i,j,c,o,f=0,0,d[0],"""#!/usr/bin/python
# -*- coding: UTF-8 -*-
print \"""",False      # set up vars

while i < len(d):                  # until every run, including the last, is emitted
 if j < len(d) and d[j] == c: j+=1 # if we are in a repetition be greedy
 else:
  if c == "\\": c = "\\\\"         # generate repr()s of special case strings
  elif c == "\n": c = "\\n"
  elif c == "\r": c = ""           # drop \r so that \r\n collapses to \n
  elif c == "\"": c = "\\\""

  if((j-i)-9 > 0):                 # if the sequence is long enough to justify a repetition
   if(o): o += "\"+"
   o += "\""+c+"\"*{0}+\"".format(j-i)

  else: o += (c*(j-i))             # else just print the sucker
  i = j
  if j < len(d): c = d[j]          # start the next run, if any input is left
o += "\""                          # system assumes an open quote at all times, close it

print o

Usage: cat ./tux.txt | python ./encode.py
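
For readers on Python 3, the same run-length idea can be sketched with itertools.groupby. This is not the script above: the names compile_art, escape and MIN_RUN are made up for illustration, and it emits a bare string expression instead of a full print program so that the result can be checked with eval.

```python
from itertools import groupby

MIN_RUN = 10  # assumption: same threshold as the script above (runs longer than 9)

def escape(ch):
    """Escape one character for use inside a double-quoted literal."""
    # "\r" is dropped, as in the original script, so "\r\n" collapses to "\n".
    return {"\\": "\\\\", "\n": "\\n", "\r": "", '"': '\\"'}.get(ch, ch)

def compile_art(text, min_run=MIN_RUN):
    """Return a Python string expression that evaluates to `text`."""
    out = ['"']                                   # a literal is always open
    for ch, group in groupby(text):
        n = len(list(group))
        e = escape(ch)
        if n >= min_run and e:
            # close the literal, splice in the repetition, reopen the literal
            out.append('"+"{0}"*{1}+"'.format(e, n))
        else:
            out.append(e * n)
    out.append('"')                               # close the last literal
    return "".join(out)

art = " " * 12 + "`--" + " " * 16 + "--\n"
expr = compile_art(art)
print(expr)               # ""+" "*12+"`--"+" "*16+"--\n"
assert eval(expr) == art  # round trip (holds for text without "\r")
```

Because groupby yields every run, including the final one, nothing at the end of the input is lost.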

My Tux - 1009 chars with spaces


                    ..- - .              
                   '        `.           
                  '.- .  .--. .          
                 |: _ | :  _ :|          
                 |`(@)--`.(@) |          
                 : .'     `-, :          
                 :(_____.-'.' `          
                 : `-.__.-'   :          
                 `  _.    _.   .         
                /  /  `_ '  \    .       
               .  :          \\   \      
              .  : _      __  .\   .     
             .  /             : `.  \    
            :  /      '        : `.  .   
           '  `      :          : :  `.  
         .`_ :       :          / '   |  
         :' \ .      :           '__  :  
      .--'   \`-._    .      .' :    `).  
    ..|       \   )          :   '._.'  : 
   ;           \-'.        ..:         / 
   '.           \  - ....-   |        '  
      -.         :   _____   |      .'   
        ` -.    .'--       --`.   .'     
            `--                --        

Compiled Tux - 878 - Python


print " "*20+"..- - .\n"+" "*19+"'"+" "*8+"`.\n"+" "*18+"'.- .  .--. .\n"+" "*17+"|: _ | :  _ :|\n"+" "*17+"|`(@)--`.(@) |\n"+" "*17+": .'"+" "*5+"`-, :\n"+" "*17+":("+"_"*5+".-'.' `\n"+" "*17+": `-.__.-'   :\n"+" "*17+"`  _."+" "*4+"_.   .\n"+" "*16+"/  /  `_ '  \\"+" "*4+".\n"+" "*15+".  :"+" "*10+"\\\\   \\\n"+" "*14+".  : _"+" "*6+"__  .\\   .\n"+" "*13+".  /"+" "*13+": `.  \\\n"+" "*12+":  /"+" "*6+"'"+" "*8+": `.  .\n"+" "*11+"'  `"+" "*6+":"+" "*10+": :  `.\n"+" "*9+".`_ :"+" "*7+":"+" "*10+"/ '   |\n"+" "*9+":' \\ ."+" "*6+":"+" "*11+"'__  :\n"+" "*6+".--'   \\`-._"+" "*4+"."+" "*6+".' :"+" "*4+"`).\n"+" "*4+"..|"+" "*7+"\\   )"+" "*10+":   '._.'  :\n   ;"+" "*11+"\\-'."+" "*8+"..:"+" "*9+"/\n   '."+" "*11+"\\  - "+"."*4+"-   |"+" "*8+"'\n"+" "*6+"-."+" "*9+":   "+"_"*5+"   |"+" "*6+".'\n"+" "*8+"` -."+" "*4+".'--"+" "*7+"--`.   .'\n"+" "*12+"`--"+" "*16+""

Edit 1: Sadly, while this is efficient for large ASCII art files, as with many compression schemes its performance is lackluster, to say the least, when presented with very small inputs. The "standard" penguin featured thus far "compresses" to 100 chars of Python, which, while much better than Brainf*ck, is still nothing to brag about considering that the penguin itself is a mere 75 chars by my count.

Edit 2: Also, as with many compression schemes, the incremental gain is minimal. When I tested my compiler against several ASCII files, I got a mere 9-10% improvement over the source.

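Edit 3: Re: Kolmogorov complexity. After some quality time thinking and attempting to golf the base penguin, I have come to the conclusion that the optimal representation is 88 characters long, adding 13 to the actual penguin. The string has to be raw (the r prefix) because two of its lines end in a backslash, which a plain string literal would swallow together with the newline that follows.

print r"""    .--.
   |o_o |
   |:_/ |
  //   \ \
 (|     | )
/'|_   _/'\
\___)=(___/"""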
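
A quick Python 3 check of these counts; PENGUIN and the two wrapper names are mine. The art is held in a raw string so that the backslashes ending two of its lines survive. In a plain literal, a backslash directly before a newline is a line continuation and both characters disappear.

```python
# Python 3 sketch: count the penguin and the cost of wrapping it in a print.
PENGUIN = r"""    .--.
   |o_o |
   |:_/ |
  //   \ \
 (|     | )
/'|_   _/'\
\___)=(___/"""

plain_wrapper = len('print """') + len('"""')    # plain triple-quoted string
raw_wrapper = len('print r"""') + len('"""')     # raw triple-quoted string
print(len(PENGUIN), plain_wrapper, raw_wrapper)  # 75 12 13
```

So the penguin is 75 characters, a plain triple-quoted print adds 12, and the raw form that keeps the trailing backslashes adds 13.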

The Python string repetition syntax for one character, say "a"*5, consumes roughly 4+log10(n) characters in the best case (non-special characters) and 5+log10(n) for characters such as the newline or tab, which need an escape like "\n" rather than a single quoted character. Therefore, to achieve a compression greater than 0%, a character repetition more than 9 characters long must exist. (This assumes that the repetition sits inside a longer string, which must be closed with a quote, joined with +, and reopened the same way, adding 4 to the cost of a repetition.) A trivial examination of the penguin above reveals that no such run exists, so no compression is possible using string repetition, and the best one can do is leverage the triple-quoted string syntax to represent each newline in one character instead of two.
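
The break-even point can also be measured directly instead of derived. The Python 3 sketch below builds the spliced form the compiler emits and compares its length with writing the run out; spliced_cost, first_saving and longest_run are hypothetical helper names.

```python
from itertools import groupby

def spliced_cost(n, char="a"):
    """Length of  "+"a"*n+"  spliced into an already open string literal."""
    return len('"+"{0}"*{1}+"'.format(char, n))

def first_saving(char="a"):
    """Smallest run length for which the spliced form is strictly shorter."""
    width = len(char)  # 1 for a plain character, 2 for an escape like \n
    return next(n for n in range(1, 1000) if spliced_cost(n, char) < n * width)

def longest_run(text):
    """Length of the longest run of one repeated character in `text`."""
    return max(len(list(group)) for _, group in groupby(text))

print(spliced_cost(10), first_saving())   # 10 11
print(longest_run(" (|     | )"))         # 5
```

By this count a run of 10 plain characters only breaks even and 11 is the first length that saves anything. The widest gap in the small penguin is the five spaces in its fifth line, so repetition cannot help there either way.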

Edit 4: Fixed Edit 3 to reflect the relatively high probability that a repetition sits inside some longer string, requiring a splice like "+'a'*15+", which is of length 6+len(repr(a))-2+log10(n). Updated the compiler. (The -2 is required because repr() wraps everything in a pair of quotes.)

Edit 5: Grabbed the ASCII art collection HERE, and threw together the BASH script below. Pulled a 12% compression average over the entire collection.

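#!/bin/bash
# Average the savings of encode.py over every file under ./art/.
# Needs the calc command-line calculator.
len=$(find ./art/ -type f | wc -l)
s=0

# read NUL-separated names so paths with spaces survive
while IFS= read -r -d '' i; do
    chars=$(wc -c < "$i")
    out=$(python ./encode.py < "$i" | wc -c)
    # 42 is the length of the header lines in the compiler output
    c=$(calc "1-(($out-42)/$chars)" | sed 's/~//g')
    s=$(calc "$s+$c")
    # run the compiled program to check that it is valid Python
    python ./encode.py < "$i" | python 2> /dev/null
    echo -e "$i, $c\t|$?"
done < <(find ./art/ -type f -print0)
echo $(calc "$s/$len")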

Edit 6: Added the UTF-8 specification header to all compiler output and removed those chars from the test script's count because they are spec-required.
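
The 42 subtracted in the test script is the length of the two header lines the compiler writes before the print statement. A small Python 3 sketch of the same arithmetic follows; HEADER and savings are illustrative names, and the example assumes the 878 quoted for the compiled Tux does not include the header.

```python
# Python 3 sketch of the savings figure computed by the test script.
HEADER = "#!/usr/bin/python\n# -*- coding: UTF-8 -*-\n"

def savings(original_len, compiled_len, header_len=len(HEADER)):
    """Fraction of characters saved, not counting the header lines."""
    return 1 - (compiled_len - header_len) / original_len

print(len(HEADER))                          # 42
# Tux: 1009 characters in, 878 out plus the header (assumption, see above).
print(round(savings(1009, 878 + 42), 2))    # 0.13
```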


Haskell - 14 characters

main=putStr"."

The penguin this program displays is in the far distance.