Keep the unique characters down

CJam, 266 281 456 bytes * 14 12 7 unique = 3724 3372 3192

Try it online.

14301201124204202034420112034224204431020210101232301021240204310431312122132100240400222324402030223103420431324402222132223233141443401210314023001122320404112224314302132421403301243334313000011124244441400003310332301330220022110121411122100040310110020040121444302100143202204330334033211334242120304123121024200421121232100303121022431044444423243331440434010014400~~10~~100~~1~43c~c~4~~41c~100~~1~43c~c~123~~100~~1~43c~c~44~~14~~43c~c100~~40c~c43c~~

Explanation

The strategy I've used is to treat each character in the string as a base-123 digit and encode that as a decimal number in the program. The program then converts that number back to base 123 and maps each base-123 digit back to a character. Because it's hard to explain why the program is in its current state, I'll explain each version of it.

Here's what the end of the program looked like in the first version:

...2068438725 123b:c

This implements the strategy in the most straightforward way possible. The number, encoded normally in base 10, is converted back to base 123 and each base-123 digit is mapped back to a character. But this uses 4 unique non-digit characters, and being able to get rid of any one of them would likely be worth the size hit due to having to use less straightforward code.

First, I realized that I could get rid of the b and the : operators by creating them at runtime as their ASCII character values converted back to a character (with the already present c operator) and evaluating them with the ~ operator. It turned out to be a little tricky to do this with the : operator, since it has to be parsed together with the following c operator. I solved this by producing the characters : and c and then producing and evaluating the character +, which concatenates the former two characters into the string :c which can then be evaluated properly.

Second, I realized that the ~ operator I just introduced had a handy new overloaded variant: when given a number, it produces the bitwise complement. By using this twice in a row after a number, I could introduce a token break in the source with no resultant computational effect, allowing me to replace the spaces used to separate numbers with ~~.

The final result is 15 more bytes of code at the end, but this cost is greatly outweighed by the benefit of eliminating 2 unique characters out of 14. Here's a comparison of the end of the first version with the end of the second version:

...2068438725  123[ b  ][   :c    ]
...2068438725~~123~~98c~58c99c43c~~

Using any fewer than the 2 operators I was using would be impossible, but I still wanted fewer unique characters. So the next step was to eliminate digits. By changing the number's encoding so that each decimal digit was really a base-5 digit, I could potentially eliminate the digits 6-9. Before eliminating anything from the end of the prgoram, it looked like this:

...4010014400 10b5b123b:c

As mentioned before, eliminating the space is easy. But the b, :, and c would not be so easy, as their character codes are 98, 58, and 99, respectively. These all contained digits marked for elimination, so I had to find ways to derive them all. And the only useful numeric operators with character values not containing 5-9 were decrement, increment, multiply, and add.

For 98, I initially used 100~~40c~40c~, which decrements 100 twice. But then I realized I could make yet another use of the ~ operator, as bitwise complement lets me get negative numbers which, when added, let me emulate subtraction. So I then used 100~~1~43c~, which adds 100 and -2 and is 2 bytes smaller. For 58, I used 44~~14~~43c~, which adds 44 and 14. And for 99, I used 100~~40c~, which decrements 100.

The final result is pretty big and obfuscated, but the cost of the significantly larger number and processing code were slightly outweighed by the big benefit of eliminating 5 unique characters out of 12. Here's a comparison of the final end of the program before eliminations and after eliminations:

...4010014400  10[      b      ][  5  ][     b     ]123[      b      ][            :c            ]
...4010014400~~10~~100~~1~43c~c~4~~41c~100~~1~43c~c~123~~100~~1~43c~c~44~~14~~43c~c100~~40c~c43c~~

Whitespace, 1157 937 bytes * 3 unique = 3471 2811

By popular(?) request, I'm posting my whitespace solution.

In order to reduce the code needed, I hardcoded the entire string as one binary number (7 bits for each byte). A simple loop extracts the characters and prints them.

Source code on filebin.ca.

NOTE: The specifications allow for arbitrary large integers, but the Haskell interpreter on the official page is limited to 20 bits. Use, for example, this ruby interpreter on github/hostilefork/whitespaces.

The ruby script to create the whitespace program (l=WHITESPACE, t=TAB, u=NEWLINE, everything after // ignored, writes to a file prog.h):

str = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.'

def b2ws(bin)
    bin.chars.map{|x|x=='0' ? "l" : "t"}.join
end
def d2ws(dec,pad=0)
    b2ws(dec.to_s(2).rjust(pad,'0'))
end

bin = str.reverse.chars.map do |x|
    "  " + d2ws(x.getbyte(0),7) + " // char <#{x=="\n" ? "LF" : x}>"
end
data = "lll // pushes string as one large number\n#{bin.join("\n")}\nu"

File.open('prog.h','w') do |fout|
    fout << data << "\n"
    fout << <<-eos

// output string loop
ulllu // label 0

  // extract, print, and remove the last 7 bits of the data
  lul   // dup
  llltlllllllu // push 128
  tltt // mod
  tull  // print as char
  llltlllllllu // push 128
  tltl // div

  // break loop if EOS
  lul   // dup
  utltu // jump to 1 if top of stack is zero

ululu // jump to 0

ulltu // label 1
uuu
    eos
end

For illustration, the whitespace program in human-readable form. See below for a simple script to convert it to an actual whitespace program.

lll // pushes string as one large number
  ltltttl // char <.>
  ttltltt // char <k>
  ttltttt // char <o>
  ttltttt // char <o>
  ttlttll // char <l>
  ltlllll // char < >
  ttllllt // char <a>
  ltlllll // char < >
  ttlltlt // char <e>
  ttlttlt // char <m>
  ltlllll // char < >
  ttllttt // char <g>
  ttltttl // char <n>
  ttltllt // char <i>
  tttlttl // char <v>
  ttltllt // char <i>
  ttllttt // char <g>
  ltlllll // char < >
  ltllltl // char <">
  ltlttll // char <,>
  tlllttt // char <G>
  tlltttt // char <O>
  tllltll // char <D>
  ltlllll // char < >
  tlttllt // char <Y>
  tlttltl // char <Z>
  tlllllt // char <A>
  tllttll // char <L>
  ltlllll // char < >
  tllltlt // char <E>
  tlltlll // char <H>
  tltltll // char <T>
  ltlllll // char < >
  tltlltl // char <R>
  tllltlt // char <E>
  tltlttl // char <V>
  tlltttt // char <O>
  ltlllll // char < >
  tltlltt // char <S>
  tltllll // char <P>
  tllttlt // char <M>
  tltltlt // char <U>
  tlltltl // char <J>
  ltlllll // char < >
  tlttlll // char <X>
  tlltttt // char <O>
  tlllttl // char <F>
  ltlllll // char < >
  tlltttl // char <N>
  tltlttt // char <W>
  tlltttt // char <O>
  tltlltl // char <R>
  tlllltl // char <B>
  ltlllll // char < >
  tlltltt // char <K>
  tlllltt // char <C>
  tlltllt // char <I>
  tltltlt // char <U>
  tltlllt // char <Q>
  ltlllll // char < >
  tllltlt // char <E>
  tlltlll // char <H>
  tltltll // char <T>
  ltllltl // char <">
  ltlllll // char < >
  ltttltl // char <:>
  ltltllt // char <)>
  ttlltlt // char <e>
  ttllltt // char <c>
  ttllllt // char <a>
  ttlltlt // char <e>
  tttllll // char <p>
  ltlllll // char < >
  ttttllt // char <y>
  ttlttlt // char <m>
  ltlllll // char < >
  tttlltl // char <r>
  ttltttt // char <o>
  ttllttl // char <f>
  ltlllll // char < >
  ttttllt // char <y>
  ttlltll // char <d>
  tttlttt // char <w>
  ttltttt // char <o>
  tttlltl // char <r>
  ltlllll // char < >
  ttltttt // char <o>
  ttltttt // char <o>
  tttltll // char <t>
  ltlllll // char < >
  tttltll // char <t>
  tttlltt // char <s>
  tttltlt // char <u>
  ttltltl // char <j>
  ltltlll // char <(>
  ltlllll // char < >
  ttlltll // char <d>
  ttlltlt // char <e>
  tttltll // char <t>
  ttltttt // char <o>
  tttltlt // char <u>
  tttlllt // char <q>
  ltlllll // char < >
  ttttllt // char <y>
  ttlttll // char <l>
  tttlltt // char <s>
  tttltlt // char <u>
  ttltttt // char <o>
  ttltllt // char <i>
  ttttlll // char <x>
  ttltttt // char <o>
  ttltttl // char <n>
  ttllltl // char <b>
  ttltttt // char <o>
  ltlllll // char < >
  ttltlll // char <h>
  tttltll // char <t>
  ttlltlt // char <e>
  ttllltl // char <b>
  ttllllt // char <a>
  ttttltl // char <z>
  ttltllt // char <i>
  ttlttll // char <l>
  tllltlt // char <E>
u

// output string loop
ulllu // label 0

  // extract, print, and remove the last 7 bits of the data
  lul   // dup
  llltlllllllu // push 128
  tltt // mod
  tull  // print as char
  llltlllllllu // push 128
  tltl // div

  // break loop if EOS
  lul   // dup
  utltu // jump to 1 if top of stack is zero

ululu // jump to 0

ulltu // label 1
uuu

Basically, the string to output is a long integer, and you need to reduce its score.

Take a number x, and convert it to base b. Its length will be floor(log_b(x)+1), and it will contain b different symbols. So the score is b*floor(log_b(x)+1). x is a given large number, and if you plot this for b, you'll find the minimum is pretty much at b=3 (and b=2 is almost as good). Ie, the length reduces slightly as you use higher bases (log), but size of the charset increases linearly, so it isn't worth it.

Thus I looked for a language with only 0/1's, but I didn't find any, and then I remembered there was whitespace and tried it. In whitespace, you can enter binary numbers with 0's and 1's directly.


Old code, worse score but more interesting

Old code on filebin.

The ruby script I used for creating the program (l=WHITESPACE, t=TAB, u=NEWLINE, everything after // ignored, writes to a file prog.h):

str = 'Elizabeth obnoxiously quoted (just too rowdy for my peace): "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG," giving me a look.' + "\n"

def shift(x)
    (x-97)%128
end

EOS = "lttltl" #26
STACK = []

bin = str.reverse.chars.map do |x|
    byte = shift(x.getbyte(0))
    rf = STACK.index(byte)
    STACK.unshift(byte)

    y = byte.to_s(2).chars.map{|y|y=='0' ? 'l' : 't'}.join
    ph = "lll#{y}u" # pushing directly
    if rf
        bn = rf.to_s(2).chars.map{|y|y=='0' ? 'l' : 't'}.join
        cp = "ltll#{bn}u" # copying from stack
    end

    if STACK.size>0 && STACK[0]==STACK[1]
        "lul // dup #{x.inspect}"
    elsif cp && cp.size < ph.size
        "#{cp} // copy <#{x.inspect}> (from #{rf})"
    else
        "#{ph} // push <#{x.inspect}> (#{shift(x.getbyte(0))})"
    end
end

File.open('prog.h','w') do |fout|
    fout << "ll#{EOS}u // push EOS" << "\n"
    fout << bin.join("\n") << "\n"
    fout << <<-eos
// output string
ullu // label 0
  // add 97 (128) before printing
  lllttlllltu // push 97
  tlll // add
  llltlllllllu // push 128
  tltt // mod
  tull  // print top of stack

  // break loop if EOS
  lul   // dup
  ll#{EOS}u // push EOS
  tllt // subtract
  utltu // jump to 1 if top of stack is zero
uluu // jump to 0

ulltu // label 1
uuu
    eos
end

For illustration, the whitespace program in human-readable form. See below for a simple script to convert it to an actual whitespace program.

lllttltlu // push EOS
llltltlltu // push <"\n"> (41)
llltllttltu // push <"."> (77)
llltltlu // push <"k"> (10)
llltttlu // push <"o"> (14)
lul // dup "o"
llltlttu // push <"l"> (11)
lllttttttu // push <" "> (63)
llllu // push <"a"> (0)
ltlltu // copy <" "> (from 1)
llltllu // push <"e"> (4)
lllttllu // push <"m"> (12)
ltlltlu // copy <" "> (from 2)
lllttlu // push <"g"> (6)
lllttltu // push <"n"> (13)
llltlllu // push <"i"> (8)
llltltltu // push <"v"> (21)
ltlltu // copy <"i"> (from 1)
lllttlu // push <"g"> (6)
ltllttlu // copy <" "> (from 6)
llltllllltu // push <"\""> (65)
llltlltlttu // push <","> (75)
lllttllttlu // push <"G"> (102)
lllttltttlu // push <"O"> (110)
lllttlllttu // push <"D"> (99)
ltlltltu // copy <" "> (from 5)
lllttttlllu // push <"Y"> (120)
lllttttlltu // push <"Z"> (121)
lllttlllllu // push <"A"> (96)
lllttltlttu // push <"L"> (107)
ltlltllu // copy <" "> (from 4)
lllttlltllu // push <"E"> (100)
lllttlltttu // push <"H"> (103)
llltttllttu // push <"T"> (115)
ltllttu // copy <" "> (from 3)
llltttllltu // push <"R"> (113)
ltlltllu // copy <"E"> (from 4)
llltttltltu // push <"V"> (117)
ltlltttlu // copy <"O"> (from 14)
ltlltllu // copy <" "> (from 4)
llltttlltlu // push <"S"> (114)
lllttlttttu // push <"P"> (111)
lllttlttllu // push <"M"> (108)
llltttltllu // push <"U"> (116)
lllttltlltu // push <"J"> (105)
ltlltltu // copy <" "> (from 5)
llltttltttu // push <"X"> (119)
ltlltlllu // copy <"O"> (from 8)
lllttlltltu // push <"F"> (101)
ltllttu // copy <" "> (from 3)
lllttlttltu // push <"N"> (109)
llltttlttlu // push <"W"> (118)
ltlltllu // copy <"O"> (from 4)
ltlltllltu // copy <"R"> (from 17)
lllttlllltu // push <"B"> (97)
ltlltltu // copy <" "> (from 5)
lllttltltlu // push <"K"> (106)
lllttllltlu // push <"C"> (98)
lllttltlllu // push <"I"> (104)
ltllttttu // copy <"U"> (from 15)
llltttllllu // push <"Q"> (112)
ltlltltu // copy <" "> (from 5)
ltllttlltu // copy <"E"> (from 25)
ltllttttlu // copy <"H"> (from 30)
ltllttttlu // copy <"T"> (from 30)
llltllllltu // push <"\""> (65)
ltlltllu // copy <" "> (from 4)
llltlttlltu // push <":"> (89)
llltlltlllu // push <")"> (72)
llltllu // push <"e"> (4)
llltlu // push <"c"> (2)
llllu // push <"a"> (0)
llltllu // push <"e"> (4)
lllttttu // push <"p"> (15)
ltlltttu // copy <" "> (from 7)
lllttlllu // push <"y"> (24)
lllttllu // push <"m"> (12)
ltlltlu // copy <" "> (from 2)
llltllltu // push <"r"> (17)
llltttlu // push <"o"> (14)
llltltu // push <"f"> (5)
ltllttu // copy <" "> (from 3)
ltllttlu // copy <"y"> (from 6)
lllttu // push <"d"> (3)
llltlttlu // push <"w"> (22)
llltttlu // push <"o"> (14)
ltlltttu // copy <"r"> (from 7)
ltlltltu // copy <" "> (from 5)
ltlltlu // copy <"o"> (from 2)
lul // dup "o"
llltllttu // push <"t"> (19)
ltllttu // copy <" "> (from 3)
ltlltu // copy <"t"> (from 1)
llltlltlu // push <"s"> (18)
llltltllu // push <"u"> (20)
llltlltu // push <"j"> (9)
llltllltttu // push <"("> (71)
ltlltltu // copy <" "> (from 5)
lllttu // push <"d"> (3)
llltllu // push <"e"> (4)
ltlltttu // copy <"t"> (from 7)
llltttlu // push <"o"> (14)
ltlltttu // copy <"u"> (from 7)
llltllllu // push <"q"> (16)
ltllttlu // copy <" "> (from 6)
lllttlllu // push <"y"> (24)
llltlttu // push <"l"> (11)
llltlltlu // push <"s"> (18)
ltlltltu // copy <"u"> (from 5)
llltttlu // push <"o"> (14)
llltlllu // push <"i"> (8)
llltltttu // push <"x"> (23)
ltlltlu // copy <"o"> (from 2)
lllttltu // push <"n"> (13)
llltu // push <"b"> (1)
ltlltlu // copy <"o"> (from 2)
ltlltlttu // copy <" "> (from 11)
llltttu // push <"h"> (7)
llltllttu // push <"t"> (19)
llltllu // push <"e"> (4)
llltu // push <"b"> (1)
llllu // push <"a"> (0)
lllttlltu // push <"z"> (25)
llltlllu // push <"i"> (8)
llltlttu // push <"l"> (11)
lllttlltllu // push <"E"> (100)
// output string
ullu // label 0
  // add 97 (128) before printing
  lllttlllltu // push 97
  tlll // add
  llltlllllllu // push 128
  tltt // mod
  tull  // print top of stack

  // break loop if EOS
  lul   // dup
  lllttltlu // push EOS
  tllt // subtract
  utltu // jump to 1 if top of stack is zero
uluu // jump to 0

ulltu // label 1
uuu

This whitespace program itself is rather simple, but there are three golfing optimizations:

  • use lul to clone the stack when there's a duplicate character
  • use ltl to clone the n-th entry of the stack if its shorter than pushing the char directly
  • shift down all bytes by 97 (mod 128), makes the binary numbers smaller

A simple ruby script to convert my human readable whitespace code to an actual whitespace program (read a file prog.h and writes to a file prog.ws):

WHITESPACE = "l"
NEWLINE = "u"
TAB = "t"

File.open('prog.ws','w') do |fout|
    code = ""
    fin = File.read('prog.h')
    fin.each_line do |line|
        line.gsub!(/\/\/.*/,'')
        line.scan(/#{NEWLINE}|#{WHITESPACE}|#{TAB}/i) do |x|

            code << case x.downcase
            when NEWLINE.downcase
                "\n"
            when WHITESPACE.downcase
                " "
            when TAB.downcase
                "\t"
            else
                ""
            end
        end
    end
    fout << code
end

Ruby 144 Bytes * 39 Unique = 5616

puts"Elizabeth obnoxiously quoted (just too rowdy for my peace): \"#{'the quick brown fox jumps over the lazy dog,'.upcase}\" giving me a look."

Sometimes the simplest is the best.