Guess the word (aka Lingo)

Java, 249700 points (beats Chinese Perl Goth in my test)

Updated ranklist:

                        4       5       6       7       8       9       10      11      12      13      Total
perl chinese_perl_goth.pl 6700  12300   16900   19200   23000   26100   28500   29600   32100   33900   228300
java Lingo              9400    14700   18900   21000   26300   28700   30300   32400   33800   34200   249700

Here is the old ranklist using pit.rb:

                        4       5       6       7       8       9       10      11      12      13      Total
ruby player-example.rb  200     400     400     500     1800    1400    1700    1600    3200    4400    15600
ruby player-example2.rb 2700    3200    2500    4300    7300    6300    8200    10400   13300   15000   73200
ruby player-example3.rb 4500    7400    9900    13700   15400   19000   19600   22300   24600   27300   163700
perl chinese_perl_goth.pl 6400  14600   16500   21000   22500   26000   27200   30600   32500   33800   231100
java Lingo              4800    13100   16500   21400   27200   29200   30600   32400   33700   36100   245000

** Rankings **
1: java Lingo (245000)
2: perl chinese_perl_goth.pl (231100)
3: ruby player-example3.rb (163700)
4: ruby player-example2.rb (73200)
5: ruby player-example.rb (15600)

Compared to @chineseperlgoth, I lose in shorter words (< 6 chars) but I win in longer words (>= 6 chars).

The idea is similar to @chineseperlgoth, it's just that my main idea is about finding the guess (can be any word of the same length, not necessarily one of the remaining possibilities) which gives the most information for the next guess.

Currently I'm still playing with the formula, but for the scoreboard above, I choose the word which will yield the minimum for:

-num_confusion * entropy

The latest version uses different scoring to find next best guess, which is maximizing the number of "single possibility" after the current guess. This is done by trying all words in pruned wordlist (to save time) on all possible candidates, and see which guess is more probable to produce "single possibility" (that is, after this guess there will only be one possible answer) for the next guess.

So for example this run:

Starting new round, word is boons
Got:   seora
Sent:  ?XOXX
Got:   topsl
Sent:  XOX?X
Got:   monks
Sent:  XO?XO
Got:   bewig
Sent:  OXXXX
Got:   boons
Sent:  OOOOO
Round won with score 100

From the first three guesses, we already got "*oo*s" with an "n" somewhere and we still need to figure out one more letter. Now the beauty of this algorithm is that instead of guessing words which are similar to that form, it instead guesses the word which has no relation to previous guesses at all, trying to give more letters, hopefully revealing the missing letter. In this case it happens to also get the position for the missing "b" correctly, and concludes with the correct final guess "boons".

Here is the code:

import java.util.*;
import java.io.*;

class Lingo{
    public static String[] guessBestList = new String[]{
                                "",
                                "a",
                                "sa",
                                "tea",
                                "orae",
                                "seora", // 5
                                "ariose",
                                "erasion",
                                "serotina",
                                "tensorial",
                                "psalterion", // 10
                                "ulcerations",
                                "culteranismo",
                                "persecutional"};
    public static HashMap<Integer, ArrayList<String>> wordlist = new HashMap<Integer, ArrayList<String>>();

    public static void main(String[] args){
        readWordlist("wordlist.txt");
        Scanner scanner = new Scanner(System.in);
        int wordlen = Integer.parseInt(args[0]);
        int roundNum = 5;
        ArrayList<String> candidates = new ArrayList<String>();
        candidates.addAll(wordlist.get(wordlen));
        String guess = "";
        while(roundNum-- > 0){
            guess = guessBest(candidates, roundNum==4, roundNum==0);
            System.out.println(guess);
            String response = scanner.nextLine();
            if(isAllO(response)){
                break;
            }
            updateCandidates(candidates, guess, response);
            //print(candidates);
        }
    }

    public static void print(ArrayList<String> candidates){
        for(String str: candidates){
            System.err.println(str);
        }
        System.err.println();
    }

    public static void readWordlist(String path){
        try{
            BufferedReader reader = new BufferedReader(new FileReader(path));
            while(reader.ready()){
                String word = reader.readLine();
                if(!wordlist.containsKey(word.length())){
                    wordlist.put(word.length(), new ArrayList<String>());
                }
                wordlist.get(word.length()).add(word);
            }
        } catch (Exception e){
            System.exit(1);
        }
    }

    public static boolean isAllO(String response){
        for(int i=0; i<response.length(); i++){
            if(response.charAt(i) != 'O') return false;
        }
        return true;
    }

    public static String getResponse(String word, String guess){
        char[] wordChar = word.toCharArray();
        char[] result = new char[word.length()];
        Arrays.fill(result, 'X');
        for(int i=0; i<guess.length(); i++){
            if(guess.charAt(i) == wordChar[i]){
                result[i] = 'O';
                wordChar[i] = '_';
            }
        }
        for(int i=0; i<guess.length(); i++){
            if(result[i] == 'O') continue;
            for(int j=0; j<wordChar.length; j++){
                if(result[j] == 'O') continue;
                if(wordChar[j] == guess.charAt(i)){
                    result[i] = '?';
                    wordChar[j] = '_';
                    break;
                }
            }
        }
        return String.valueOf(result);
    }

    public static void updateCandidates(ArrayList<String> candidates, String guess, String response){
        for(int i=candidates.size()-1; i>=0; i--){
            String candidate = candidates.get(i);
            if(!response.equals(getResponse(candidate, guess))){
                candidates.remove(i);
            }
        }
    }

    public static int countMatchingCandidates(ArrayList<String> candidates, String guess, String response){
        int result = 0;
        for(String candidate: candidates){
            if(response.equals(getResponse(candidate, guess))){
                result++;
            }
        }
        return result;
    }

    public static String[] getSample(ArrayList<String> words, int size){
        String[] result = new String[size];
        int[] indices = new int[words.size()];
        for(int i=0; i<words.size(); i++){
            indices[i] = i;
        }
        Random rand = new Random(System.currentTimeMillis());
        for(int i=0; i<size; i++){
            int take = rand.nextInt(indices.length-i);
            result[i] = words.get(indices[take]);
            indices[take] = indices[indices.length-i-1];
        }
        return result;
    }

    public static String guessBest(ArrayList<String> candidates, boolean firstGuess, boolean lastGuess){
        if(candidates.size() == 1){
            return candidates.get(0);
        }
        String minGuess = candidates.get(0);
        int wordlen = minGuess.length();
        if(firstGuess && guessBestList[wordlen].length()==wordlen){
            return guessBestList[wordlen];
        }
        int minMatches = Integer.MAX_VALUE;
        String[] words;
        if(lastGuess){
            words = candidates.toArray(new String[0]);
        } else if (candidates.size()>10){
            words = bestWords(wordlist.get(wordlen), candidates, 25);
        } else {
            words = wordlist.get(wordlen).toArray(new String[0]);
        }
        for(String guess: words){
            double sumMatches = 0;
            for(String word: candidates){
                int matches = countMatchingCandidates(candidates, guess, getResponse(word, guess));
                if(matches == 0) matches = candidates.size();
                sumMatches += (matches-1)*(matches-1);
            }
            if(sumMatches < minMatches){
                minGuess = guess;
                minMatches = sumMatches;
            }
        }
        return minGuess;
    }

    public static String[] bestWords(ArrayList<String> words, ArrayList<String> candidates, int size){
        int[] charCount = new int[123];
        for(String candidate: candidates){
            for(int i=0; i<candidate.length(); i++){
                charCount[(int)candidate.charAt(i)]++;
            }
        }
        String[] tmp = (String[])words.toArray(new String[0]);
        Arrays.sort(tmp, new WordComparator(charCount));
        String[] result = new String[size+Math.min(size, candidates.size())];
        String[] sampled = getSample(candidates, Math.min(size, candidates.size()));
        for(int i=0; i<size; i++){
            result[i] = tmp[tmp.length-i-1];
            if(i < sampled.length){
                result[size+i] = sampled[i];
            }
        }
        return result;
    }

    static class WordComparator implements Comparator<String>{
        int[] charCount = null;

        public WordComparator(int[] charCount){
            this.charCount = charCount;
        }

        public Integer count(String word){
            int result = 0;
            int[] multiplier = new int[charCount.length];
            Arrays.fill(multiplier, 1);
            for(char chr: word.toCharArray()){
                result += multiplier[(int)chr]*this.charCount[(int)chr];
                multiplier[(int)chr] = 0;
            }
            return Integer.valueOf(result);
        }

        public int compare(String s1, String s2){
            return count(s1).compareTo(count(s2));
        }
    }
}

C++ 267700 Points

A porting from an old MasterMind engine.
Differences from MasterMind:

  • More slots
  • More symbols
  • Bigger solution space (but non so much, because not all symbols combination allowed)
  • The response is much informative, so we have more information after each guess
  • The response is slower to generate and that's a pity because my algorithm have to do it a lot.

The basic idea is choosing the word that minimize the solution space. The algorithm is really slow for the first guess (I mean 'days'), but the best first guess depends only on the word len, so it's hardcoded in the source. The other guesses are done in matter of seconds.

The Code

(Compile with g++ -O3)

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>

using namespace std;

class LRTimer
{
private:
    time_t start;
public:
    void startTimer(void)
    {
        time(&start);
    }

    double stopTimer(void)
    {
        return difftime(time(NULL),start);
    } 

};

#define MAX_WORD_LEN 15
#define BIT_QM 0x8000

LRTimer timer;
int size, valid, wordLen;

string firstGuess[] = { "", "a", "as", "iao", "ares", 
    "raise", "sailer", "saltier", "costlier", "clarities", 
    "anthelices", "petulancies", "incarcerates", "allergenicity" };

class Pattern
{
public:
    char letters[MAX_WORD_LEN];
    char flag;
    int mask;

    Pattern() 
        : letters(), mask(), flag()
    {
    }

    Pattern(string word) 
        : letters(), mask(), flag()
    {
        init(word);
    }

    void init(string word)
    {
        const char *wdata = word.data();
        for(int i = 0; i < wordLen; i++) {
            letters[i] = wdata[i];
            mask |= 1 << (wdata[i]-'a');
        }
    }

    string dump()
    {
        return string(letters);
    }

    int check(Pattern &secret)
    {
        if ((mask & secret.mask) == 0)
            return 0;

        char g[MAX_WORD_LEN], s[MAX_WORD_LEN];
        int r = 0, q = 0, i, j, k=99;
        for (i = 0; i < wordLen; i++)
        {
            g[i] = (letters[i] ^ secret.letters[i]);
            if (g[i])
            {
                r += r;
                k = 0;
                g[i] ^= s[i] = secret.letters[i];
            }
            else
            {
                r += r + 1;
                s[i] = 0;
            }
        }
        for (; k < wordLen; k++)
        {
            q += q;
            if (g[k]) 
            {
                for (j = 0; j < wordLen; j++)
                    if (g[k] == s[j])
                    {
                        q |= BIT_QM;
                        s[j] = 0;
                        break;
                    }
            }
        }
        return r|q;
    }

    int count(int ck, int limit);

    int propcheck(int limit);

    void filter(int ck);
};

string dumpScore(int ck)
{
    string result(wordLen, 'X');
    for (int i = wordLen; i--;)
    {
        result[i] = ck & 1 ? 'O' : ck & BIT_QM ? '?' : 'X';
        ck >>= 1;
    }
    return result;
}

int parseScore(string ck)
{
    int result = 0;
    for (int i = 0; i < wordLen; i++)
    {
        result += result + (
            ck[i] == 'O' ? 1 : ck[i] == '?' ? BIT_QM: 0
        );
    }
    return result;
}

Pattern space[100000];

void Pattern::filter(int ck)
{
    int limit = valid, i = limit;
//  cerr << "Filter IN Valid " << setbase(10) << valid << " This " << dump() << "\n"; 
        
    while (i--)
    {
        int cck = check(space[i]);
//      cerr << setbase(10) << setw(8) << i << ' ' << space[i].dump() 
//          << setbase(16) << setw(8) << cck << " (" << Pattern::dumpScore(cck) << ") ";

        if ( ck != cck )
        {
//          cerr << " FAIL\r" ;
            --limit;
            if (i != limit) 
            {
                Pattern t = space[i];
                space[i] = space[limit];
                space[limit] = t;
            }
        }
        else
        {
//          cerr << " PASS\n" ;
        }
    }
    valid = limit;
//  cerr << "\nFilter EX Valid " << setbase(10) << valid << "\n"; 
};

int Pattern::count(int ck, int limit)
{
    int i, num=0;
    for (i = 0; i < valid; ++i)
    {
        if (ck == check(space[i]))
            if (++num >= limit) return num;
    }
    return num;
}

int Pattern::propcheck(int limit)
{
    int k, mv, nv;

    for (k = mv = 0; k < valid; ++k)
    {
        int ck = check(space[k]);
        nv = count(ck, limit);
        if (nv >= limit)
        {
            return 99999;
        }
        if (nv > mv) mv = nv;
    }
    return mv;
}

int proposal(bool last)
{
    int i, minnv = 999999, mv, result;

    for (i = 0; i < valid; i++) 
    {
        Pattern& guess = space[i];
//      cerr << '\r' << setw(6) << i << ' ' << guess.dump();
        if ((mv = guess.propcheck(minnv)) < minnv)
        {
//          cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
            minnv = mv;
            result = i;
        }
    }   
    if (last) 
        return result;
    minnv *= 0.75;
    for (; i<size; i++) 
    {
        Pattern& guess = space[i];
//      cerr << '\r' << setw(6) << i << ' ' << guess.dump();
        if ((mv = guess.propcheck(minnv)) < minnv)
        {
//          cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
            minnv = mv;
            result = i;
        }
    }   
    return result;
}

void setup(string wordfile)
{
    int i = 0; 
    string word;
    ifstream infile(wordfile.data());
    while(infile >> word)
    {
        if (word.length() == wordLen) {
            space[i++].init(word);
        }
    }
    infile.close(); 
    size = valid = i;
}

int main(int argc, char* argv[])
{
    if (argc < 2) 
    {
        cerr << "Specify word length";
        return 1;
    }

    wordLen = atoi(argv[1]);

    timer.startTimer();
    setup("wordlist.txt");
    //cerr << "Words " << size 
    //  << setiosflags(ios::fixed) << setprecision(2)
    //  << " " << timer.stopTimer() << " s\n";

    valid = size;
    Pattern Guess = firstGuess[wordLen];
    for (int t = 0; t < 5; t++)
    {
        cout << Guess.dump() << '\n' << flush;
        string score;
        cin >> score;
        int ck = parseScore(score);
        //cerr << "\nV" << setw(8) << valid << " #" 
        //  << setw(3) << t << " : " << Guess.dump()
        //  << " : " << score << "\n";
        if (ck == ~(-1 << wordLen))
        {
            break;
        }
        Guess.filter(ck); 
        Guess = space[proposal(t == 3)];
    }
    // cerr << "\n";

    double time = timer.stopTimer();
    //cerr << setiosflags(ios::fixed) << setprecision(2)
    //   << timer.stopTimer() << " s\n";

    return 0;
}

My scores

Evaluation with lingo, 100 rounds:

4   9000
5   17700
6   22000
7   25900
8   28600
9   29700
10  31000
11  32800
12  33500
13  34900

Total 265'100

Self evalueted scores

Here are my average points, scored on the whole wordlist. Not completely relyable because some detail of algorithm has changed during the tests.

 4 # words  6728 PT AVG   100.98 87170.41 s
 5 # words 14847 PT AVG   164.44 42295.38 s
 6 # words 28127 PT AVG   212.27 46550.00 s 
 7 # words 39694 PT AVG   246.16 61505.54 s
 8 # words 49004 PT AVG   273.23 63567.45 s
 9 # words 50655 PT AVG   289.00 45438.70 s
10 # words 43420 PT AVG   302.13 2952.23 s
11 # words 35612 PT AVG   323.62 3835.00 s
12 # words 27669 PT AVG   330.19 5882.98 s
13 # words 19971 PT AVG   339.60 2712.98 s

According to these numbers, my average score should be near 257'800

PIT SCORE

At last I installed Ruby, so now I have an 'official' score:

    4       5       6       7       8       9      10      11      12      13   TOTAL
10700   16300   22000   25700   27400   30300   32000   33800   34700   34800   267700

Perl

There's still some room for improvement but at least it beats the included example players :)

Assumes write access to current directory for caching wordlists (to make it run a bit faster); will create wordlist.lenN.stor files using Storable. If this is an issue, remove read_cached_wordlist and change the code to use just read_wordlist.

Explanation

First, I build a histogram of letter frequencies in all the words in the current wordlist (build_histogram). Then I need to choose my next guess - which is done by find_best_word. The scoring algorithm is just adding the histogram values together, skipping letters already seen. This gives me a word containing the most frequent letters in the wordlist. If there is more than one word with a given score, I choose one randomly. Having found a word, I send it to the game engine, read the reply then try and do something useful with it :)

I maintain a set of conditions, that is, letters that can occur at a given position in a word. On start, this is just simple (['a'..'z'] x $len), but it is updated basing on the hints given in reply (see update_conds). I build a regex out of those conditions then and filter the wordlist through it.

During tests I have found out then that the aforementioned filtering doesn't handle ?s too well, hence the second filter (filter_wordlist_by_reply). This takes advantage of the fact that a letter marked as ? occurs in the word on a different position, and filters the wordlist accordingly.

These steps are repeated for every iteration of the main loop, unless the solution is found (or it is not possible to read from stdin anymore, which means a failure).

Code

#!perl
use strict;
use warnings;
use v5.10;
use Storable;

$|=1;

sub read_wordlist ($) {
    my ($len) = @_;
    open my $w, '<', 'wordlist.txt' or die $!;
    my @wordlist = grep { chomp; length $_ == $len } <$w>;
    close $w;
    \@wordlist
}

sub read_cached_wordlist ($) {
    my ($len) = @_;
    my $stor = "./wordlist.len$len.stor";
    if (-e $stor) {
        retrieve $stor
    } else {
        my $wl = read_wordlist $len;
        store $wl, $stor;
        $wl
    }
}

sub build_histogram ($) {
    my ($wl) = @_;
    my %histo = ();
    for my $word (@$wl) {
        $histo{$_}++ for ($word =~ /./g);
    }
    \%histo
}

sub score_word ($$) {
    my ($word, $histo) = @_;
    my $score = 0;
    my %seen = ();
    for my $l ($word =~ /./g) {
        if (not exists $seen{$l}) {
            $score += $histo->{$l};
            $seen{$l} = 1;
        }
    }
    $score
}

sub find_best_word ($$) {
    my ($wl, $histo) = @_;
    my @found = (sort { $b->[0] <=> $a->[0] } 
                 map [ score_word($_, $histo), $_ ], @$wl);
    return undef unless @found;
    my $maxscore = $found[0]->[0];
    my @max;
    for (@found) {
        last if $_->[0] < $maxscore;
        push @max, $_->[1];
    }
    $max[rand @max]
}

sub build_conds ($) {
    my ($len) = @_;
    my @c;
    push @c, ['a'..'z'] for 1..$len;
    \@c
}

sub get_regex ($) {
    my ($cond) = @_;
    local $" = '';
    my $r = join "", map { "[@$_]" } @$cond;
    qr/^$r$/
}

sub remove_cond ($$$) {
    my ($conds, $pos, $ch) = @_;
    return if (scalar @{$conds->[$pos]} == 1);
    return unless grep { $_ eq $ch } @{$conds->[$pos]};
    $conds->[$pos] = [ grep { $_ ne $ch } @{$conds->[$pos]} ]
}

sub add_cond ($$$) {
    my ($conds, $pos, $ch) = @_;
    return if (scalar @{$conds->[$pos]} == 1);
    return if grep { $_ eq $ch } @{$conds->[$pos]};
    push @{$conds->[$pos]}, $ch
}

sub update_conds ($$$$) {
    my ($word, $reply, $conds, $len) = @_;
    my %Xes;
    %Xes = ();
    for my $pos (reverse 0..$len-1) {
        my $r = substr $reply, $pos, 1;
        my $ch = substr $word, $pos, 1;

        if ($r eq 'O') {
            $conds->[$pos] = [$ch]
        }

        elsif ($r eq '?') {
            for my $a (0..$len-1) {
                if ($a == $pos) {
                    remove_cond $conds, $a, $ch
                } else {
                    unless (exists $Xes{$a} and $Xes{$a} eq $ch) {
                        add_cond($conds, $a, $ch);
                    }
                }
            }
        }

        elsif ($r eq 'X') {
            $Xes{$pos} = $ch;
            for my $a (0..$len-1) {
                remove_cond $conds, $a, $ch
            }
        }
    }
}

sub uniq ($) {
    my ($data) = @_;
    my %seen; 
    [ grep { !$seen{$_}++ } @$data ]
}

sub filter_wordlist_by_reply ($$$) {
    my ($wl, $word, $reply) = @_;
    return $wl unless $reply =~ /\?/;
    my $newwl = [];
    my $len = length $reply;
    for my $pos (0..$len-1) {
        my $r = substr $reply, $pos, 1;
        my $ch = substr $word, $pos, 1;
        next unless $r eq '?';
        for my $a (0..$len-1) {
            if ($a != $pos) {
                if ('O' ne substr $reply, $a, 1) {
                    push @$newwl, grep { $ch eq substr $_, $a, 1 } @$wl
                }
            }
        }
    }
    uniq $newwl
}

my $len = $ARGV[0] or die "no length";
my $wl = read_cached_wordlist $len;
my $conds = build_conds $len;

my $c=0;
do {
    my $histo = build_histogram $wl;
    my $word = find_best_word $wl, $histo;
    die "no candidates" unless defined $word;
    say $word;
    my $reply = <STDIN>; 
    chomp $reply;
    exit 1 unless length $reply;
    exit 0 if $reply =~ /^O+$/;
    update_conds $word, $reply, $conds, $len;
    $wl = filter_wordlist_by_reply $wl, $word, $reply;
    $wl = [ grep { $_ =~ get_regex $conds } @$wl ]
} while 1