Find Possible Word Rectangles

Python, 232 chars

x,y=input()
H=[]
P={}
for w in open('words.txt'):
 l=len(w)-2
 if l==x:H+=[w]
 if l==y:
  for i in range(y+1):P[w[:i]]=1
def B(s):
 if(x+2)*y-len(s):[B(s+w)for w in H if all((s+w)[i::x+2]in P for i in range(x))]
 else:print s
B('')

It can only handle up to 6×6 within the half-hour limit, though.
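
For reference, here is an ungolfed sketch of the same algorithm (hypothetical names, not the golfed original; it assumes words.txt has one lowercase word per line): collect width-x words as candidate rows, record every prefix of every height-y word, then grow the rectangle one row at a time, pruning whenever any partial column is not a prefix of a valid column word.

def solve(x, y, path='words.txt'):
    rows, prefixes = [], set()
    for line in open(path):
        w = line.strip()
        if len(w) == x:
            rows.append(w)              # candidate rows
        if len(w) == y:
            for i in range(y + 1):
                prefixes.add(w[:i])     # every prefix of a possible column
    def backtrack(grid):
        if len(grid) == y:              # every column is now a complete word
            print('\n'.join(grid) + '\n')
            return
        for w in rows:
            cand = grid + [w]
            # prune unless each partial column is still a valid prefix
            if all(''.join(r[i] for r in cand) in prefixes for i in range(x)):
                backtrack(cand)
    backtrack([])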


Haskell, 586 characters

import Data.List
import qualified Data.Vector as V
import System
data P=P[String](V.Vector P)
e=P[]$V.replicate 128 e
(a:z)∈(P w v)|a>' '=z∈(V.!)v(fromEnum a);_∈(P w _)=w
p∫w=q p w where q(P u v)=P(w:u).V.accum q v.x;x(a:z)=[(fromEnum a,z)];x _=[]
l%n=foldl'(∫)e.filter((==n).length)$l
d§(p,q:r)=map(\w->p++w:r)$q∈d;_§(p,_)=[p]
(d¶e)i=filter(not.any(null.(∈e))).map transpose.(d§).splitAt i
s i n d e m|i==n=["":m]|1<3=(d¶e)i m>>=(e¶d)i>>=s(i+1)n d e
p[b,a,n]w=take n$s 0(a`max`b)(w%a)(w%b)$replicate b$replicate a ' '
main=do{g<-map read`fmap`getArgs;interact$unlines.concat.p g.lines}

It is invoked with three arguments (number of rows, number of columns, number of solutions), and the word list is read from stdin:

$> ghc -O3 2554-WordRect.hs 
[1 of 1] Compiling Main             ( 2554-WordRect.hs, 2554-WordRect.o )
Linking 2554-WordRect ...

$> time ./2554-WordRect 7 7 1 < 2554-words.txt

zosters
overlet
seriema
trimmer
element
remends
startsy

real    0m22.381s
user    0m22.094s
sys     0m0.223s

As you can see, 7×7 runs relatively fast. Still timing 8×8 and 8×7....

It would be 9 characters shorter to remove the number of solutions argument and just produce all solutions, but then it becomes impossible to time.


  • Edit: (585 → 455) replaced custom data structure with a simple map of prefix string to possible replacements; oddly, this is a bit slower, perhaps because Map String a is slower than a hand-built tree of Map Char a...
  • Edit: (455 → 586) Bigger?!?!! This version does more optimization of the search space, using both the techniques of my original solution and of the python and awk solutions. Further, the custom data structure, based on Vector, is much faster than using a simple Map (see the sketch after this list). Why do this? Because I think a solution that is closer to the goal of 8×8 in under ½ hour is preferable to a shorter solution.
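
The "hand-built tree" is a trie in which every node carries the full list of words whose prefix reaches that node, so a partial column maps to all of its possible completions in a single walk; the Haskell indexes children with a 128-slot Vector rather than a map. A rough Python rendering of the idea (hypothetical, for illustration only):

class Trie:
    def __init__(self):
        self.words = []   # every word whose path passes through this node
        self.kids = {}    # next character -> child node (a Vector in the Haskell)

    def insert(self, word, rest=None):
        rest = word if rest is None else rest
        self.words.append(word)
        if rest:
            self.kids.setdefault(rest[0], Trie()).insert(word, rest[1:])

    def completions(self, prefix):
        node = self
        for c in prefix:              # walk down one edge per character
            if c not in node.kids:
                return []
            node = node.kids[c]
        return node.words             # every word extending the prefix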

Java (1065 bytes)

import java.util.*;public class W{public static void main(String[]a){new
W(Integer.parseInt(a[0]),Integer.parseInt(a[1]));}W(int w,int h){M
H=new M(),V=new M();String L;int i,j,l,m,n=w*h,p[]=new int[n];long
I,J,K,M,C=31;long[]G=new long[h],T=new long[w],W[]=new long[n][],X;try{Scanner
S=new Scanner(new java.io.File("words.txt"));while(0<1){L=S.nextLine();l=L.length();for(i=0;i>>l<1;i++){K=0;for(j=0;j<l;j++)K+=(i>>j&1)*(L.charAt(j)-96L)<<5*j;if(l==w)H.put(K,H.g(K)+1);if(l==h)V.put(K,V.g(K)+1);}}}catch(Exception
E){}while(n-->0){j=1;if(W[n]==null){M=1L<<62;for(i=w*h;i-->0;){m=i/w;l=i%w*5;if((G[m]>>l&C)<1){X=new
long[27];I=K=0;for(;K++<26;){J=H.g(G[m]+(K<<l))*V.g(T[i%w]+(K<<5*m));X[(int)K]=K-32*J;I+=J;}if(I<1)j=0;if(I<M){M=I;p[n]=i;W[n]=X;}}}}X=W[n];Arrays.sort(X);M=X[0]*j;X[0]=0;K=M&C;i=p[n]%w;j=p[n]/w;l=5*i;m=5*j;G[j]&=~(C<<l);G[j]+=K<<l;T[i]&=~(C<<m);T[i]+=K<<m;if(M>=0){W[n]=null;n+=2;}}for(long
A:G){L="";for(i=0;i<w;)L+=(char)(96+(C&A>>5*i++));System.out.println(L);}}class
M extends HashMap<Long,Long>{long g(Long s){return get(s)!=null?get(s):0;}}}

A long way from being the shortest, but I think it's the closest to meeting the timing constraints. I saved 14 bytes by assuming that the input file has already been filtered to words of the right lengths; on my netbook, if you feed it the whole words.txt, it spends the first minute preprocessing (discarding most of what it produces) and then takes a mere 20 or so seconds to solve 7×7. On my desktop it does the whole thing in under 15 seconds, giving:

rascals
areolae
serrate
coroner
alanine
latents
seeress

I've let it run for over 50 hours without finding a solution to 8×7 or 8×8. Eight-letter words seem to be a critical boundary for this problem: the search just hovers around half-full without making much progress.

The approach used is full pivoting and a heuristic based on the number of possible horizontal completions times the number of possible vertical completions. For example, if we have the intermediate grid

*ean*
algae
*ar**
*ier*
*nee*

then we give the top-left corner a heuristic value of count(aean*)count(aa***) + count(bean*)count(ba***) + ... + count(zean*)count(za***). Of all the cells we pick the one with the smallest heuristic value (i.e. the hardest to satisfy), and then work through the letters in descending order of the amount they contributed to the heuristic value of that cell (i.e. starting with the most likely to succeed).
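
A hypothetical Python sketch of that heuristic (count_h and count_v stand for assumed precomputed pattern-count lookups, not anything in the Java): the cell with the smallest total is the hardest to satisfy, and a total of zero means no letter fits there at all, forcing a backtrack.

def cell_heuristic(grid, r, c, count_h, count_v):
    # count_h / count_v return how many row / column words match a
    # pattern like 'aean*', where '*' marks a still-empty cell.
    col = ''.join(row[c] for row in grid)
    terms = []
    for ch in 'abcdefghijklmnopqrstuvwxyz':
        row_pat = grid[r][:c] + ch + grid[r][c+1:]
        col_pat = col[:r] + ch + col[r+1:]
        terms.append((count_h(row_pat) * count_v(col_pat), ch))
    # total = how satisfiable the cell is; per-letter terms give the try order
    return sum(t for t, _ in terms), sorted(terms, reverse=True)

def pick_pivot(grid, empty_cells, count_h, count_v):
    # full pivoting: choose the hardest-to-satisfy cell, i.e. smallest total
    return min(empty_cells,
               key=lambda rc: cell_heuristic(grid, rc[0], rc[1],
                                             count_h, count_v)[0])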