Build a Nonographic Magnitude Optimizer™

Python 2 & PuLP — 2,644,688 squares (optimally minimized); 10,753,553 squares (optimally maximized)

Minimally golfed to 1152 bytes

from pulp import*
x=0
f=open("c","r")
g=open("s","w")
for k,m in enumerate(f):
 if k%2:
    b=map(int,m.split())
    p=LpProblem("Nn",LpMinimize)
    q=map(str,range(18))
    ir=q[1:18]
    e=LpVariable.dicts("c",(q,q),0,1,LpInteger)
    rs=LpVariable.dicts("rs",(ir,ir),0,1,LpInteger)
    cs=LpVariable.dicts("cs",(ir,ir),0,1,LpInteger)
    p+=sum(e[r][c] for r in q for c in q),""
    for i in q:p+=e["0"][i]==0,"";p+=e[i]["0"]==0,"";p+=e["17"][i]==0,"";p+=e[i]["17"]==0,""
    for o in range(289):i=o/17+1;j=o%17+1;si=str(i);sj=str(j);l=e[si][str(j-1)];ls=rs[si][sj];p+=e[si][sj]<=l+ls,"";p+=e[si][sj]>=l-ls,"";p+=e[si][sj]>=ls-l,"";p+=e[si][sj]<=2-ls-l,"";l=e[str(i-1)][sj];ls=cs[si][sj];p+=e[si][sj]<=l+ls,"";p+=e[si][sj]>=l-ls,"";p+=e[si][sj]>=ls-l,"";p+=e[si][sj]<=2-ls-l,""
    for r,z in enumerate(a):p+=lpSum([rs[str(r+1)][c] for c in ir])==2*z,""
    for c,z in enumerate(b):p+=lpSum([cs[r][str(c+1)] for r in ir])==2*z,""
    p.solve()
    for r in ir:
     for c in ir:g.write(str(int(e[r][c].value()))+" ")
     g.write('\n')
    g.write('%d:%d\n\n'%(-~k/2,value(p.objective)))
    x+=value(p.objective)
 else:a=map(int,m.split())
print x

(NB: the heavily indented lines start with tabs, not spaces.)

Example output: https://drive.google.com/file/d/0B-0NVE9E8UJiX3IyQkJZVk82Vkk/view?usp=sharing

It turns out that problems like are readily convertible to Integer Linear Programs, and I needed a basic problem to learn how to use PuLP—a python interface for a variety of LP solvers—for a project of my own. It also turns out that PuLP is extremely easy to use, and the ungolfed LP builder worked perfectly the first time I tried it.

The two nice things about employing a branch-and-bound IP solver to do the hard work of solving this for me (other than not having to implement a branch and bound solver) are that

  • Purpose-built solvers are really fast. This program solves all 50000 problems in about 17 hours on my relatively low-end home PC. Each instance took from 1-1.5 seconds to solve.
  • They produce guaranteed optimal solutions (or tell you that they failed to do so). Thus, I can be confident that no one will beat my score in squares (although someone might tie it and beat me on the golfing part).

How to use this program

First, you'll need to install PuLP. pip install pulp should do the trick if you have pip installed.

Then, you'll need to put the following in a file called "c": https://drive.google.com/file/d/0B-0NVE9E8UJiNFdmYlk1aV9aYzQ/view?usp=sharing

Then, run this program in any late Python 2 build from the same directory. In less than a day, you'll have a file called "s" which contains 50,000 solved nonogram grids (in readable format), each with the total number of filled squares listed below it.

If you'd like to maximize the number of filled squares instead, change the LpMinimize on line 8 to LpMaximize instead. You will get output very much like this: https://drive.google.com/file/d/0B-0NVE9E8UJiYjJ2bzlvZ0RXcUU/view?usp=sharing

Input format

This program uses a modified input format, since Joe Z. said that we would be allowed to re-encode the input format if we like in a comment on the OP. Click the link above to see what it looks like. It consists of 10000 lines, each containing 16 numbers. The even numbered lines are the magnitudes for the rows of a given instance, while the odd numbered lines are the magnitudes for the columns of the same instance as the line above them. This file was generated by the following program:

from bitqueue import *

with open("nonograms_b64.txt","r") as f:
    with open("nonogram_clues.txt","w") as g:
        for line in f:
            q = BitQueue(line.decode('base64'))
            nonogram = []
            for i in range(256):
                if not i%16: row = []
                row.append(q.nextBit())
                if not -~i%16: nonogram.append(row)
            s=""
            for row in nonogram:
                blocks=0                         #magnitude counter
                for i in range(16):
                    if row[i]==1 and (i==0 or row[i-1]==0): blocks+=1
                s+=str(blocks)+" "
            print >>g, s
            nonogram = map(list, zip(*nonogram)) #transpose the array to make columns rows
            s=""
            for row in nonogram:
                blocks=0
                for i in range(16):
                    if row[i]==1 and (i==0 or row[i-1]==0): blocks+=1
                s+=str(blocks)+" "
            print >>g, s

(This re-encoding program also gave me an extra opportunity to test my custom BitQueue class I created for the same project mentioned above. It is simply a queue to which data can be pushed as sequences of bits OR bytes, and from which data can be popped either a bit or a byte at a time. In this instance, it worked perfectly.)

I re-encoded the input for the specific reason that to build an ILP, the extra information about the grids that were used to generate the magnitudes is perfectly useless. The magnitudes are the only constraints, and so the magnitudes are all I needed access to.

Ungolfed ILP builder

from pulp import *
total = 0
with open("nonogram_clues.txt","r") as f:
    with open("solutions.txt","w") as g:
        for k,line in enumerate(f):
            if k%2:
                colclues=map(int,line.split())
                prob = LpProblem("Nonogram",LpMinimize)
                seq = map(str,range(18))
                rows = seq
                cols = seq
                irows = seq[1:18]
                icols = seq[1:18]
                cells = LpVariable.dicts("cell",(rows,cols),0,1,LpInteger)
                rowseps = LpVariable.dicts("rowsep",(irows,icols),0,1,LpInteger)
                colseps = LpVariable.dicts("colsep",(irows,icols),0,1,LpInteger)
                prob += sum(cells[r][c] for r in rows for c in cols),""
                for i in rows:
                    prob += cells["0"][i] == 0,""
                    prob += cells[i]["0"] == 0,""
                    prob += cells["17"][i] == 0,""
                    prob += cells[i]["17"] == 0,""
                for i in range(1,18):
                    for j in range(1,18):
                        si = str(i); sj = str(j)
                        l = cells[si][str(j-1)]; ls = rowseps[si][sj]
                        prob += cells[si][sj] <= l + ls,""
                        prob += cells[si][sj] >= l - ls,""
                        prob += cells[si][sj] >= ls - l,""
                        prob += cells[si][sj] <= 2 - ls - l,""
                        l = cells[str(i-1)][sj]; ls = colseps[si][sj]
                        prob += cells[si][sj] <= l + ls,""
                        prob += cells[si][sj] >= l - ls,""
                        prob += cells[si][sj] >= ls - l,""
                        prob += cells[si][sj] <= 2 - ls - l,""
                for r,clue in enumerate(rowclues):
                    prob += lpSum([rowseps[str(r+1)][c] for c in icols]) == 2 * clue,""
                for c,clue in enumerate(colclues):
                    prob += lpSum([colseps[r][str(c+1)] for r in irows]) == 2 * clue,""
                prob.solve()
                print "Status for problem %d: "%(-~k/2),LpStatus[prob.status]
                for r in rows[1:18]:
                    for c in cols[1:18]:
                        g.write(str(int(cells[r][c].value()))+" ")
                    g.write('\n')
                g.write('Filled squares for %d: %d\n\n'%(-~k/2,value(prob.objective)))
                total += value(prob.objective)
            else:
                rowclues=map(int,line.split())
print "Total number of filled squares: %d"%total

This is the program that actually produced the "example output" linked above. Hence the extra long strings at the end of each grid, which I truncated when golfing it. (The golfed version should produce identical output, minus the words "Filled squares for ")

How it Works

cells = LpVariable.dicts("cell",(rows,cols),0,1,LpInteger)
rowseps = LpVariable.dicts("rowsep",(irows,icols),0,1,LpInteger)
colseps = LpVariable.dicts("colsep",(irows,icols),0,1,LpInteger)

I use an 18x18 grid, with the center 16x16 part being the actual puzzle solution. cells is this grid. The first line creates 324 binary variables: "cell_0_0", "cell_0_1", and so on. I also create grids of the "spaces" between and around the cells in the solution part of the grid. rowseps points to the 289 variables which symbolize the spaces that separate cells horizontally, while colseps similarly points to variables that mark the spaces that separate cells vertically. Here's a unicode diagram:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|□|0
  - - - - - - - - - - - - - - - - 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The 0s and s are the binary values tracked by the cell variables, the |s are the binary values tracked by the rowsep variables, and the -s are the binary values tracked by the colsep variables.

prob += sum(cells[r][c] for r in rows for c in cols),""

This is the objective function. Just the sum of all the cell variables. Since these are binary variables, this is just exactly the number of filled squares in the solution.

for i in rows:
    prob += cells["0"][i] == 0,""
    prob += cells[i]["0"] == 0,""
    prob += cells["17"][i] == 0,""
    prob += cells[i]["17"] == 0,""

This just sets the cells around the outer edge of the grid to zero (which is why I represented them as zeroes above). This is the most expedient way to track how many "blocks" of cells are filled, since it ensures that every change from unfilled to filled (moving across a column or row) is matched by a corresponding change from filled to unfilled (and vice versa), even if the first or last cell in the row is filled. This is the sole reason for using an 18x18 grid in the first place. It's not the only way to count blocks, but I think it is the simplest.

for i in range(1,18):
    for j in range(1,18):
        si = str(i); sj = str(j)
        l = cells[si][str(j-1)]; ls = rowseps[si][sj]
        prob += cells[si][sj] <= l + ls,""
        prob += cells[si][sj] >= l - ls,""
        prob += cells[si][sj] >= ls - l,""
        prob += cells[si][sj] <= 2 - ls - l,""
        l = cells[str(i-1)][sj]; ls = colseps[si][sj]
        prob += cells[si][sj] <= l + ls,""
        prob += cells[si][sj] >= l - ls,""
        prob += cells[si][sj] >= ls - l,""
        prob += cells[si][sj] <= 2 - ls - l,""

This is the real meat of the logic of the ILP. Basically it requires that each cell (other than those in the first row and column) be the logical xor of the cell and separator directly to its left in its row and directly above it in its column. I got the constraints that simulate an xor within a {0,1} integer program from this wonderful answer: https://cs.stackexchange.com/a/12118/44289

To explain a bit more: this xor constraint makes it so that the separators can be 1 if and only if they lie between cells which are 0 and 1 (marking a change from unfilled to filled or vice versa). Thus, there will be exactly twice as many 1-valued separators in a row or column as the number of blocks in that row or column. In other words, the sum of the separators on a given row or column is exactly twice the magnitude of that row/column. Hence the following constraints:

for r,clue in enumerate(rowclues):
    prob += lpSum([rowseps[str(r+1)][c] for c in icols]) == 2 * clue,""
for c,clue in enumerate(colclues):
    prob += lpSum([colseps[r][str(c+1)] for r in irows]) == 2 * clue,""

And that's pretty much it. The rest just asks the default solver to solve the ILP, then formats the resulting solution as it writes it to the file.


Java, 6,093,092 4,332,656 3,637,260 squares (minimized), 10,567,550 10,567,691 10,568,746 squares (maximized)

Both variants of the program repeatedly perform operations on the source grid, without changing the magnitude.

Minimizer

shrink()

enter image description here

If a black square has 2 white neighbors and 2 black neighbors in a 90° angle, it can be replaced by a white square.

moveLine()

enter image description here enter image description here

In configurations like above the black line can be moved to the right. This is done repeatedly for all 4 line directions clockwise and counterclockwise, to open up new shrink possibilities.

Maximizer

Uncomment the line in main() and comment out the line above it for this version.

grow()

enter image description here

If a white square has 2 white neighbors and 2 black neighbors in a 90° angle, it can be replaced by a black square.

moveLine()

Same as in Minimizer.

Source

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Function;

public class Main {
    private static final int SIZE = 16;
    private static final int SIZE_4 = SIZE + 4;
    private static final int E = 0;
    private static final int N = 1;
    private static final int W = 2;
    private static final int S = 3;

    private static final Base64.Decoder decoder = Base64.getMimeDecoder();
    private static final Base64.Encoder encoder = Base64.getMimeEncoder();
    private static int sourceBlack = 0;
    private static int targetBlack = 0;

    private static class Nonogram {
        private final boolean[] cells = new boolean[SIZE_4 * SIZE_4];
        private final int[] magnitudes;

        public Nonogram(String encoded) {
            super();
            byte[] decoded = decoder.decode(encoded);
            for (int i = 0; i < decoded.length; ++ i) {
                for (int j = 0; j < 8; ++ j) {
                    if ((decoded[i] & (1 << (7 - j))) != 0) {
                        int k = i * 8 + j;
                        cells[getPos(k / SIZE, k % SIZE)] = true;
                        ++ sourceBlack;
                    }
                }
            }
            magnitudes = calcMagnitudes();
        }

        private int getPos(int row, int col) {
            return (row + 2) * SIZE_4 + col + 2;
        }

        private int move(int pos, int dir, int count) {
            switch (dir) {
                case E: return pos + count;
                case N: return pos - count * SIZE_4;
                case W: return pos - count;
                case S: return pos + count * SIZE_4;
                default: return pos;
            }
        }

        private int move(int pos, int dir) {
            return move(pos, dir, 1);
        }

        private int[] calcMagnitudes() {
            int[] result = new int[SIZE * 2];
            for (int row = 0; row < SIZE; ++ row) {
                for (int col = 0; col < SIZE; ++ col) {
                    int pos = getPos(row, col);
                    if (cells[pos]) {
                        if (!cells[move(pos, W)]) {
                            ++ result[row + SIZE];
                        }
                        if (!cells[move(pos, N)]) {
                            ++ result[col];
                        }
                    }
                }
            }
            return result;
        }

        private boolean isBlack(int pos) {
            return cells[pos];
        }

        private boolean isWhite(int pos) {
            return !cells[pos];
        }

        private boolean allBlack(int pos, int dir, int count) {
            int p = pos;
            for (int i = 0; i < count; ++ i) {
                if (isWhite(p)) {
                    return false;
                }
                p = move(p, dir);
            }
            return true;
        }

        private boolean allWhite(int pos, int dir, int count) {
            int p = pos;
            for (int i = 0; i < count; ++ i) {
                if (isBlack(p)) {
                    return false;
                }
                p = move(p, dir);
            }
            return true;
        }

        private int findWhite(int pos, int dir) {
            int count = 0;
            int p = pos;
            while (cells[p]) {
                ++ count;
                p = move(p, dir);
            }
            return count;
        }

        @SafeVarargs
        private final void forEach(Function<Integer, Boolean>... processors) {
            outer:
            for (;;) {
                for (Function<Integer, Boolean> processor : processors) {
                    for (int row = 0; row < SIZE; ++ row) {
                        for (int col = 0; col < SIZE; ++ col) {
                            if (processor.apply(getPos(row, col))) {
                                continue outer;
                            }
                        }
                    }
                }
                return;
            }
        }

        private boolean shrink(int pos) {
            if (cells[pos] && cells[move(pos, W)] != cells[move(pos, E)] &&
                    cells[move(pos, N)] != cells[move(pos, S)]) {
                cells[pos] = false;
                return true;
            }
            return false;
        }

        private boolean grow(int pos) {
            if (!cells[pos] && cells[move(pos, W)] != cells[move(pos, E)] &&
                    cells[move(pos, N)] != cells[move(pos, S)]) {
                cells[pos] = true;
                return true;
            }
            return false;
        }

        private boolean moveLine(boolean clockwise, int dir, int sourcePos) {
            int from = (dir + (clockwise ? 1 : 3)) % 4;
            int to = (dir + (clockwise ? 3 : 1)) % 4;
            int opp = (dir + 2) % 4;
            if (isBlack(sourcePos) && isWhite(move(sourcePos, from)) && isWhite(move(sourcePos, dir))) {
                int toCount = findWhite(move(move(sourcePos, dir), to), to) + 1;
                if (allWhite(move(sourcePos, to), to, toCount + 1)) {
                    int lineCount = 1;
                    int tmpPos = move(sourcePos, opp);
                    while (isBlack(tmpPos) && isWhite(move(tmpPos, from)) && allWhite(move(tmpPos, to),  to, toCount + 1)) {
                        ++ lineCount;
                        tmpPos = move(tmpPos, opp);
                    }
                    if (allBlack(tmpPos, to, toCount + 1)) {
                        tmpPos = sourcePos;
                        for (int j = 0; j < lineCount; ++ j) {
                            cells[tmpPos] = false;
                            cells[move(tmpPos, to, toCount)] = true;
                            tmpPos = move(tmpPos, opp);
                        }
                        return true;
                    }
                }
            }
            return false;
        }

        public Nonogram minimize() {
            for (int i = 0; i < 5; ++ i) {
                forEach(pos -> shrink(pos), pos -> moveLine(true, E, pos), pos -> moveLine(true, N, pos),
                        pos -> moveLine(true, W, pos), pos -> moveLine(true, S, pos));
                forEach(pos -> shrink(pos), pos -> moveLine(false, E, pos), pos -> moveLine(false, N, pos),
                        pos -> moveLine(false, W, pos), pos -> moveLine(false, S, pos));
            }
            return this;
        }

        public Nonogram maximize() {
            for (int i = 0; i < 5; ++ i) {
                forEach(pos -> grow(pos), pos -> moveLine(true, E, pos), pos -> moveLine(true, N, pos),
                        pos -> moveLine(true, W, pos), pos -> moveLine(true, S, pos));
                forEach(pos -> grow(pos), pos -> moveLine(false, E, pos), pos -> moveLine(false, N, pos),
                        pos -> moveLine(false, W, pos), pos -> moveLine(false, S, pos));
            }
            return this;
        }

        public String toBase64() {
            if (!Arrays.equals(magnitudes, calcMagnitudes())) {
                throw new RuntimeException("Something went wrong!");
            }
            byte[] decoded = new byte[SIZE * SIZE / 8];
            for (int i = 0; i < decoded.length; ++ i) {
                for (int j = 0; j < 8; ++ j) {
                    int k = i * 8 + j;
                    if (cells[getPos(k / SIZE, k % SIZE)]) {
                        decoded[i] |= 1 << (7 - j);
                        ++ targetBlack;
                    }
                }
            }
            return encoder.encodeToString(decoded);
        }

        @Override
        public String toString() {
            StringBuilder b = new StringBuilder();
            for (int row = 0; row < SIZE; ++ row) {
                for (int col = 0; col < SIZE; ++ col) {
                    b.append(cells[getPos(row, col)] ? '#' : ' ');
                }
                b.append('\n');
            }
            return b.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("solutions_b64.txt"));
                BufferedReader reader = new BufferedReader(new FileReader("nonograms_b64.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(new Nonogram(line).minimize().toBase64() + "\n");
                //writer.write(new Nonogram(line).maximize().toBase64() + "\n");
            }
        }
        System.out.printf("%d -> %d", sourceBlack, targetBlack);
    }
}