Calculating a random "blob" in a 10 x 10 grid

A definitely correct but inefficient method is to

  1. Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
  2. Check if there is a single blob. If not, start over. If so, output this blob.

Step 1 is equally likely to generate each blob (as well as all the non-blobs), so the result (once a blob finally is generated) is equally likely to be any blob. More precisely, let $f$ be the probability that we get something that isn't a blob. Then the probability of generating your favorite blob $B$ on the $k^{\text{th}}$ trial is $f^{k-1} \cdot \frac{1}{2^{100}}$: we failed to get a blob $k-1$ times, and then finally got all the cells in $B$ and no others. So the overall probability of getting blob $B$ with this method is $\frac{2^{-100}}{1-f}$, which doesn't depend on $B$, and therefore all blobs are equally likely.
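Spelling out the sum over trials as a geometric series (valid because $f < 1$, since at least one outcome is a blob): $$ \sum_{k=1}^{\infty} f^{k-1} \cdot \frac{1}{2^{100}} = \frac{1}{2^{100}} \sum_{j=0}^{\infty} f^{j} = \frac{2^{-100}}{1-f}. $$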

We choose a probability of $\frac12$ so that on each trial, any blob $B$ has the same probability of $2^{-100}$ of being generated. If the probability were $p \ne \frac12$, then the probability of generating a $k$-celled blob would be $p^k (1-p)^{100-k}$, which is not uniform.
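For instance, with $p = \frac14$ (an arbitrary value chosen only for illustration), a specific $k$-celled blob and a specific $(k+1)$-celled blob would be generated with probabilities in the ratio $$ \frac{p^k (1-p)^{100-k}}{p^{k+1} (1-p)^{99-k}} = \frac{1-p}{p} = 3, $$ so each extra cell would make a blob three times less likely. At $p = \frac12$ this ratio is exactly $1$.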

Unfortunately, it will probably take a very long time to generate a blob with this method, since a random subset of the cells usually doesn't form a connected blob.
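Here is a minimal Python sketch of this first method, meant only as an illustration. The names `neighbors`, `is_blob`, and `naive_random_blob` are made up for this example, and it assumes that cells are adjacent when they share an edge and that a blob needs at least two cells. The grid side `n` is a parameter so that it can be tried on a small grid, where it actually terminates quickly.

```python
import random

def neighbors(cell, n):
    """Yield the edge-adjacent cells of `cell` inside an n x n grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            yield (rr, cc)

def is_blob(cells, n):
    """True if `cells` is one connected set of at least 2 cells (assumed definition)."""
    if len(cells) < 2:
        return False
    start = next(iter(cells))
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for nb in neighbors(cur, n):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(cells)

def naive_random_blob(n, rng):
    """Step 1 and step 2: redraw the whole grid until it is a single blob."""
    while True:
        cells = {(r, c) for r in range(n) for c in range(n) if rng.random() < 0.5}
        if is_blob(cells, n):
            return cells

# Feasible on a 3 x 3 grid; on a 10 x 10 grid the loop would almost never end.
blob = naive_random_blob(3, random.Random(0))
```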


Next, here is an actually feasible method.

First, consider the following sampling method, which is not uniform:

  1. Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
  2. Choose a cell uniformly at random. If it is not occupied (or if it has no occupied neighbors, so it doesn't count as a blob), start over. Otherwise, let $B$ be the connected component containing the chosen cell.

Even though this method is definitely biased, it has the nice feature that we know exactly the probability with which a blob $B$ is returned in a single trial. That probability is $$ p(B) = \frac{|B|}{100}\left(\frac12\right)^{|B|+|\partial B|} $$ where $|B|$ is the number of occupied cells in $B$ and $|\partial B|$ is the number of unoccupied cells bordering $B$. Moreover, we can put a lower bound on $p(B)$ for any $B$: it is $p^* = \frac{33}{100} (\frac12)^{100}$. (This is the minimum of $\frac{x}{100} (\frac12)^{x+y}$ over all $x,y$ with $x\ge2$, $y \ge 0$, $x+y \le 100$, and $y \le 2x+2$: the constraints on $|B|$ and $|\partial B|$.)
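The minimization in parentheses can be checked by brute force. The following sketch (the function name `lower_bound` is just for this example) enumerates every pair $(x,y)$ satisfying the stated constraints, using exact rational arithmetic:

```python
from fractions import Fraction

def lower_bound(total=100):
    """Minimize x/total * (1/2)^(x+y) subject to
    x >= 2, y >= 0, x + y <= total, y <= 2x + 2."""
    best = None
    for x in range(2, total + 1):
        for y in range(0, min(2 * x + 2, total - x) + 1):
            val = Fraction(x, total) * Fraction(1, 2) ** (x + y)
            if best is None or val < best[0]:
                best = (val, x, y)
    return best

p_star, x, y = lower_bound()
print(x, y, p_star == Fraction(33, 100) / 2**100)
```

The minimum is attained at $x = 33$, $y = 67$: with $x + y = 100$, the constraint $y \le 2x+2$ forces $x \ge 33$.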

Now modify the method above to make the following method:

  1. Randomly choose which of the $100$ cells are occupied, independently with a probability of $\frac12$ for each cell.
  2. Choose a cell uniformly at random. If it is not occupied (or if it has no occupied neighbors, so it doesn't count as a blob), start over. Otherwise, let $B$ be the connected component containing the chosen cell.
  3. With probability $p^*/p(B)$, return $B$. Otherwise, start over.

Now, on any given trial, the probability that $B$ is the blob we get after step 2 is $p(B)$. But the probability that we actually return $B$ is that probability times the acceptance probability: $p(B) \cdot p^*/p(B)$, or $p^*$. So on any given trial, every blob has the same chance $p^*$ of being produced.

Experimentally, the average value of $\frac{p^*}{p(B)}$ seems to be between $0.0001$ and $0.001$, which means the average number of times steps 1-2 are repeated is not $2^{100}$ (as with the first method) but fewer than $10000$. This means we can actually run this algorithm in a few seconds as opposed to the lifetime of the universe.
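Steps 1-3 can be sketched in Python as follows. This is an illustration under stated assumptions, not a tuned implementation: cells are taken to be adjacent when they share an edge, the helper names (`component_and_border`, `acceptance`, `sample_blob`) are invented for the example, and the optional `max_trials` cap is an addition so that a call can be bounded. The acceptance probability is $p^*/p(B) = \frac{33}{|B|} \cdot 2^{|B| + |\partial B| - 100}$.

```python
import random

N = 10           # grid side; the constant 33 below is specific to a 10 x 10 grid
CELLS = N * N

def neighbors(r, c):
    """Yield the edge-adjacent cells of (r, c) inside the grid."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < N and 0 <= cc < N:
            yield rr, cc

def component_and_border(occupied, start):
    """Return (B, border): the connected component of `start` within
    `occupied`, and the unoccupied cells adjacent to that component."""
    comp = {start}
    border = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        for nb in neighbors(r, c):
            if nb in occupied:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
            else:
                border.add(nb)
    return comp, border

def acceptance(size, border_size):
    """p*/p(B) = (33/|B|) * 2^(|B| + |dB| - 100)."""
    return 33.0 / size * 2.0 ** (size + border_size - CELLS)

def sample_blob(rng, max_trials=None):
    """Run trials until a blob is accepted; return None if the cap is hit."""
    trials = 0
    while max_trials is None or trials < max_trials:
        trials += 1
        bits = rng.getrandbits(CELLS)             # step 1: one fair bit per cell
        occupied = {divmod(i, N) for i in range(CELLS) if bits >> i & 1}
        start = divmod(rng.randrange(CELLS), N)   # step 2: uniform random cell
        if start not in occupied:
            continue
        comp, border = component_and_border(occupied, start)
        if len(comp) < 2:                         # isolated cell: not a blob
            continue
        if rng.random() < acceptance(len(comp), len(border)):  # step 3
            return comp
    return None
```

Calling `sample_blob(random.Random())` with no cap loops until a blob is accepted.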


First, one method that immediately comes to mind is to randomly assign each of the 100 cells to be occupied or non-occupied, and then just keep doing this until you get a pattern that is an actual blob.

EDIT ... I should have stopped there ...

Unfortunately, this will bias things towards blobs of a size proportional to the percentage with which you make the cells black or white. For example, if you turn a square black or white with a chance of $50$%, then of course, you'd get almost all blobs with a size of around $50$.

EDIT: Not true. By setting the percentage of turning a cell to black or white to $50$%, the likelihood of each specific blob being the outcome of this process is the same as for any other specific blob.

So, you really want to avoid that bias ... and note that the method described in the post has a very similar bias, in how the size of the resulting blobs very much depends on the percentage your algorithm uses.

So, what to do? Can we calculate how many blobs there are of a certain size? Well, the number of possible blobs is of course really big. And I don't even see a simple formula for calculating an exact number or, for that matter, how many there are of a certain size. So I don't think it will be practical either to exactly pre-calculate these kinds of numbers so that you could randomly pick blobs of a certain size (let alone shape) proportional to how many there are of that size (or shape).

So ... I think you'll have to use a more statistical method. So, for example, you could try to get some kind of estimate for how many blobs there are of various sizes by doing the following:

For a blob of size $n$: Randomly turn exactly $n$ of the $100$ squares black, and see if you get a blob. Do this a large (but still practical) number of times, say $10000$, or maybe even $100000$, and just count how many times you get a blob. Do this for all $n$. Because each specific set of $n$ squares is chosen with the exact same likelihood as any other set of $n$ squares, the fraction of trials that produce a blob estimates the fraction of the $\binom{100}{n}$ possible patterns of that size that are blobs, so the number of blobs of size $n$ is roughly that fraction times $\binom{100}{n}$. E.g., if for $n=3$ you obtained a blob $20$ times out of $10000$, but for $n=40$ you got a blob $200$ times out of $10000$, then there should roughly be $0.002\binom{100}{3}$ blobs of size $3$ and $0.02\binom{100}{40}$ blobs of size $40$. Note that the hit rates by themselves are not the proportions: there are far more ways to choose $40$ squares than $3$, so the rates have to be weighted by $\binom{100}{n}$ before comparing sizes.

(By the way, for some $n$ you will hardly get any blobs at all, e.g. I would not be surprised if you get $0$ blobs out of $100000$, or even $1000000$, for $n=20$ ... but that's ok: that just means that the number of blobs of size $20$, out of the whole space of possible blobs, is really, really small ... and so statistically you may as well set its proportion to $0$ relative to other sizes. Also, for large $n$, say $n>80$, the chances of getting a blob become really good, so you can probably get away with just generating $100$ random square assignments.)
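The estimation step could be sketched in Python like this. It is a rough illustration with invented helper names (`is_connected`, `hit_rate`, `estimated_count`), assuming squares are adjacent when they share an edge and numbering the squares $0$ to $99$ row by row. Since every set of $n$ squares is equally likely to be drawn, multiplying the observed hit rate by $\binom{100}{n}$ gives an estimate of the number of blobs of size $n$.

```python
import math
import random

SIDE = 10  # squares are numbered 0..99, row by row

def is_connected(cells):
    """True if the given squares form a single edge-connected set."""
    cells = set(cells)
    start = next(iter(cells))
    seen = {start}
    stack = [start]
    while stack:
        r, c = divmod(stack.pop(), SIDE)
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            j = rr * SIDE + cc
            if 0 <= rr < SIDE and 0 <= cc < SIDE and j in cells and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(cells)

def hit_rate(n, trials, rng):
    """Fraction of random n-square patterns that turn out to be connected."""
    hits = sum(is_connected(rng.sample(range(SIDE * SIDE), n)) for _ in range(trials))
    return hits / trials

def estimated_count(n, trials, rng):
    """Estimated number of blobs of size n: hit rate times C(100, n)."""
    return hit_rate(n, trials, rng) * math.comb(SIDE * SIDE, n)

rng = random.Random(0)
print(estimated_count(3, 10000, rng))
```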

Once you have those proportions, then generate a random blob by first picking an $n$ relative to those proportions, and then randomly picking $n$ squares out of the $100$, and just keep repeating that until you get a blob.