How can I find hole in a 2D matrix?

Make some sort of a "LinkedCells" class that will store cells that are linked with each other. Then check cells on-by-one in a from-left-to-right-from-top-to-bottom order, making the following check for each cell: if it's neighbouring cell is black - add this cell to that cell's group. Else you should create new group for this cell. You should only check for top and left neighbour.
UPD: Sorry, I forgot about merging groups: if both neighbouring cells are black and are from different groups - you should merege tha groups in one.

Your "LinkedCells" class should have a flag if it is connected to the edge. It is false by default and can be changed to true if you add edge cell to this group. In case of merging two groups you should set new flag as a || of previous flags. In the end you will have a set of groups and each group having false connection flag will be "hole".

This algorithm will be O(x*y).


You can represent the grid as a graph with individual cells as vertexes and edges occurring between adjacent vertexes. Then you can use Breadth First Search or Depth First Search to start at each of the cells, on the sides. As you will only find the components connected to the sides, the black cells which have not been visited are the holes. You can use the search algorithm again to divide the holes into distinct components.

EDIT: Worst case complexity must be linear to the number of cells, otherwise, give some input to the algorithm, check which cells (as you're sublinear, there will be big unvisited spots) the algorithm hasn't looked into and put a hole in there. Now you've got an input for which the algorithm doesn't find one of the holes.


Your algorithm is globally Ok. It's just a matter of optimizing it by merging the flood fill exploration with the cell scanning. This will just minimize tests.

The general idea is to perform the flood fill exploration line by line while scanning the table. So you'll have multiple parallel flood fill that you have to keep track of.

The table is then processed row by row from top to bottom, and each row processed from right to left. The order is arbitrary, could be reverse if you prefer.

Let segments identify a sequence of consecutive cells with value 0 in a row. You only need the index of the first and last cell with value 0 to define a segment. As you may guess a segment is also a flood fill in progress. So we'll add an identification number to the segments to distinguish between the different flood fills.

The nice thing of this algorithm is that you only need to keep track of segments and their identification number in row i and i-1. So that when you process row i, you have the list of segments found in the row i-1 and their associated identification number.

You then have to process segment connection in row i and row i-1. I'll explain below how this can be made efficient.

For now you have to consider three cases:

  1. found a segment in row i not connected to a segment in row i-1. Assign it a new hole identification (incremented integer). If it's connected to the border of the table, make this number negative.

  2. found a segment in row i-1 not connected to a segment in row i-1. You found the lowest segment of a hole. If it has a negative identification number it is connected to the border and you can ignore it. Otherwise, congratulation, you found a hole.

  3. found a segment in row i connected to one or more segments in row i-1. Set the identification number of all these connected segments to the smallest identification number. See the following possible use case.

row i-1:   2  333 444 111
row i  :  ****  *** ***

The segments in row i should all get the value 1 identifying the same flood fill.

Matching segments in rows i and row i-1 can be done efficiently by keeping them in order from left to right and comparing segments indexes.

Process segments by lowest start index first. Then check if it's connected to the segment with lowest start index of the other row. If no, process case 1 or 2. Otherwise continue identifying connected segments, keeping track of the smallest identification number. When no more connected segments is found, set the identification number of all connected segments found in row i to the smallest identification value.

Index comparison for connectivity test can by optimized by storing (first-1,last) as segment definition since segments may be connected by their corners. You then can directly compare indexes bare value and detect overlapping segments.

The rule to pick the smallest identification number ensures that you automatically get the negative number for connected segments and at least one connected to the border. It propagates to other segments and flood fills.

This is a nice exercise to program. You didn't specify the exact output you need. So this is also left as exercise.


With floodfill, which you already have: run along the BORDER of your matrix and floodfill it, i.e., change all zeroes (black) to 2 (filled black) and ones to 3 (filled white); ignore 2 and 3's that come from an earlier floodfill.

For example with your matrix, you start from the upper left, and floodfill black a zone with area 11. Then you move right, and find a black cell that you just filled. Move right again and find a white area, very large (actually all the white in your matrix). Floodfill it. Then you move right again, another fresh black area that runs along the whole upper and right borders. Moving around, you now find two white cells that you filled earlier and skip them. And finally you find the black area along the bottom border.

Counting the number of colours you found and set might already supply the information on whethere there are holes in the matrix.

Otherwise, or to find where they are, scan the matrix: all areas you find that are still of color 0 are holes in the black. You might also have holes in the white.

Another method, sort of "arrested flood fill"

Run all around the border of the first matrix. Where you find "0", you set to "2". Where you find "1", you set to "3".

Now run around the new inner border (those cells that touch the border you have just scanned). Zero cells touching 2's become 2, 1 cells touching 3 become 3.

You will have to scan twice, once clockwise, once counterclockwise, checking the cells "outwards" and "before" the current cell. That is because you might find something like this:

22222222222333333
2AB11111111C
31

Cell A is actually 1. You examine its neighbours and you find 1 (but it's useless to check that since you haven't processed it yet, so you can't know if it's a 1 or should be a 3 - which is the case, by the way), 2 and 2. A 2 can't change a 1, so cell A remains 1. The same goes with cell B which is again a 1, and so on. When you arrive at cell C, you discover that it is a 1, and has a 3 neighbour, so it toggles to 3... but all the cells from A to C should now toggle.

The simplest, albeit not most efficient, way to deal with this is to scan the cells clockwise, which gives you the wrong answer (C and D are 1's, by the way)

22222222222333333
211111111DC333333
33

and then scan them again counterclockwise. Now when you arrive to cell C, it has a 3-neighbour and toggles to 3. Next you inspect cell D, whose previous-neighbour is C, which is now 3, so D toggles to 3 again. In the end you get the correct answer

22222222222333333
23333333333333333
33

and for each cell you examined two neighbours going clockwise, one going counterclockwise. Moreover, one of the neighbours is actually the cell you checked just before, so you can keep it in a ready variable and save one matrix access.

If you find that you scanned a whole border without even once toggling a single cell, you can halt the procedure. Checking this will cost you 2(W*H) operations, so it is only really worthwhile if there are lots of holes.

In at most W*H*2 steps, you should be done.

You might also want to check the Percolation Algorithm and try to adapt that one.