Importing a grid of numbers from an image (sudoku like)

1 - Summary of a simple solution

In this particular digit case there is a very simple solution based on a neural net (NN) trained on MNIST data. It takes just a few lines of code:

i=Import["https://i.stack.imgur.com/LC2c2.png"];
imageGRID = ImagePartition[i, Scaled[1/22]];
lenet = NetModel["LeNet Trained on MNIST Data"];
test[x_] := If[ImageDistance[imageGRID[[2, 2]], x] > 10, lenet[x], "-"]
Grid[imageGRID /. x_Image :> test[x] /. 7 -> 1, Frame -> All]


2 - How it works

Now let's go through it in detail. The Wolfram Neural Net Repository has two directly relevant NNs (as of today):

  • LeNet Trained on MNIST Data
  • CapsNet Trained on MNIST Data

I will go with the simpler one, LeNet. Let's get it from the repository:

lenet = NetModel["LeNet Trained on MNIST Data"];

Next get this image:


i=Import["https://i.stack.imgur.com/LC2c2.png"];

Now partition it into a matrix of sub-images, one sub-image per digit. Your image has 22 boxes both vertically and horizontally, so this is how you do it:

imageGRID = ImagePartition[i, Scaled[1/22]]

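Before running anything slower, it can help to check the shape of such a partition. The following is a minimal sketch that uses a synthetic 220×220 noise image (an assumption standing in for the real picture, chosen so that 22 tiles per side come out at 10 pixels each); the names demo and tiles are hypothetical:

```mathematica
(* Synthetic 220x220 grayscale image, NOT the picture from the question *)
demo = Image[RandomReal[1, {220, 220}]];

(* Same call as above: each tile is 1/22 of the image size *)
tiles = ImagePartition[demo, Scaled[1/22]];

Dimensions[tiles]               (* rows and columns of the tile matrix *)
ImageDimensions[tiles[[1, 1]]]  (* pixel size of a single tile *)
```

If the first result is not the grid size you expect, the image probably has margins or a size that does not divide evenly, and it may need cropping first.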

Now we can run LeNet to recognize the digits, but there are a couple of small problems.

  • LeNet is not trained on blank images (images without digits); it always expects a digit. If you feed it a blank, it returns whichever digit it considers closest. So we need a way to test for blanks. There are many ways, but let's just use this test (where imageGRID[[2, 2]] is a sample blank image):

    test[x_] := If[ImageDistance[imageGRID[[2, 2]], x] > 10, lenet[x], "-"]
    
  • Another problem: LeNet can get confused by some of the typed digits. Here it reads 1 as 7 because of the font used in your original image. This depends on the specific images and fonts and can be hot-fixed case by case, as the 7 -> 1 replacement does here (which is only safe if the grid contains no genuine 7s). To avoid such hacks, you can easily train your own LeNet on digits in your font. The documentation has many examples of this.
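The blank test above depends on picking one reference tile by hand. As one of the "many ways" mentioned, here is a hedged sketch that instead measures how much dark ink a tile contains. It assumes dark digits on a light background; the names inkFraction and blankQ, the 2-pixel border trim, the 0.5 threshold and the 0.01 tolerance are all illustrative choices, not values taken from the original image:

```mathematica
(* Fraction of dark pixels in a tile, after trimming `border` pixels
   from each side so that grid lines are not counted as ink *)
inkFraction[tile_Image, border_Integer : 2] :=
  1 - Mean[Flatten[ImageData[Binarize[ImagePad[tile, -border], 0.5]]]];

(* A tile counts as blank when almost no ink is left *)
blankQ[tile_Image, tol_ : 0.01] := inkFraction[tile] < tol;

(* Synthetic 28x28 tiles for a quick check *)
blankTile = Image[ConstantArray[1., {28, 28}]];
data = ConstantArray[1., {28, 28}];
data[[10 ;; 18, 10 ;; 18]] = 0.;   (* a dark square standing in for a digit *)
inkTile = Image[data];

{blankQ[blankTile], blankQ[inkTile]}
```

With such a predicate, the test could be written as If[blankQ[x], "-", lenet[x]]; the tolerance would need tuning on real tiles.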

So here is your final result:

Grid[imageGRID /. x_Image :> test[x] /. 7 -> 1, Frame -> All]

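A possible alternative to the 7 -> 1 replacement is to let the net choose only among digits that are known to occur in the grid. The sketch below is an assumption-laden illustration, not part of the original solution: pickAllowed is a hypothetical helper, the allowed set {0, 1, 2, 3} is only an example, and the hand-made association stands in for what lenet[x, "Probabilities"] would return for a tile:

```mathematica
(* Keep only the allowed classes and return the most probable one *)
pickAllowed[probs_Association, allowed_List] :=
  First[Keys[TakeLargest[KeyTake[probs, allowed], 1]]];

(* Hand-made probabilities: the net favors 7, but 7 is not allowed *)
sample = <|0 -> 0.05, 1 -> 0.30, 7 -> 0.60, 9 -> 0.05|>;

pickAllowed[sample, {0, 1, 2, 3}]   (* gives 1 *)

(* Possible use with the real net (requires the downloaded model):
   pickAllowed[lenet[x, "Probabilities"], {0, 1, 2, 3}] *)
```

Unlike a blanket replacement rule, this keeps working if a later image has other confusions, as long as the allowed set is right.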

So simple with modern AI :-) You could also train a NN to take your original image grid and return a matrix of values directly. The architecture of image-to-image nets might be interesting to adapt for this, since a matrix is just another image; you can find such nets in the Wolfram Neural Net Repository.


Here is a semi-manual way to do it:

Import the image (img below), cut it into a 48×48 array of small images, and remove the borders:

imageArray = img  //
   RightComposition[
    ImagePartition[#, 40, 40] &
    , Map[Binarize, #, {2}] &
    , Map[ImageCrop[#, 38] &, #, {2}] &
    ];
(* a view of a piece of the array :  *)  
imageArray[[10 ;; 15, 5 ;; 10]] // Grid[#, Dividers -> All] &  


Then group the images with FindClusters[#, 5] (5 because we want 5 groups), remove exact duplicates (with Union), and look at the result:

imageArray //
 RightComposition[
  Flatten
  , FindClusters[#, 5] &
  , (Union /@ # &)
  , Column[Row /@ #, Dividers -> All] &
  ]   


There are no errors, so one can manually create the correspondences between the groups of images and the numbers:

  rules = {1 -> "-", 2 -> 1, 3 -> 3, 4 -> 2, 5 -> 0}  
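Note that some right-hand sides in these rules are themselves cluster indices (2 -> 1, for instance). This is fine because /. replaces each element once and does not reapply the rules to the result. A small sketch on a made-up label matrix (labels is hypothetical, not the real clustering output):

```mathematica
(* Made-up 2x3 matrix of cluster indices *)
labels = {{1, 2, 5}, {3, 1, 4}};

rules = {1 -> "-", 2 -> 1, 3 -> 3, 4 -> 2, 5 -> 0};

(* Each entry is replaced once: 2 becomes 1 and is NOT turned into "-" *)
labels /. rules   (* gives {{"-", 1, 0}, {3, "-", 2}} *)
```

Using //. (ReplaceRepeated) instead would chain the rules and give wrong digits.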

The final result:

 imageArray //
 RightComposition[
  ClusteringComponents[#, 5] &
  , # /. rules  &
  , Grid]  
