Get the location of all text present in image using opencv

Here's a potential approach using morphological operations to filter out non-text contours. The idea is:

  1. Obtain binary image. Load image, grayscale, then Otsu's threshold

  2. Remove horizontal and vertical lines. Create horizontal and vertical kernels using cv2.getStructuringElement then remove lines with cv2.drawContours

  3. Remove diagonal lines, circle objects, and curved contours. Filter using contour area cv2.contourArea and contour approximation cv2.approxPolyDP to isolate non-text contours

  4. Extract text ROIs and OCR. Find contours and filter for ROIs then OCR using Pytesseract.


Removed horizontal lines highlighted in green

enter image description here

Removed vertical lines

enter image description here

Removed assorted non-text contours (diagonal lines, circular objects, and curves)

enter image description here

Detected text regions

enter image description here

import cv2
import numpy as np
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
clean = thresh.copy()

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

# Remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(clean, [c], -1, 0, 3)

cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    # Remove diagonal lines
    area = cv2.contourArea(c)
    if area < 100:
        cv2.drawContours(clean, [c], -1, 0, 3)
    # Remove circle objects
    elif area > 1000:
        cv2.drawContours(clean, [c], -1, 0, -1)
    # Remove curve stuff
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    x,y,w,h = cv2.boundingRect(c)
    if len(approx) == 4:
        cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)

open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(clean, cv2.MORPH_OPEN, open_kernel, iterations=2)
close_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,2))
close = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, close_kernel, iterations=4)
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if area > 500:
        ROI = image[y:y+h, x:x+w]
        ROI = cv2.GaussianBlur(ROI, (3,3), 0)
        data = pytesseract.image_to_string(ROI, lang='eng',config='--psm 6')
        if data.isalnum():
            cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
            print(data)

cv2.imwrite('image.png', image)
cv2.imwrite('clean.png', clean)
cv2.imwrite('close.png', close)
cv2.imwrite('opening.png', opening)
cv2.waitKey()

Alright, here's another possible solution. I know you work with Python - I work with C++. I'll give you some ideas and hopefully, if you desire so, you will be able to implement this answer.

The main idea is to not use pre-processing at all (at least not at the initial stage) and instead focus on each target character, get some properties, and filter every blob according to these properties.

I'm trying to not use pre-processing because: 1) Filters and morphological stages could degrade the quality of the blobs and 2) your target blobs appear to exhibit some characteristics that we could exploit, mainly: aspect ratio and area.

Check it out, the numbers and letters all appear to be taller than wider… furthermore, they appear to vary within a certain area value. For example, you want to discard objects "too wide" or "too big".

The idea is that I'll filter everything that does not fall within pre-calculated values. I examined the characters (numbers and letters) and came with minimum, maximum area values and a minimum aspect ratio (here, the ratio between height and width).

Let's work on the algorithm. Start by reading the image and resizing it to half the dimensions. Your image is way too big. Convert to grayscale and get a binary image via otsu, here's in pseudo-code:

//Read input:
inputImage = imread( "diagram.png" );

//Resize Image;
resizeScale = 0.5;

inputResized = imresize( inputImage, resizeScale );

//Convert to grayscale;
inputGray = rgb2gray( inputResized );

//Get binary image via otsu:
binaryImage = imbinarize( inputGray, "Otsu" );

Cool. We will work with this image. You need to examine every white blob, and apply a "properties filter". I’m using connected components with stats to loop trough each blob and get its area and aspect ratio, in C++ this is done as follows:

//Prepare the output matrices:
cv::Mat outputLabels, stats, centroids;
int connectivity = 8;

//Run the binary image through connected components:
int numberofComponents = cv::connectedComponentsWithStats( binaryImage, outputLabels, stats, centroids, connectivity );

//Prepare a vector of colors – color the filtered blobs in black
std::vector<cv::Vec3b> colors(numberofComponents+1);
colors[0] = cv::Vec3b( 0, 0, 0 ); // Element 0 is the background, which remains black.

//loop through the detected blobs:
for( int i = 1; i <= numberofComponents; i++ ) {

    //get area:
    auto blobArea = stats.at<int>(i, cv::CC_STAT_AREA);

    //get height, width and compute aspect ratio:
    auto blobWidth = stats.at<int>(i, cv::CC_STAT_WIDTH);
    auto blobHeight = stats.at<int>(i, cv::CC_STAT_HEIGHT);
    float blobAspectRatio = (float)blobHeight/(float)blobWidth;

    //Filter your blobs…

};

Now, we will apply the properties filter. This is just a comparison with the pre-calculated thresholds. I used the following values:

Minimum Area: 40  Maximum Area:400
MinimumAspectRatio:  1

Inside your for loop, compare the current blob properties with these values. If the tests are positive, you "paint" the blob black. Continuing inside the for loop:

    //Filter your blobs…

    //Test the current properties against the thresholds:
    bool areaTest =  (blobArea > maxArea)||(blobArea < minArea);
    bool aspectRatioTest = !(blobAspectRatio > minAspectRatio); //notice we are looking for TALL elements!

    //Paint the blob black:
    if( areaTest || aspectRatioTest ){
        //filtered blobs are colored in black:
        colors[i] = cv::Vec3b( 0, 0, 0 );
    }else{
        //unfiltered blobs are colored in white:
        colors[i] = cv::Vec3b( 255, 255, 255 );
    }

After the loop, construct the filtered image:

cv::Mat filteredMat = cv::Mat::zeros( binaryImage.size(), CV_8UC3 );
for( int y = 0; y < filteredMat.rows; y++ ){
    for( int x = 0; x < filteredMat.cols; x++ )
    {
        int label = outputLabels.at<int>(y, x);
        filteredMat.at<cv::Vec3b>(y, x) = colors[label];
    }
}

And… that's pretty much it. You filtered all the elements that are not similar to what you are looking for. Running the algorithm you get this result:

enter image description here

I've additionally found the Bounding Boxes of the blobs to better visualize the results:

enter image description here

As you see, some elements are miss-detected. You can refine the "properties filter" to better identify the characters you are looking for. A deeper solution, involving a little bit of machine learning, requires the construction of an "ideal feature vector", extracting features from the blobs, and comparing both vectors via a similarity measure. You can also apply some post-processing to improve the results...

Whatever, man, your problem is not trivial nor easy scalable, and I'm just giving you ideas. Hopefully, you will be able to implement your solution.


One method is to use sliding window (It is expensive).

Determine the size of the characters in the image (all characters are of same size as seen in the image) and set the size of the window. Try tesseract for the detection (The input image requires pre processing). If a window detects characters consecutively, then store the coordinates of the window. Merge the coordinates and get the region on the characters.