Remove non straight lines from text image

Typical methods to remove lines are to use horizontal/vertical kernels or cv2.HoughLinesP() but these methods only work if the lines are straight. In this case, the lines are not straight so an idea is to use a diagonal kernel, morphological transformations, and contour filtering to remove the lines from the text. I will be using a previous answer's approach found in removing horizontal lines in an image but with a diagonal kernel


We begin by converting the image to grayscale and perform Otsu's threshold to obtain a binary image. Next we create a diagonal kernel then perform morph close to detect/filter out the diagonal lines. Since cv2.getStructuringElement() does not have any built in diagonal kernel, we create our own

# Read in image, grayscale, and Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create diagonal kernel
kernel = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]], dtype=np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

The image isolated the main diagonal lines but it also included small lines from the text. To remove them we find contours and filter using contour area. If the contour passes our filter, we effectively remove the noise by "filling in" the contour with cv2.drawContours(). This leaves us with our desired diagonal lines to remove

# Find contours and filter using contour area to remove noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 500:
        cv2.drawContours(opening, [c], -1, (0,0,0), -1)

From here we simply cv2.bitwise_xor() with the original image to get our result

# Bitwise-xor with original image
opening = cv2.merge([opening, opening, opening])
result = cv2.bitwise_xor(image, opening)

Notes: It is difficult to remove the lines without affecting the text although it is possible and will need some clever tricks to "repair" the text. Take a look at remove borders from image but keep text written on borders for a method to reconstruct the missing text. Another method to isolate the diagonal lines would be to take a contrarian approach; instead of trying to detect diagnoal lines, why not try to determine what is not a diagnoal line. You could probably do this by simple filtering techniques. To create dynamic diagonal kernels, you could use np.diag() for different diagonal line widths

Full code for completeness

import cv2
import numpy as np

# Read in image, grayscale, and Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create diagonal kernel
kernel = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]], dtype=np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Find contours and filter using contour area to remove noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 500:
        cv2.drawContours(opening, [c], -1, (0,0,0), -1)

# Bitwise-xor with original image
opening = cv2.merge([opening, opening, opening])
result = cv2.bitwise_xor(image, opening)

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()