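A while ago, a task at work required extracting the contents of tables from images, but locating the table coordinates gave me a real headache. I first tried matchTemplate, with disappointing results: it could locate several duplicate targets, and it was easily disturbed. For example, if handwriting crossed the table lines, the match results could be thrown off. Later I found an example on the OpenCV official website that works very well, and I'll walk through it here.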
```python
import cv2
import numpy as np


def showImage(winname, img):
    cv2.imshow(winname, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


# Remove the horizontal and vertical table lines, but keep the table content
def removelines(imageurl):
    image = cv2.imread(imageurl)
    if image is None:
        raise FileNotFoundError(imageurl)
    result = image.copy()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Inverted binary + Otsu: dark lines and text become white (255) foreground
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Remove horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # OpenCV 4 returns (contours, hierarchy); OpenCV 3 returns (image, contours, hierarchy)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        # Erase the horizontal line by drawing over it in white
        cv2.drawContours(result, [c], -1, (255, 255, 255), 3)

    # Remove vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 20))
    remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
    cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        # Erase the vertical line by drawing over it in white
        cv2.drawContours(result, [c], -1, (255, 255, 255), 3)

    showImage('result', result)


# Extract the table (keep only the table lines)
def extracttable(imageurl):
    img = cv2.imread(imageurl)
    if img is None:
        raise FileNotFoundError(imageurl)
    # Convert to grayscale only if the image has color channels
    if len(img.shape) != 2:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    else:
        gray = img
    gray = cv2.bitwise_not(gray)
    bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)

    # Create copies of the binary image
    horizontal = np.copy(bw)
    vertical = np.copy(bw)

    # Structuring element: a horizontal line 1/5 of the image width
    cols = horizontal.shape[1]
    horizontal_size = cols // 5
    horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontal_size, 1))
    # Morphological operations: erode, then dilate
    horizontal = cv2.erode(horizontal, horizontalStructure)
    horizontal = cv2.dilate(horizontal, horizontalStructure)

    # Structuring element: a vertical line 1/5 of the image height
    rows = vertical.shape[0]
    verticalsize = rows // 5
    verticalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, verticalsize))
    # Morphological operations: erode, then dilate
    vertical = cv2.erode(vertical, verticalStructure)
    vertical = cv2.dilate(vertical, verticalStructure)

    # Combine the masks (bitwise OR: no uint8 overflow where lines cross)
    mask = cv2.bitwise_or(horizontal, vertical)
    result = cv2.bitwise_not(mask)
    showImage("table lines", result)


removelines('testimgnew/lines.png')
extracttable('testimgnew/lines.png')
```
Original image:

Result with the table lines removed:

Table lines only:

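Beyond choosing the right functions, the most important point is to define structuring elements that fit the actual image. A morphological opening keeps only foreground runs at least as long as the kernel, so the kernel must be longer than any stroke in the text yet shorter than the table lines.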
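In removelines, each contour c in the cnts loop is one detected table line, so its coordinates (for example from cv2.boundingRect(c)) give the position of the table. From these you can crop the image content inside each row and column, that is, each table cell, and then process it according to your business needs.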
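This article is reprinted from 林员外聊编程. If you believe it infringes your rights, please email contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.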




