To address this problem, Xiaomi has developed a set of table recognition algorithms that efficiently and accurately extract tables from pictures and convert them into editable Excel files. The algorithm has already shipped on flagship models such as the Xiaomi Mi 10S series and MIX Fold 2. Users can try it from Album – More – Table, or through the scan function.
Table Detection Algorithm
Xiaomi said that the table detection algorithm accurately extracts the table area from the picture and rectifies it into a flat table image for the next step, table recognition.
The table recognition algorithm extracts the table structure and the text content from the picture, then combines this information to output an editable Excel table.
Table detection has the following difficulties. On the one hand, the pictures are taken with a cell phone, so the table has to be corrected into a flat image before recognition. On the other hand, the requirements on the detection result are very high: other text often surrounds the table, and an inaccurate detection result has a negative impact on the subsequent recognition.
Xiaomi’s table detection algorithm detects the table area and the four corners of the table at the same time, and obtains a flat image containing only the table area through perspective transformation and Xiaomi’s self-developed anti-distortion algorithm. The effect is shown in the figure.
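The perspective step can be illustrated with a minimal sketch. It assumes the four detected corners are mapped onto an axis-aligned rectangle by a 3x3 homography. The function names, corner coordinates, and output size are illustrative and not taken from Xiaomi's implementation, and the anti-distortion algorithm is not covered.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix (bottom-right entry fixed to 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to one point, including the perspective divide."""
    d = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / d,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / d)


# Hypothetical detected corners: top-left, top-right, bottom-right, bottom-left.
corners = [(120, 80), (520, 60), (560, 400), (90, 430)]
# Assumed size of the flat output table: 400 x 300 pixels.
flat = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = homography(corners, flat)
print(warp_point(H, 120, 80))  # lands at, or numerically very close to, (0, 0)
```

For whole images, OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective compute the same kind of matrix and resample the pixels with it.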
Because the algorithm runs on the phone, both running speed and model size have to be kept in check. Xiaomi adopts a very lightweight one-stage detection framework with ShuffleNetV2 as the backbone.
When the table box is detected, the key point information is returned as well to make perspective correction of the table easier, and Wing loss is used instead of L1 loss to make the key point regression more accurate.
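To show how Wing loss differs from L1 loss, here is a minimal sketch of the published Wing loss formula: logarithmic for small errors and linear for large ones. The parameter values w and eps are illustrative assumptions, not the settings Xiaomi uses.

```python
import math


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing loss over paired coordinate lists.

    |x| <  w : w * ln(1 + |x| / eps)
    |x| >= w : |x| - c, where c makes the two pieces meet at |x| = w
    """
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        total += w * math.log(1.0 + x / eps) if x < w else x - c
    return total / len(pred)


def l1_loss(pred, target):
    """Mean absolute error, for comparison."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical corner coordinates (x1, y1, ..., x4, y4) with small errors.
predicted = [10.5, 20.0, 99.0, 20.5, 100.0, 80.0, 10.0, 79.5]
labelled = [10.0, 20.0, 100.0, 20.0, 100.0, 80.0, 10.0, 80.0]
print(wing_loss(predicted, labelled), l1_loss(predicted, labelled))
```

With these parameters, a small error contributes more to Wing loss than to L1 loss, so training keeps pushing on corners that are already close. This is one plausible reason the key point regression ends up more accurate.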
In terms of data, an algorithm is used to mine a large amount of table detection data from public data at low cost, which significantly improves the table detection results. The final model size is about 1M, and it runs smoothly on Xiaomi mobile phones.
Table Recognition Algorithm
The table recognition algorithm runs on the server, and the main modules include: text detection, text recognition, table structure prediction, cell matching, alignment algorithm, and Excel export.
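As an illustration of the cell matching module, the sketch below assigns each recognized text box to the predicted cell it overlaps most. This is an assumed, simplified strategy with hypothetical box coordinates. The source does not describe how Xiaomi's matching actually works.

```python
def intersection_area(a, b):
    """Overlap area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def match_text_to_cells(cells, texts):
    """Return one string per cell, built from the text boxes that overlap it most.

    cells: list of cell boxes; texts: list of (box, string) pairs.
    """
    contents = [[] for _ in cells]
    # Visit text boxes top-to-bottom, then left-to-right, to keep reading order.
    for box, string in sorted(texts, key=lambda t: (t[0][1], t[0][0])):
        overlaps = [intersection_area(box, cell) for cell in cells]
        best = max(range(len(cells)), key=overlaps.__getitem__)
        if overlaps[best] > 0:  # ignore text outside every cell
            contents[best].append(string)
    return [" ".join(parts) for parts in contents]


cells = [(0, 0, 100, 40), (100, 0, 200, 40)]
texts = [((110, 10, 190, 30), "42"), ((5, 8, 60, 30), "Total")]
print(match_text_to_cells(cells, texts))  # ['Total', '42']
```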
The current mainstream method represents the table as HTML and trains a model to predict the HTML tag sequence together with the corresponding coordinate information.
This method has achieved good results on open-source datasets, and Ping An Technology and Baidu have adopted it as well, but the large number of HTML tags makes table structure recognition prone to errors.
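To make the HTML-sequence representation concrete, the sketch below turns a small table layout into the kind of structure token sequence a model would have to predict. The token split, with separate tokens for span attributes, is an assumption for illustration. Actual systems may tokenize differently.

```python
def html_structure_tokens(rows):
    """Emit structure tokens for a table.

    rows: list of rows; each cell is a (rowspan, colspan) pair.
    """
    tokens = []
    for row in rows:
        tokens.append("<tr>")
        for rowspan, colspan in row:
            if rowspan == 1 and colspan == 1:
                tokens.append("<td>")
            else:
                # Merged cells need extra attribute tokens.
                tokens.append("<td")
                if rowspan > 1:
                    tokens.append(f' rowspan="{rowspan}"')
                if colspan > 1:
                    tokens.append(f' colspan="{colspan}"')
                tokens.append(">")
            tokens.append("</td>")
        tokens.append("</tr>")
    return tokens


# A header cell spanning two columns, followed by one ordinary two-cell row.
layout = [[(1, 2)], [(1, 1), (1, 1)]]
tokens = html_structure_tokens(layout)
print(len(tokens))       # 12
print("".join(tokens))   # <tr><td colspan="2"></td></tr><tr><td></td><td></td></tr>
```

Even this two-row table needs twelve tokens. Every distinct span value also adds another symbol to the vocabulary, which illustrates why long HTML sequences leave room for prediction errors.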
To address this shortcoming, Xiaomi adopts a new encoding for tables that can represent a table of any structure with only four tags, which greatly improves the accuracy of table structure recognition.
For deployment, table recognition is accelerated with the FasterTransformer inference framework. According to Xiaomi, inference speed has increased by about 20 times, significantly improving the user experience.
Summary
The algorithm can efficiently and conveniently extract tables from pictures, which greatly improves office efficiency. Xiaomi said that engineers will continue to improve the recognition experience of document images in Xiaomi phones.