The application under development by vteams #67 is a desktop application for image scanning and indexing. Our team developed most of it, since the product was still in its early stages when we took over. It is built mainly with C#, the .NET Framework, and SQL Server 2008. It is multilingual and is organized into modules that let each user group focus on one area, such as scanning, quality assurance, indexing/auto-indexing, or exporting. The third-party SDKs used for this feature include Atalasoft (barcode reading and OMR) and IRIS (OCR, ICR, and MICR).
Initial Requirement [Grouping images on the basis of specific information automatically recognized from the scanned images]:
The client shared the following main initial requirements to be added to the existing project:
- Create a template in which the user can select a region of an image that contains a standard barcode.
- During scanning, read each image at the location the user specified in the template, read the barcode’s value, and group the images if the value matches the template value.
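The two requirements can be sketched roughly as follows. This is only an illustration in Python, not the production C# code. `Zone`, `Template`, and `group_by_barcode` are hypothetical names, and the barcode decoding itself (handled by a third-party SDK in the real application) is passed in as a `read_barcode` callable so the sketch stays self-contained. It also assumes that "grouping" simply means collecting the images whose zone value equals the template value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Zone:
    """Rectangular region of an image in pixels (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Template:
    """A user-drawn zone plus the barcode value expected inside it."""
    zone: Zone
    expected_value: str


def group_by_barcode(
    images: Iterable[str],
    template: Template,
    read_barcode: Callable[[str, Zone], Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Split images into those whose zone value matches the template and the rest.

    `read_barcode` is an assumed callback that returns the decoded value found
    in the given zone of an image, or None when no barcode is found there.
    """
    matched: List[str] = []
    unmatched: List[str] = []
    for image in images:
        value = read_barcode(image, template.zone)
        if value is not None and value.strip() == template.expected_value:
            matched.append(image)
        else:
            unmatched.append(image)
    return matched, unmatched
```

In a test, `read_barcode` can be a simple lookup in a dictionary of known values, which makes the grouping logic checkable without any imaging SDK.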
Zonal OCR [Initial Template to Read Barcodes]:
An interactive form was designed in which the user can import or scan images and draw zones on them. As soon as a zone is placed on a barcode, the form reads the barcode and shows the result in the status bar. The key component in this implementation was Atalasoft, a third-party barcode reader. The requirement was fulfilled, but after seeing how easy the templates were to use, the client asked for advanced templates that could read any area of an image rather than only standard barcodes.
Problems & Challenges in Reading Any Area of an Image:
The OCR engine by IRIS was used to read arbitrary areas of an image. However, an OCR engine’s accuracy varies and depends on many factors, so it was very difficult to compare the text returned by OCR with the user’s input. The major problems were:
- The OCR engine often misrecognizes characters, so accuracy is a problem.
- The location of the required information sometimes changes from image to image, so fixed zonal OCR does not work.
Accuracy Improvement & Smart Zoning [Advanced Templates for Reading ANY AREA of an Image and Improving Accuracy]:
To overcome these problems, vteams #67 applied the following techniques and algorithms and delivered a working solution to the client:
- To improve accuracy, vteams #67 implemented the following two algorithms, which raised accuracy to more than 90%:
  - A user-configurable confidence level was introduced. A match is accepted when a specified amount of the recognized text matches the user’s input; if more than one candidate qualifies, the match with the highest confidence is used.
  - A noise removal algorithm was introduced to strip noise and unwanted characters, which improves matching accuracy.
- Image cleanup techniques were used to improve the OCR results.
- Smart Zoning was implemented to handle changing locations: the user creates a label and specifies where its value may appear relative to that label, e.g., x lines up or down, or x characters to the left or right of the label.
- The ICR and MICR engines of IRIS and the OMR engine of Atalasoft were incorporated to read specific fonts and text types.
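The noise removal, confidence-based matching, and Smart Zoning ideas above can be illustrated with a small sketch. It is written in Python for brevity and is not the actual implementation: all function names are hypothetical, the similarity ratio from the standard `difflib` module stands in for whatever confidence measure the real product uses, and Smart Zoning is reduced to "text to the right of the label, plus a few lines above or below it".

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def clean_noise(text: str) -> str:
    """Remove characters outside an assumed whitelist and collapse whitespace."""
    kept = re.sub(r"[^A-Za-z0-9\s\-/.]", "", text)
    return re.sub(r"\s+", " ", kept).strip()


def confidence(recognized: str, expected: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for a real confidence score."""
    return SequenceMatcher(None, recognized.upper(), expected.upper()).ratio()


def best_match(
    candidates: List[str], expected: str, min_confidence: float
) -> Optional[Tuple[str, float]]:
    """Return the candidate with the highest confidence at or above the threshold."""
    scored = [(c, confidence(clean_noise(c), expected)) for c in candidates]
    accepted = [pair for pair in scored if pair[1] >= min_confidence]
    if not accepted:
        return None
    return max(accepted, key=lambda pair: pair[1])


def smart_zone_candidates(
    lines: List[str], label: str, lines_up: int = 0, lines_down: int = 1
) -> List[str]:
    """Collect possible values near a label in OCR output split into lines.

    Candidates are the text to the right of the label on the same line and
    the non-empty lines within `lines_up` / `lines_down` of the label line.
    """
    candidates: List[str] = []
    for i, line in enumerate(lines):
        pos = line.find(label)
        if pos < 0:
            continue
        right = line[pos + len(label):].strip(" :")
        if right:
            candidates.append(right)
        first = max(0, i - lines_up)
        last = min(len(lines), i + lines_down + 1)
        for j in range(first, last):
            if j != i and lines[j].strip():
                candidates.append(lines[j].strip())
    return candidates


# Example: OCR misread "INV-2017" as "1NV-2O17~"; the label may sit on any line.
ocr_lines = ["ACME Corp", "Invoice No: 1NV-2O17~", "Date 01/02/2017"]
found = smart_zone_candidates(ocr_lines, "Invoice No", lines_up=1, lines_down=1)
result = best_match(found, "INV-2017", min_confidence=0.7)
```

With a threshold of 0.7, the misread value is still accepted while the unrelated neighboring lines are rejected. Raising the threshold makes matching stricter at the cost of more missed matches, which is the trade-off the user-configurable confidence level exposes.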
According to the client’s feedback, the Zonal OCR/Smart Zoning feature by vteams #67 is considered better than that of EMC Captiva Capture, a very popular and somewhat costly product from a market leader.