OCR for seven-segment display numbers in Objective-C

The goal: implement OCR for https://en.wikipedia.org/wiki/Seven-segment_display numbers in Objective-C.

The first step was to choose an OCR library. Tesseract (https://github.com/gali8/Tesseract-OCR-iOS) was an easy choice, being open source and having plenty of examples and resources available. For the training data I spent a lot of time playing with http://vietocr.sourceforge.net/training.html and https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, but in the end none of my training data worked better than this one: https://github.com/arturaugusto/display_ocr. I still recommend going through the training process at least once, as understanding how Tesseract training data works helps a lot with the image processing that will be required later.

The next step was to provide Tesseract with the source image. In my case it was a picture from the camera, so a bit of preprocessing was necessary to let the user crop the text area. I used https://github.com/jberlana/JBCroppableView with the following modifications:

  • Force only 4 points
  • Recalculate the positions of the 4 points on panning so they always keep a rectangular shape. There are two reasons for this. The more important one is that, in the end, we are going to feed Tesseract a UIImage, which is a rectangular context; if the content is not rectangular as well, we would need to colorize the corners not filled by the image. Tesseract likes homogeneous backgrounds! The second reason is that it is much easier for the user to select the area, since they only need to drag two opposite corners of the selector to form the rectangle.

The easiest way to do this is to add a tag to each of the selector's points in the addPoints method of JBCroppableLayer:
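
A minimal sketch of the tagging, assuming addPoints ends up with an array of corner views (pointViews is an assumed name, not the actual variable in JBCroppableView):

    // Inside JBCroppableLayer's addPoints method, after the corner
    // views have been created. Tag each corner so the pan handler can
    // tell them apart later:
    // 0 = top-left, 1 = top-right, 2 = bottom-right, 3 = bottom-left.
    for (NSUInteger i = 0; i < pointViews.count; i++) {
        UIView *pointView = pointViews[i];
        pointView.tag = (NSInteger)i;
    }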

With the tags in place, the pan gesture handler in JBCroppableImageView can compare them against _pointsView.activePoint. Once we know which point is being moved, it is pretty straightforward to recalculate the positions of the other three, always checking carefully that they do not cross the image boundaries, or the user will not be able to pan the points back onto the image. Here is an example for the first point (top left):
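
A sketch of that recalculation, reusing the tags from above; clampToImageBounds: is a hypothetical helper that keeps a point inside the image bounds:

    // In the pan handler of JBCroppableImageView, for the top-left
    // corner (tag 0). pointViews is the tagged corner array from above.
    CGPoint location = [gesture locationInView:self];
    location = [self clampToImageBounds:location]; // never leave the image

    if (_pointsView.activePoint.tag == 0) {
        UIView *topLeft    = pointViews[0];
        UIView *topRight   = pointViews[1];
        UIView *bottomLeft = pointViews[3];

        topLeft.center = location;
        // Keep the selection rectangular: top-right follows the new y,
        // bottom-left follows the new x, and bottom-right stays fixed.
        topRight.center   = CGPointMake(topRight.center.x, location.y);
        bottomLeft.center = CGPointMake(location.x, bottomLeft.center.y);
    }

The other three corners follow the same pattern, swapping which neighbour tracks the x and which tracks the y.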

I'm sure there is a more elegant, compact way to do this, but in this case the main priority was clear, easy-to-read code.

Once we have our cropped image, the trickier part starts. Feeding it as-is to Tesseract usually won't work, and we will need to process the image to remove noise and make the text stand out from the background. In my case, the combination that worked best in the end was the following (a sketch of the pipeline appears after the list):

  • Convert to grayscale
  • Increase the contrast by a factor of 7, so we end up with an almost pure black-and-white image (https://github.com/coryleach/UIImageAdjust).
  • Expand the image, increasing the blank space between the borders and the text by at least 25%. This turned out to be critical and was the 'magic touch' that made the OCR work almost every time.
  • Erode the image to remove the space between segments. There are good hints on how to implement this at https://github.com/shu223/vImageCategory.
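
Put together, the preprocessing could look roughly like the sketch below. Grayscale conversion and border expansion are plain Core Graphics; the contrast step is left as a commented-out call, since the exact selector exposed by UIImageAdjust may differ, and the vImage erosion step is omitted (see the vImageCategory link above):

    #import <UIKit/UIKit.h>

    // Preprocessing sketch: grayscale -> contrast x7 -> 25% padding.
    static UIImage *PreprocessForOCR(UIImage *image)
    {
        CGFloat w = image.size.width, h = image.size.height;

        // 1. Redraw into a device-gray bitmap context to drop the color.
        CGColorSpaceRef gray = CGColorSpaceCreateDeviceGray();
        CGContextRef ctx = CGBitmapContextCreate(NULL, (size_t)w, (size_t)h, 8, 0,
                                                 gray, (CGBitmapInfo)kCGImageAlphaNone);
        CGContextDrawImage(ctx, CGRectMake(0, 0, w, h), image.CGImage);
        CGImageRef grayCG = CGBitmapContextCreateImage(ctx);
        UIImage *result = [UIImage imageWithCGImage:grayCG];
        CGImageRelease(grayCG);
        CGContextRelease(ctx);
        CGColorSpaceRelease(gray);

        // 2. Push the contrast hard so the image becomes almost pure
        //    black and white (selector name assumed; see UIImageAdjust).
        // result = [result imageWithContrast:7.0f];

        // 3. Expand the canvas by 25% per side, filling the new border
        //    with white: the "magic touch" that keeps Tesseract happy.
        CGFloat padX = w * 0.25f, padY = h * 0.25f;
        CGSize padded = CGSizeMake(w + 2 * padX, h + 2 * padY);
        UIGraphicsBeginImageContextWithOptions(padded, YES, image.scale);
        [[UIColor whiteColor] setFill];
        UIRectFill(CGRectMake(0, 0, padded.width, padded.height));
        [result drawInRect:CGRectMake(padX, padY, w, h)];
        result = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();

        return result;
    }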

If someone is interested in more details of this part, feel free to contact me.

There were also a few Tesseract variables that I played with for a while until I found the combination that worked best.
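
For reference, the recognition step with the gali8 wrapper looks roughly like the sketch below. The trained-data name @"letsgodigital" assumes the display_ocr data was installed under that name, preprocessedImage stands for the output of the pipeline above, and the values shown are illustrative rather than the exact combination I settled on:

    #import <TesseractOCR/TesseractOCR.h>

    // Recognition sketch using G8Tesseract from Tesseract-OCR-iOS.
    G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"letsgodigital"];
    tesseract.charWhitelist = @"0123456789.";  // digits and decimal point only
    tesseract.pageSegmentationMode = G8PageSegmentationModeSingleLine;
    // Further engine variables can be tweaked via setVariableValue:forKey:.
    tesseract.image = preprocessedImage;
    [tesseract recognize];
    NSLog(@"Recognized: %@", tesseract.recognizedText);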

2 Comments


  1. I tried to implement Tesseract to recognize a seven-segment display.
    Can you please share your project?
    Thank you

  2. I am trying OCR for an app on iPad using the gali8 Tesseract wrapper and have performed all the steps you listed, but right now skew adjustment is the issue, due to which accuracy drops. It becomes more pronounced with letters like N and Z, W and M, L and I, L and V, C and G, etc. Please guide me on this; I will be grateful, since I have a very limited timeline for the app I am creating and I am stuck at this.
    best regards
    Zain
