Have you ever wondered what caused a test failure? And did the cause turn out to involve special characters? For example, something as banal as the language selection, which we use all the time in our country. Well, special characters are crushing OCR algorithms.
Growing up in Switzerland, we are used to being able to switch the language of the website we are visiting, from English to German or any other language. But for the computer this is not as trivial as you might think, especially in the test automation process. As we have seen in our first blog on Interactive Visual Testing, finding the special character ç is harder than one might expect. There, the main issue was an incorrectly rendered entry in the language selection dropdown.
But a special character can also cause a test failure during optical character recognition (OCR). Depending on the specific OCR algorithm, the ç in “Français” is not recognized, and the word is read as “Francais”, “Frangais” or even “Fran5ais”. This ends in a test failure because the correct selection cannot be found. But don’t be fooled into thinking this only affects language-specific characters: similar confusion frequently occurs in OCR for characters such as ‘I’, ‘l’, ‘1’ or ‘O’ and ‘0’.
Are you ok with 99% accuracy?
So what now? Shall we just accept this fact and carry on as before? After all, a good OCR algorithm claims to be about 99% accurate, so isn’t this good enough? No, for two main reasons:
First, this accuracy is usually determined by scanning a whole page of printed text with good image quality. Since the image you get from your UI usually contains much less image information (96 dpi vs. 300 dpi for a good scan), the accuracy you get is probably lower.
But the second reason is even more important. If you have 1000 characters on your UI, 99 percent accuracy means that on average 10 characters are still read incorrectly. What if even one of those characters is in the text you are looking for? It will lead to a test failure! We can play the game of percentages here: 99% accuracy, 10 characters per search text, 2 OCR searches per test case, 20 test cases. The overall probability of getting all of them right is 0.99^(10 × 2 × 20) = 0.99^400 ≈ 1.8%. This is clearly not acceptable.
0.99^(10 × 2 × 20) ≈ 1.8% chance of no test failures
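The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation, and it assumes that every character is read independently with the same accuracy, which is a simplification. The function name `all_reads_correct` is made up for this illustration.

```python
def all_reads_correct(char_accuracy: float, chars_per_text: int,
                      searches_per_case: int, test_cases: int) -> float:
    """Probability that every character in every OCR search is read correctly.

    Assumes each character is read independently with the same accuracy.
    """
    total_chars = chars_per_text * searches_per_case * test_cases
    return char_accuracy ** total_chars


# The numbers from the text: 99% accuracy, 10 characters per search text,
# 2 OCR searches per test case, 20 test cases.
p = all_reads_correct(0.99, 10, 2, 20)
print(f"{p:.1%}")  # prints 1.8%
```

Under the same assumption, even 99.9% per-character accuracy would still leave roughly a one in three chance that at least one of the 400 characters is misread.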
So how can we reliably find text on the screen with these imperfect algorithms?
The secret of Reverse OCR
The idea behind Reverse OCR is that we can take advantage of the fact that we already know what should be written on the screen. In the example in the introduction, we are applying OCR to find a specific value on the UI.
So instead of starting with the screen and checking the output of the OCR algorithm, we reverse the process. We start with the known output and use the OCR algorithm to find where this text appears on the screen. With this method we can achieve much more reliable and robust results when searching for a text on the screen.
What happens technically?
In practice Reverse OCR can be implemented by comparing the search string with every detected string on the screen. We then assign a difference metric to every detected string, based on how different it is from the string we were looking for.
This difference metric is directly influenced by the probability of the different characters detected by the OCR algorithm.
Let us look at a simplified example: We are looking for the string “IoC”. For the given text (100 IoC lock ABC) that is shown on the screen, let us assume the OCR algorithm will detect the following characters with the respective probabilities:
As we can see in this example, a pure OCR approach that uses the best guess for every character would return the text “loc” where we expect “IoC”, and therefore we could not find what we were looking for. With Reverse OCR we look at the difference per word and use the text with the smallest difference. In this case we find our search string “IoC” at the correct location with a difference of 0.45. Of course, we can define a tolerance: the maximum difference up to which we still consider something a match. If the text we are looking for is not on the screen, the best match will lie above this tolerance value.
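The following Python sketch illustrates the idea. It is not the implementation used by any particular product: the data layout (one dictionary of candidate characters per position), the per-character probabilities, the difference metric (the average of 1 minus the probability of the expected character) and the tolerance of 0.5 are all assumptions made for this illustration, so the numbers it produces are not the ones from the example above.

```python
from typing import Optional

# One detected word = one dict per character position, mapping each candidate
# character to the probability assigned by the OCR engine (hypothetical data).
Word = list[dict[str, float]]


def best_guess(word: Word) -> str:
    """Classical OCR output: the most probable character at every position."""
    return "".join(max(pos, key=pos.get) for pos in word)


def difference(search: str, word: Word) -> float:
    """Assumed metric: average of (1 - probability of the expected character).

    0.0 means a certain match, 1.0 means no match at all.
    """
    if not search or len(search) != len(word):
        return 1.0
    return sum(1.0 - pos.get(ch, 0.0) for ch, pos in zip(search, word)) / len(search)


def reverse_ocr(search: str, words: list[Word], tolerance: float) -> Optional[int]:
    """Index of the word with the smallest difference, or None if even the
    best candidate lies above the tolerance."""
    if not words:
        return None
    diffs = [difference(search, w) for w in words]
    best = min(range(len(words)), key=diffs.__getitem__)
    return best if diffs[best] <= tolerance else None


# Made-up OCR candidates for the screen text "100 IoC lock ABC".
screen: list[Word] = [
    [{"1": 0.6, "l": 0.3, "I": 0.1}, {"0": 0.7, "O": 0.3}, {"0": 0.7, "O": 0.3}],
    [{"l": 0.5, "I": 0.4, "1": 0.1}, {"o": 0.9, "0": 0.1}, {"c": 0.55, "C": 0.45}],
    [{"l": 0.6, "I": 0.4}, {"o": 0.9, "0": 0.1}, {"c": 0.8, "e": 0.2}, {"k": 0.9, "h": 0.1}],
    [{"A": 0.9, "4": 0.1}, {"B": 0.8, "8": 0.2}, {"C": 0.6, "c": 0.4}],
]

print([best_guess(w) for w in screen])            # ['100', 'loc', 'lock', 'ABC']
print(reverse_ocr("IoC", screen, tolerance=0.5))  # 1
print(reverse_ocr("XyZ", screen, tolerance=0.5))  # None
```

With these made-up numbers the best-guess read of the second word is “loc”, yet the search for “IoC” still lands on that word, while a text that is not on the screen stays above the tolerance and is reported as not found.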
Using a dictionary is so 2000, or is it?
Classical OCR algorithms often apply a dictionary to correct for characters that were read incorrectly. But in software testing we often deal with words that do not appear in a dictionary, either because they are technical terms or because we are searching for generated IDs, names and similar words that are not part of a dictionary.
MISSING ENTRIES IN THE DICTIONARY: technical terms and generated IDs
With Reverse OCR we do not need a dictionary. While classical OCR compares every detected word with the dictionary and returns the most likely match, in Reverse OCR we are comparing against the search text directly.
Ready for “newly generated text” from the application under test!
But there can also be situations where we do not know the search text in advance. In test automation there are three different situations where we use OCR:
- Finding a text on the screen so we can interact with it (as described in the language dropdown example).
- Verifying a given text appears on the screen (or does not appear); for instance, to verify that a newly created customer appears in a list.
- Reading a text from the screen that was generated by the application under test, e.g., an identification number for the created customer that will be used later in the test case.
In the first two situations we can apply Reverse OCR as described above. However, in the third situation we do not know the search text in advance. But we can still improve the output of the OCR algorithm. In most cases we still know what kind of text we are expecting to appear.
KNOW YOUR FORMATS: Known formats, e.g., ID-abc-00xyz -> prioritize matching characters -> enhanced success rate!
Most likely, we know that the generated identifier is of the form ID-abc-00xyz, where a, b, and c are letters and x, y and z are digits. This knowledge allows us to prioritize characters that match the expected pattern even though the OCR algorithm detected them with a lower probability. This in turn will lead to a much better success rate.
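As a rough Python sketch of this idea: the template syntax below (`?` for any letter, `#` for any digit, every other character a literal), the function names and the OCR probabilities are all invented for this illustration and are not taken from a real OCR library.

```python
from typing import Optional

# One dict per character position: candidate character -> OCR probability
# (hypothetical data layout).
Word = list[dict[str, float]]


def best_guess(word: Word) -> str:
    """Unconstrained read: the most probable character at every position."""
    return "".join(max(pos, key=pos.get) for pos in word)


def read_with_format(word: Word, template: str) -> Optional[str]:
    """Read a word while only allowing characters that fit the template.

    '?' stands for any letter, '#' for any digit; any other template
    character is a literal that we already know and can emit directly.
    """
    if len(word) != len(template):
        return None
    result = []
    for pos, slot in zip(word, template):
        if slot == "?":
            allowed = {c: p for c, p in pos.items() if c.isalpha()}
        elif slot == "#":
            allowed = {c: p for c, p in pos.items() if c.isdigit()}
        else:
            result.append(slot)  # literal: no need to trust the OCR here
            continue
        if not allowed:
            return None  # no candidate fits the expected character class
        result.append(max(allowed, key=allowed.get))
    return "".join(result)


# Made-up OCR candidates for an identifier shown on screen as "ID-qlo-00517".
detected: Word = [
    {"l": 0.6, "I": 0.4}, {"D": 0.9, "O": 0.1}, {"-": 1.0},
    {"q": 0.8, "9": 0.2}, {"1": 0.5, "l": 0.4, "I": 0.1}, {"0": 0.6, "o": 0.4},
    {"-": 1.0}, {"O": 0.7, "0": 0.3}, {"0": 0.9, "O": 0.1},
    {"S": 0.6, "5": 0.4}, {"l": 0.5, "1": 0.45, "I": 0.05}, {"7": 0.9, "T": 0.1},
]

print(best_guess(detected))                        # lD-q10-O0Sl7
print(read_with_format(detected, "ID-???-00###"))  # ID-qlo-00517
```

In this made-up case the unconstrained best guess contains five wrong characters, while the format-aware read picks the lower-probability candidates that fit the expected pattern.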
Swop OCR for Reverse OCR
With Reverse OCR you can be confident that your test case will be able to successfully select “Français” in the language dropdown and change the display language of your application. Your reader can then continue with a pleasant read in French on the website, your test case can continue with the next verification step, and you can use the regained time for other matters at hand within software test automation. All in one platform: TestResults.io.
NEXT UP: Yet another aspect that makes interactive visual testing such a winner: the no-wait element. Your test case no longer needs to wait until the transition of re-rendering is completed but instead carries on smoothly. How is that possible without any hard-coded waits in your test code? Read on here.