What is AI and ML?
Artificial Intelligence (AI) broadly means getting a computer to mimic human intelligence in some way. One common way to do this is to teach the computer by showing it many examples of questions and their answers. A set of such examples is referred to as a 'dataset'.
Machine learning (ML) is a branch of AI. It consists of the techniques that enable computers to learn patterns from a dataset and apply them to new cases.
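To make "learning from a dataset" concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product works: the animals, the measurements and the `predict` function are all made up for this example. The "question" is a pair of measurements (weight in kg, height in cm) and the "answer" is the kind of animal. For a new question, the program simply copies the answer of the most similar example it has seen.

```python
# A tiny, made-up dataset: each entry is (question, answer).
# The question is (weight in kg, height in cm); the answer is the animal.
dataset = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]


def predict(example):
    """Answer a new question by copying the answer of the closest known example."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _features, answer = min(
        dataset, key=lambda pair: squared_distance(pair[0], example)
    )
    return answer


# Neither of these measurements appears in the dataset.
small_animal = predict((4.5, 24.0))
large_animal = predict((28.0, 58.0))
```

Real machine learning techniques are far more sophisticated, but the idea is the same: the answers come from the examples, not from rules a person wrote by hand.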
For a more detailed explanation, watch this insightful video by KI Campus: Artificial Intelligence explained in 2 minutes
What is Deep Learning?
Deep learning is a subset of machine learning. It uses models built from many layers (neural networks). The model performs a task repeatedly on large amounts of data and tweaks itself a little each time to improve the outcome.
Let's look at some common examples to start with:
If I show you images of cats, you recognise them as cats, even if you've never seen those images before. It wouldn't matter if the cat in the image were dressed up as a witch or wearing sunglasses. You can still recognise it as a cat because your brain knows the various elements that define a cat: the shape of its muzzle, the number and placement of its legs, and so on.
Deep learning can do this. A use case for this would be self-driving cars: for a car to determine its next action, it needs to know what's around it. It must be able to recognise people, other vehicles, road signs, animals, etc., in any form. Traditional machine learning algorithms generally struggle with this kind of task.
Nanonets OCR models use Deep Learning to understand multiple variants of documents and intelligently extract and label key pieces of information from them.
How do Nanonets OCR models work?
Let's use this image as an example. Each of these is some kind of identity card. Each region and each use case has a unique layout but mostly consistent information (Name, ID number, birthdate, etc.). Say you want to extract only the ID numbers from all these different cards. Here's how we'd do that with Deep Learning:
1. Collect examples
Just as you would teach a child to recognise a cat, we first need to show the computer some examples of where the ID numbers are on your images. These examples are your training dataset.
2. Learning from examples
The Nanonets Deep Learning algorithm then uses all these examples to teach itself to recognise any ID number. It learns to look for an 'ID number' on any document, no matter what layout or orientation it may be in. We call this process a Training session.
3. Test and Refine
After a training session is complete, the computer is ready to be shown images it has never seen before (containing an ID number) and tell us which data on those new images is the ID number.
The computer tries to identify the ID number based on the logic created from the examples it saw earlier. It does this without human intervention. We refer to this as Prediction.
Initially, the computer makes mistakes in identifying the ID number. When we correct these mistakes, we're giving it more examples to learn from. The computer then uses this new information to refine its logic and self-correct in another training session. With each round it makes fewer mistakes, until it is as accurate as it can be. This is what we refer to as Retraining.
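The three steps above (training session, Prediction, Retraining) can be sketched in a few lines of Python. This is a deliberately simple stand-in, not the Nanonets algorithm and not deep learning: the "model" here only remembers the pattern of letters and digits that ID numbers had in the examples. The function names `shape`, `train` and `predict`, and all card contents other than the ID 12Y1019, are assumptions made up for this illustration.

```python
import re


def shape(token):
    """Reduce a token to its pattern: every letter becomes 'A', every digit '9'."""
    letters_replaced = re.sub(r"[A-Za-z]", "A", token)
    return re.sub(r"[0-9]", "9", letters_replaced)


def train(examples):
    """Training session: remember the patterns of tokens labelled as ID numbers."""
    return {shape(id_token) for _tokens, id_token in examples}


def predict(model, tokens):
    """Prediction: return the first token whose pattern the model knows, else None."""
    for token in tokens:
        if shape(token) in model:
            return token
    return None


# 1. Collect examples: (words found on the card, the word that is the ID number).
examples = [
    (["Janet", "Smith", "12Y1019"], "12Y1019"),
    (["Omar", "Khan", "48K2207"], "48K2207"),
]

# 2. Learning from examples.
model = train(examples)

# 3. Test: a card in a familiar format works, a new format does not (yet).
familiar_guess = predict(model, ["Sam", "Lee", "77B3301"])
new_card = ["Li", "Wei", "ID-554-201"]
first_guess = predict(model, new_card)  # the model has never seen this format

# Refine: a person supplies the correct answer, and we retrain.
examples.append((new_card, "ID-554-201"))
model = train(examples)
second_guess = predict(model, ["Ana", "Ruiz", "ID-901-377"])
```

After the correction, the model handles other cards in the new format without further help, which is the point of retraining. A real deep learning model learns much richer cues (position, surrounding words, visual layout) than this single pattern check.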
Why is Deep Learning better than regular OCR?
Say you want to process thousands of those ID cards we saw above:
Regular OCR would extract every single word on the cards, and you would be left with the herculean task of assigning those words to the labels you need: Janet Smith > name, 12Y1019 > ID number, and so on.
If you used a Template, you'd have to create a template for each new ID format. Doing this for every new ID would not be any less cumbersome.
Deep Learning, however, would be able to make sense of those words and recognise them no matter what ID format. It would then correctly assign labels to the words without you having to do so.
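The template problem described above can be shown with a short Python sketch. Everything here is hypothetical: the two card layouts, the `TEMPLATE_A` mapping and the `extract_with_template` function are invented for illustration, and a real template would usually describe regions on the page rather than line numbers. The point is only that a template written for one layout silently returns the wrong values on another.

```python
# Text lines as a regular OCR engine might return them, top to bottom.
# Both layouts are made up for this example.
card_a = ["Janet Smith", "1990-04-12", "12Y1019"]
card_b = ["ID 48K2207", "Omar Khan", "1985-11-02"]

# A template for layout A: which line holds which label.
TEMPLATE_A = {"name": 0, "birthdate": 1, "id_number": 2}


def extract_with_template(lines, template):
    """Pick out each label by its fixed position, as a template would."""
    return {label: lines[index] for label, index in template.items()}


result_a = extract_with_template(card_a, TEMPLATE_A)
result_b = extract_with_template(card_b, TEMPLATE_A)

# result_a is correct because card_a matches the template.
# result_b is wrong: the template reads the birthdate line as the ID number.
```

To handle card_b you would need a second template, and another for every further layout. A trained model avoids this because it decides what each word is from what it has learned, not from a fixed position.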
Here are some key advantages:
- It is much faster:
With a trained model, you can go from a pile of documents to an Excel sheet in minutes. Simply upload files and Nanonets assigns labels to the words it finds, making all files instantly download-ready.
- No template required:
Each document you receive will most probably differ in format, orientation and size. It is practically impossible to use a single template that accounts for all these variations. Since deep learning models are trained to understand and not just read information, they're able to correctly tell you which words on your document correspond to which labels, irrespective of format.
- It is continuously improving:
When a human makes a mistake and you give them the correct answer, they remember that new information. Similarly, a deep learning model uses manual corrections to improve itself. That way, the next time it encounters a similar instance of that data, you won't have to correct it again.
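As a rough picture of the "pile of documents to a spreadsheet" step mentioned under the first advantage, here is a small Python sketch that turns labelled results into CSV text, which Excel can open. It does not use the Nanonets API: the `predictions` list, the file names and the `to_csv` helper are assumptions made up for this example, and only Python's standard `csv` module is used.

```python
import csv
import io

# Hypothetical labelled output for two processed cards.
predictions = [
    {"file": "card_001.jpg", "name": "Janet Smith", "id_number": "12Y1019"},
    {"file": "card_002.jpg", "name": "Omar Khan", "id_number": "48K2207"},
]


def to_csv(rows, fieldnames):
    """Write one header row plus one row per document, and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(predictions, ["file", "name", "id_number"])
```

Because every value already carries its label, each label becomes a column and each document becomes a row, with no manual sorting of words.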