This document explains how to retrieve the results for a PDF file using the Nanonets API.
Before we start, let's keep an example in mind to make things clearer:
1. Make sure you have an OCR model on our platform
2. The PDF contains 3 pages
Step 1: Upload PDF with Prediction API
E.g.: https://app.nanonets.com/api/v2/OCR/Model/{MODEL_ID}/LabelFile/
Documentation -> https://nanonets.com/documentation/#operation/OCRModelLabelUrlsByModelIdPost
The response includes a request_file_id for every page in the PDF; the value is the same for all pages of that file. If you want to retrieve the data at a later date, save this request_file_id.
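As a rough illustration of saving the request_file_id, the sketch below pulls it out of an already-parsed upload response. It assumes the response JSON has a top-level "result" list with one entry per page, each carrying a "request_file_id" key; that shape is an assumption and should be checked against the documentation linked above. The function name and the sample values are hypothetical.

```python
def extract_request_file_id(upload_response):
    """Return the request_file_id shared by every page of one uploaded PDF.

    Assumption: the parsed response has a top-level "result" list with one
    entry per page, each containing a "request_file_id" key.
    """
    ids = {page["request_file_id"] for page in upload_response.get("result", [])}
    if len(ids) != 1:
        # Either no pages came back or the pages disagree; both are unexpected.
        raise ValueError(f"expected exactly one request_file_id, found {sorted(ids)}")
    return ids.pop()


# Hypothetical parsed response for the 3-page PDF in our example.
sample_upload_response = {
    "result": [
        {"page": 0, "request_file_id": "abc-123"},
        {"page": 1, "request_file_id": "abc-123"},
        {"page": 2, "request_file_id": "abc-123"},
    ]
}

request_file_id = extract_request_file_id(sample_upload_response)
```

Store the returned value (for example in your database, next to the original file name) so that you can look the pages up again later.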
Step 2: Retrieve Page IDs using request_file_id
Every page in the PDF has a unique page ID on the Nanonets side. Once you have the page IDs, you can retrieve the data for each individual page.
You can retrieve the page IDs by making a GET request to the following endpoint:
https://app.nanonets.com/api/v2/Inferences/Model/{YOUR_MODEL_ID}/InferenceRequestFiles/{request_file_id}
The response will look something like this:
"page_ids":
[ "page_id_1",
"page_id_2",
"page_id_3"
]
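The GET request above could be sketched in Python as follows, using only the standard library. It assumes that your API key is sent as the HTTP Basic auth username with an empty password, and that "page_ids" is a top-level key of the JSON response; the function names are hypothetical.

```python
import base64
import json
import urllib.request

BASE_URL = "https://app.nanonets.com/api/v2"


def page_ids_url(model_id, request_file_id):
    """Build the endpoint URL shown in Step 2."""
    return (
        f"{BASE_URL}/Inferences/Model/{model_id}"
        f"/InferenceRequestFiles/{request_file_id}"
    )


def fetch_page_ids(api_key, model_id, request_file_id):
    """Return the list of page IDs for one uploaded file.

    Assumptions: the API key is the Basic auth username (empty password),
    and the response body is a JSON object with a "page_ids" list.
    """
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        page_ids_url(model_id, request_file_id),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["page_ids"]
```

For the example PDF, fetch_page_ids("YOUR_API_KEY", "YOUR_MODEL_ID", request_file_id) would be expected to return three page IDs.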
Step 3: Retrieve data for the model and then filter by Page IDs
Once you have the page IDs, you can retrieve the data.
You can do this using the Get All Prediction Files API.
Once you have this data on your side, loop through the moderated_images and unmoderated_images arrays and, for every page in these arrays, compare the "id" key against page_id_1, page_id_2, and page_id_3.
If the image has been moderated, the moderated_boxes array will contain the results. The status of a moderated box will be "moderated".
{
  "moderated_images_count": 0,
  "unmoderated_images_count": 3,
  "moderated_images": [],
  "unmoderated_images": [
    {
      "model_id": "YOUR_MODEL_ID",
      "request_file_id": "request_file_id",
      "day_since_epoch": x,
      "is_moderated": false,
      "hour_of_day": 16,
      "id": "page_id_1",
      "url": "",
      "predicted_boxes": [],
      "moderated_boxes": [
        {
          "label": "label_1",
          "xmin": 1370,
          "ymin": 315,
          "xmax": 1533,
          "ymax": 340,
          "score": 0.8414416,
          "ocr_text": "MUTUAL",
          "status": "correctly_predicted"
        },
        {
          "label": "label_2",
          "xmin": 1152,
          "ymin": 450,
          "xmax": 1612,
          "ymax": 476,
          "score": 0.5940531,
          "ocr_text": "ABCD",
          "status": "moderated"
        }
      ],
      "size": { "width": 2479, "height": 3508 },
      "page": 0,
      "original_file_name": "filename.pdf",
      "custom_response": null,
      "assigned_member": "",
      "is_deleted": false,
      "source": "api",
      "no_of_fields": x,
      "cost": 0,
      "payable_cost": 0,
      "status": "success",
      "retries": 0
    }
  ]
}
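The filtering described in Step 3 could look roughly like the sketch below. It works on an already-parsed response with the moderated_images and unmoderated_images arrays shown above; the function names and the trimmed sample data are hypothetical.

```python
def pages_for_request(prediction_files, page_ids):
    """Pick out the pages whose "id" matches one of the given page IDs.

    Both the moderated_images and unmoderated_images arrays are searched.
    Returns a dict mapping page ID to that page's entry.
    """
    wanted = set(page_ids)
    pages = {}
    for key in ("moderated_images", "unmoderated_images"):
        for image in prediction_files.get(key) or []:
            if image.get("id") in wanted:
                pages[image["id"]] = image
    return pages


def boxes_with_status(page, status):
    """Return the entries of a page's moderated_boxes that have the given status."""
    return [
        box for box in page.get("moderated_boxes") or []
        if box.get("status") == status
    ]


# Trimmed, hypothetical sample in the shape of the response above.
sample = {
    "moderated_images": [],
    "unmoderated_images": [
        {
            "id": "page_id_1",
            "moderated_boxes": [
                {"label": "label_1", "ocr_text": "MUTUAL", "status": "correctly_predicted"},
                {"label": "label_2", "ocr_text": "ABCD", "status": "moderated"},
            ],
        },
        {"id": "some_other_page", "moderated_boxes": []},
    ],
}

pages = pages_for_request(sample, ["page_id_1", "page_id_2", "page_id_3"])
moderated = boxes_with_status(pages["page_id_1"], "moderated")
```

Pages that are not in the response are simply missing from the returned dict, so comparing its keys against your page ID list shows which pages still need to be fetched.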