The JSON structure you provided is the response returned by the Nanonets OCR API for an invoice-processing task. Here's an overview of how the structure is organized and how to tell tabular line-item data apart from regular data points:
{
  "message": "Success", // Overall request status
  "result": [ // Result array carrying the extracted data
    {
      "message": "Success", // File-specific request status
      "input": "Invoice Sample 2.png", // Name of the input file
      "prediction": [ // Prediction array carrying the fields extracted from the input file; one object per field
        {
          "id": "15d0f580-347b-45f9-9215-c9bfa3ef838f",
          "label": "seller_name", // Name of the extracted data point
          "xmin": 64, // Image coordinates of the extracted data point
          "ymin": 130,
          "xmax": 159,
          "ymax": 141,
          "score": 0.9952863, // Confidence score indicating the accuracy of the prediction
          "ocr_text": "East Repair Inc.", // Extracted text corresponding to "label" above
          "type": "field", // Indicates whether the field is a key-value pair or a table
          "status": "correctly_predicted", // Indicates whether the field was correctly extracted
          "page_no": 0, // Page number on which the extracted field appears
          "label_id": "55dae24f-493f-490b-aff4-a34c1319c0ab",
          "lookup_edited": false,
          "lookup_parent_id": ""
        },
        ... // the block above repeats for every extracted field
        ...
Top-Level Structure:
- "message": A status message indicating the overall success or failure of the request.
- "result": An array containing the results of the OCR prediction for each page of the document.
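A minimal Python sketch of checking the top-level status before reading the results, assuming the response body has already been parsed into a dict (for example with `json.loads`). The helper name `get_results` is hypothetical and not part of any Nanonets SDK; since this article only shows "Success", the sketch treats any other "message" value as a failure.

```python
import json


def get_results(response: dict) -> list:
    """Return the top-level "result" array, raising if the request did not succeed."""
    # "message" carries the overall request status; the sample responses show "Success"
    status = response.get("message")
    if status != "Success":
        raise RuntimeError(f"OCR request did not succeed: {status!r}")
    return response.get("result", [])


# A trimmed-down response body with a single result entry
body = '{"message": "Success", "result": [{"message": "Success", "input": "Invoice Sample 2.png", "prediction": []}]}'
results = get_results(json.loads(body))
print(len(results))  # 1
```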
Result Object:
- "message": Status message for the specific result (typically "Success").
- "input": Name or identifier of the input file.
- "prediction": Array containing predictions for each identified label or table.
- "page": Page number within the document (0 for the first page).
- "request_file_id": Unique identifier for the uploaded file.
- "filepath": Path to the uploaded file.
- "size": Object containing the width and height of the processed image.
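To show how the per-page entries of "result" can be walked, here is a small Python sketch. The function name `summarize_results` is hypothetical, and defaulting "page" to 0 when the key is missing is an assumption of this sketch, not documented API behavior.

```python
def summarize_results(results: list) -> list:
    """Return one human-readable summary line per entry in the "result" array."""
    lines = []
    for res in results:
        name = res.get("input")                 # input file name
        page = res.get("page", 0)               # assumed 0 when the key is absent
        status = res.get("message")             # per-result status
        count = len(res.get("prediction", []))  # number of fields/tables found
        lines.append(f"{name} (page {page}): {status}, {count} predictions")
    return lines


sample_results = [
    {
        "message": "Success",
        "input": "Invoice Sample 2.png",
        "page": 0,
        "prediction": [{"label": "seller_name"}, {"label": "seller_address"}],
    }
]
for line in summarize_results(sample_results):
    print(line)  # Invoice Sample 2.png (page 0): Success, 2 predictions
```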
Prediction Object: Each prediction object contains details about a specific field or table identified in the document.
- "id": Unique identifier for the prediction.
- "label": Semantic label describing the content (e.g., seller_name, invoice_number).
- "xmin", "ymin", "xmax", "ymax": Bounding-box coordinates of the predicted field or table, in image coordinates.
- "ocr_text": Text extracted by OCR for the identified label or cell.
- "type": Type of prediction ("field" for regular data points, "table" for tabular data).
- "score": Confidence score indicating how likely the extracted data is to be accurate.
- "cells" (tables only): Array containing detailed predictions for each cell in the table.
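If you prefer typed access to these keys, a Python dataclass can wrap each prediction object. This is a sketch, not a Nanonets client class: the names `Prediction` and `kind` (which mirrors the JSON "type" key, renamed to avoid shadowing Python's built-in) are hypothetical. The bounding-box width and height are simply `xmax - xmin` and `ymax - ymin`.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """Typed view of one entry in the "prediction" array."""
    label: str
    kind: str          # mirrors the JSON "type" key ("field" or "table")
    ocr_text: str
    score: float
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    cells: list = field(default_factory=list)  # only populated for tables

    @classmethod
    def from_dict(cls, d: dict) -> "Prediction":
        return cls(
            label=d["label"],
            kind=d["type"],
            ocr_text=d.get("ocr_text", ""),
            score=float(d.get("score", 0.0)),
            xmin=d["xmin"],
            ymin=d["ymin"],
            xmax=d["xmax"],
            ymax=d["ymax"],
            cells=d.get("cells", []),
        )

    @property
    def width(self) -> int:
        return self.xmax - self.xmin

    @property
    def height(self) -> int:
        return self.ymax - self.ymin


# Values taken from the seller_name prediction in the sample response
p = Prediction.from_dict({
    "label": "seller_name", "xmin": 64, "ymin": 130, "xmax": 159, "ymax": 141,
    "score": 0.9952863, "ocr_text": "East Repair Inc.", "type": "field",
})
print(p.label, p.kind, p.width, p.height)  # seller_name field 95 11
```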
Table Prediction:
- Identified by "type": "table".
- Contains a "cells" array, where each cell object describes a specific cell within the table.
- Each cell object includes "text" (extracted text content), "row" and "col" (row and column indices), and "label" (semantic label, if applicable).
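Because each cell carries its own "row" and "col" indices, the table can be rebuilt as a grid. The following Python sketch assumes only the "row", "col", and "text" keys described above; it does not assume whether indices start at 0 or 1, since it sorts whatever indices it finds. The function name `cells_to_grid` and the cell values are hypothetical, for illustration only.

```python
def cells_to_grid(cells: list) -> list:
    """Arrange table cells into rows of text using their "row" and "col" indices."""
    by_row = {}
    for cell in cells:
        by_row.setdefault(cell["row"], {})[cell["col"]] = cell.get("text", "")
    # Collect every column index seen in any row so all rows share the same width
    all_cols = sorted({col for row in by_row.values() for col in row})
    # Missing cells are filled with empty strings
    return [[by_row[r].get(c, "") for c in all_cols] for r in sorted(by_row)]


# Hypothetical cells, for illustration only
cells = [
    {"row": 1, "col": 1, "text": "1", "label": "Quantity"},
    {"row": 1, "col": 2, "text": "Item A", "label": "Description"},
    {"row": 1, "col": 3, "text": "10.00", "label": "Price"},
    {"row": 2, "col": 2, "text": "Item B", "label": "Description"},
    {"row": 2, "col": 1, "text": "2", "label": "Quantity"},
]
for row in cells_to_grid(cells):
    print(row)
```

Filling gaps with empty strings keeps every row the same length, which makes the grid easy to write out as CSV.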
Additional Information:
- "status": Status of the individual prediction, indicating whether the field was correctly extracted (e.g., "correctly_predicted").
- "page_no": Page number on which the data point appears in the document.
- "label_id": Unique identifier of the label (such as seller_name or invoice_number) that the prediction belongs to.
- "lookup_edited", "lookup_parent_id": Additional metadata related to data lookup and editing. These belong to the app's workflow automation component and are relevant if you want to access workflow automation via the API.
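To work with one page at a time, or only with predictions in a given status, you can filter on "page_no" and "status". Below is a Python sketch; the function name `select_predictions` is hypothetical, and the third prediction in the sample list (an invoice_number on page 1) is invented for illustration.

```python
def select_predictions(predictions: list, page_no=None, status=None) -> list:
    """Keep predictions matching the given "page_no" and/or "status"; None means any value."""
    return [
        p
        for p in predictions
        if (page_no is None or p.get("page_no") == page_no)
        and (status is None or p.get("status") == status)
    ]


# The first two entries mirror the sample response; the third is hypothetical
predictions = [
    {"label": "seller_name", "status": "correctly_predicted", "page_no": 0},
    {"label": "seller_address", "status": "correctly_predicted", "page_no": 0},
    {"label": "invoice_number", "status": "correctly_predicted", "page_no": 1},
]
first_page = select_predictions(predictions, page_no=0)
print([p["label"] for p in first_page])  # ['seller_name', 'seller_address']
```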
Identifying Tabular Line-Item Data:
To identify tabular line-item data, look for predictions where "type": "table". Within each table prediction:
- Use the "cells" array to access each cell's "text" and "label".
- Cells are arranged in rows and columns; the "row" and "col" indices provide this structure.
- Labels such as "Quantity", "Description", "Price", and "Line_Amount" help identify line-item details.
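The steps above can be sketched in Python as a function that turns every "table" prediction into a list of line items, one dict per row keyed by the cell "label". The function name `extract_line_items`, the `col_N` fallback key for unlabeled cells, and all cell values below are hypothetical; the labels are the ones mentioned above.

```python
from collections import defaultdict


def extract_line_items(predictions: list) -> list:
    """Collect one {label: text} dict per table row across all "table" predictions."""
    items = []
    for pred in predictions:
        if pred.get("type") != "table":
            continue  # regular "field" predictions carry no cells
        rows = defaultdict(dict)
        for cell in pred.get("cells", []):
            # Fall back to a positional key when a cell has no label
            key = cell.get("label") or f"col_{cell['col']}"
            rows[cell["row"]][key] = cell.get("text", "")
        items.extend(rows[r] for r in sorted(rows))
    return items


# Hypothetical predictions: one regular field and one table
predictions = [
    {"label": "seller_name", "type": "field", "ocr_text": "East Repair Inc."},
    {
        "type": "table",
        "cells": [
            {"row": 1, "col": 1, "text": "1", "label": "Quantity"},
            {"row": 1, "col": 2, "text": "Item A", "label": "Description"},
            {"row": 1, "col": 3, "text": "10.00", "label": "Price"},
            {"row": 1, "col": 4, "text": "10.00", "label": "Line_Amount"},
            {"row": 2, "col": 1, "text": "2", "label": "Quantity"},
            {"row": 2, "col": 2, "text": "Item B", "label": "Description"},
            {"row": 2, "col": 3, "text": "5.00", "label": "Price"},
            {"row": 2, "col": 4, "text": "10.00", "label": "Line_Amount"},
        ],
    },
]
for item in extract_line_items(predictions):
    print(item)
```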
Identifying Regular Data Points:
Regular data points are identified by "type": "field" in the prediction object. These include labels such as "seller_name", "invoice_number", "buyer_name", etc. Each "ocr_text" value holds the extracted content for that label.
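A Python sketch of collecting the regular data points into a simple label-to-text mapping. The function name `fields_to_dict` is hypothetical. Keeping the highest-scoring prediction when a label appears more than once is a design choice of this sketch, not documented API behavior; the low-score duplicate and the table entry in the sample list are invented for illustration.

```python
def fields_to_dict(predictions: list) -> dict:
    """Map each "field" label to its ocr_text, keeping the highest-scoring match."""
    best = {}
    for pred in predictions:
        if pred.get("type") != "field":
            continue  # skip tables; their content lives in "cells"
        label = pred["label"]
        if label not in best or pred.get("score", 0) > best[label].get("score", 0):
            best[label] = pred
    return {label: pred.get("ocr_text", "") for label, pred in best.items()}


predictions = [
    # From the sample response
    {"label": "seller_name", "type": "field", "score": 0.9952863,
     "ocr_text": "East Repair Inc."},
    # Hypothetical lower-confidence duplicate and table entry
    {"label": "seller_name", "type": "field", "score": 0.41,
     "ocr_text": "East Repair"},
    {"label": "line_items", "type": "table", "cells": []},
]
print(fields_to_dict(predictions))  # {'seller_name': 'East Repair Inc.'}
```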
Summary:
The JSON structure provides a detailed breakdown of OCR predictions, distinguishing between regular data points and structured tabular data. Use "type" to tell the two apart, and use "label" and "ocr_text" to get the semantic meaning and the extracted content, respectively.
Example:
For instance, the following is a sample JSON response produced by the Nanonets Invoice OCR model, accessed through the "POST OCR Predict using Image file" API endpoint (the sample file is also attached for your reference):
- "message": Indicates the overall status of the request.
- "result": Array detailing the results.
  - "message": Status message for the specific result (typically "Success").
  - "input": Name or identifier of the input file.
  - "prediction": Array containing predictions for each identified label or table. Each entry relates to one specific extracted data point:
    - "label": Semantic label describing the content (in this case, seller_name).
    - "ocr_text": Text extracted by OCR for the identified label or cell (in this case, "East Repair Inc.").
    - "type": Identifies whether this is a regular data field or a tabular field.
    - "xmin", "xmax", "ymin", "ymax": Coordinates of the data point on the page.
- The same structure repeats for "seller_address", whose value is "1912 Harvest Lane\nNew York , NY 12210", and so on.
{
"message": "Success",
"result": [
{
"message": "Success",
"input": "Invoice Sample 2.png",
"prediction": [
{
"id": "15d0f580-347b-45f9-9215-c9bfa3ef838f",
"label": "seller_name",
"xmin": 64,
"ymin": 130,
"xmax": 159,
"ymax": 141,
"score": 0.9952863,
"ocr_text": "East Repair Inc.",
"type": "field",
"status": "correctly_predicted",
"page_no": 0,
"label_id": "55dae24f-493f-490b-aff4-a34c1319c0ab",
"lookup_edited": false,
"lookup_parent_id": ""
},
{
"id": "bd217a6c-a986-469b-9bf6-a161239b5644",
"label": "seller_address",
"xmin": 64,
"ymin": 151,
"xmax": 182,
"ymax": 179,
"score": 0.99070704,
"ocr_text": "1912 Harvest Lane\\nNew York , NY 12210",
"type": "field",
"status": "correctly_predicted",
"page_no": 0,
"label_id": "3b789d04-562e-4cc3-940f-b23e9e46f178",
"lookup_edited": false,
"lookup_parent_id": ""
},