Package com.azure.ai.formrecognizer.documentanalysis.administration



Azure Form Recognizer is a cloud-based service provided by Microsoft Azure that utilizes machine learning to extract information from various types of documents. Form Recognizer applies machine-learning-based optical character recognition (OCR) and document understanding technologies to classify documents and to extract text, tables, structure, and key-value pairs from them. You can also label and train custom models to automate data extraction from structured, semi-structured, and unstructured documents.

The service uses advanced optical character recognition (OCR) technology to extract text and key-value pairs from documents, enabling organizations to automate data entry tasks that would otherwise require manual effort. It can recognize and extract information like dates, addresses, invoice numbers, line items, and other relevant data points from documents.

The Azure Form Recognizer client library allows Java developers to interact with the Azure Form Recognizer service. It provides a set of classes and methods that abstract the underlying RESTful API of Azure Form Recognizer, making it easier to integrate the service into Java applications.

The Azure Form Recognizer client library provides the following capabilities:

  1. Document Analysis: It allows you to submit documents for analysis to detect and extract information like text, key-value pairs, tables, language, and fields. You can analyze both structured and unstructured documents.
  2. Model Management: It enables you to manage the models in your account by building, listing, and deleting them, and by checking your account's custom model limit.
  3. Analysis Results: It provides methods to retrieve and interpret analysis results, including extracted text and field values, confidence scores, and document layout information.
  4. Polling and Callbacks: It includes mechanisms for polling the service to check the status of an analysis operation or registering callbacks to receive notifications when the analysis is complete.
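
The polling-and-callback pattern described in item 4 can be illustrated without calling the service. The following is a minimal JDK-only sketch, not the library's poller implementation: `PollingSketch`, `pollUntilDone`, and the status strings are hypothetical names chosen for illustration, and the "operation" is simulated by a supplier.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical illustration of poll-then-notify; not part of the client library. */
final class PollingSketch {

    /**
     * Queries statusSupplier until it reports a terminal status ("succeeded" or "failed")
     * or maxAttempts is reached, sleeping for the given interval between attempts.
     * The last observed status is passed to onComplete and returned, so the callback
     * may receive a non-terminal status if the attempts run out.
     */
    static String pollUntilDone(Supplier<String> statusSupplier,
                                Duration interval,
                                int maxAttempts,
                                Consumer<String> onComplete) {
        String status = "notStarted";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusSupplier.get();
            if (status.equals("succeeded") || status.equals("failed")) {
                break;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                break;
            }
        }
        onComplete.accept(status);
        return status;
    }

    public static void main(String[] args) {
        // Simulated operation that reports "running" twice, then "succeeded".
        Iterator<String> fakeStatuses = List.of("running", "running", "succeeded").iterator();
        pollUntilDone(fakeStatuses::next, Duration.ofMillis(10), 5,
            status -> System.out.println("Final status: " + status));
    }
}
```

In the client library itself, long-running operations are exposed through poller types such as the `SyncPoller` used in the samples later on this page, so a hand-written loop like this is normally unnecessary.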

Getting Started

The Azure Form Recognizer library provides analysis clients like DocumentAnalysisAsyncClient and DocumentAnalysisClient to connect to the Form Recognizer Azure Cognitive Service to analyze information from documents and extract it into structured data. It also provides administration clients like DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient to build and manage models from custom documents.

Note: This client only supports DocumentAnalysisServiceVersion.V2022_08_31 and newer. To use an older service version, use FormTrainingClient and FormTrainingAsyncClient.

Service clients are the point of interaction for developers to use Azure Form Recognizer. DocumentAnalysisClient is the synchronous service client and DocumentAnalysisAsyncClient is the asynchronous service client. The examples shown in this document use a credential object named DefaultAzureCredential for authentication, which is appropriate for most scenarios, including local development and production environments. Additionally, we recommend using managed identity for authentication in production environments. You can find more information on different ways of authenticating and their corresponding credential types in the Azure Identity documentation.

Sample: Construct a DocumentModelAdministrationClient with DefaultAzureCredential

The following code sample demonstrates the creation of a DocumentModelAdministrationClient, using the `DefaultAzureCredentialBuilder` to configure it.

 DocumentModelAdministrationClient client = new DocumentModelAdministrationClientBuilder()
     .endpoint("{endpoint}")
     .credential(new DefaultAzureCredentialBuilder().build())
     .buildClient();
 

Further, see the code sample below to use AzureKeyCredential for client creation.

 DocumentModelAdministrationClient client =
     new DocumentModelAdministrationClientBuilder()
         .credential(new AzureKeyCredential("{key}"))
         .endpoint("{endpoint}")
         .buildClient();
 

Let's take a look at the administration client scenarios and their respective usage below.



Build custom document models

Custom document models are built by labelling a dataset of documents with the values you want extracted to address your document scenario needs.

The request must include a `blobContainerUrl` that is an externally accessible Azure storage blob container URI (preferably a Shared Access Signature URI). A container URI without a SAS is accepted only when the container is public or has a managed identity configured. For details, see the service documentation on configuring managed identities to work with Form Recognizer.
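
As a rough pre-flight check, you can test whether a container URL appears to carry a SAS token before submitting a build request. The sketch below uses only the JDK and rests on one assumption: that a SAS URL includes a non-empty `sig` query parameter. `ContainerUrlCheck` and `hasSasSignature` are hypothetical names, not part of the client library, and the check says nothing about whether the token is valid, unexpired, or grants the required permissions.

```java
import java.net.URI;
import java.util.Arrays;

/** Hypothetical helper; not part of the client library. */
final class ContainerUrlCheck {

    /**
     * Returns true when the URL's query string contains a non-empty "sig" parameter.
     * Throws IllegalArgumentException if the string is not a syntactically valid URI
     * (for example, an unreplaced "{placeholder}" value).
     */
    static boolean hasSasSignature(String containerUrl) {
        String query = URI.create(containerUrl).getRawQuery();
        if (query == null) {
            return false;
        }
        return Arrays.stream(query.split("&"))
            .anyMatch(param -> param.startsWith("sig=") && param.length() > "sig=".length());
    }

    public static void main(String[] args) {
        // Placeholder account, container, and token values for illustration only.
        String withSas = "https://example.blob.core.windows.net/training-data?sv=2022-11-02&sr=c&sp=rl&sig=abc123";
        String withoutSas = "https://example.blob.core.windows.net/training-data";
        System.out.println(hasSasSignature(withSas));
        System.out.println(hasSasSignature(withoutSas));
    }
}
```

A URL that fails this check may still be accepted if the container is public or a managed identity is configured, as noted above.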

For more information on the different custom document model types, refer to the custom document model types documentation.

Sample: Build a custom document model

The following code sample demonstrates how to build a custom model with your own data.

 // Build custom document analysis model
 // The shared access signature (SAS) URL of your Azure Blob Storage container with your forms.
 String blobContainerUrl = "{SAS_URL_of_your_container_in_blob_storage}";
 String prefix = "{blob_name_prefix}";
 SyncPoller<OperationResult, DocumentModelDetails> buildOperationPoller =
     documentModelAdminClient.beginBuildDocumentModel(blobContainerUrl,
         DocumentModelBuildMode.TEMPLATE,
         prefix,
         new BuildDocumentModelOptions().setModelId("my-build-model").setDescription("model desc"),
         Context.NONE);

 DocumentModelDetails documentModelDetails = buildOperationPoller.getFinalResult();

 // Model Info
 System.out.printf("Model ID: %s%n", documentModelDetails.getModelId());
 System.out.printf("Model Description: %s%n", documentModelDetails.getDescription());
 System.out.printf("Model created on: %s%n%n", documentModelDetails.getCreatedOn());
 documentModelDetails.getDocumentTypes().forEach((key, documentTypeDetails) -> {
     System.out.printf("Document type: %s%n", key);
     documentTypeDetails.getFieldSchema().forEach((name, documentFieldSchema) -> {
         System.out.printf("Document field: %s%n", name);
         System.out.printf("Document field type: %s%n", documentFieldSchema.getType().toString());
         System.out.printf("Document field confidence: %.2f%n", documentTypeDetails.getFieldConfidence().get(name));
     });
 });
 
Note that models can also be built using a graphical user interface, such as the Form Recognizer Labeling Tool.

Manage models

The administration client lets you manage the models in your account: build, list, and delete them, and check your account's custom model limit.

Refer to the service quotas and limits documentation to learn more about your resource's custom model usage.
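
One practical use of the custom model count and limit is to confirm there is headroom before starting a new build. The following is a minimal sketch under the assumption that both values are available as plain integers, for example as read from the resource details in the sample below. `ModelQuotaSketch` and its methods are hypothetical names, not part of the client library.

```java
/** Hypothetical helper; not part of the client library. */
final class ModelQuotaSketch {

    /** Returns how many more custom models fit under the limit (never negative). */
    static int remainingModels(int currentCount, int limit) {
        if (currentCount < 0 || limit < 0) {
            throw new IllegalArgumentException("count and limit must be non-negative");
        }
        return Math.max(0, limit - currentCount);
    }

    /** Returns true when at least one more custom model can be created. */
    static boolean hasRoomForNewModel(int currentCount, int limit) {
        return remainingModels(currentCount, limit) > 0;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; real values come from the service.
        int count = 98;
        int limit = 100;
        System.out.printf("Remaining custom models: %d%n", remainingModels(count, limit));
        System.out.printf("Room for a new model: %b%n", hasRoomForNewModel(count, limit));
    }
}
```

If the check reports no remaining capacity, deleting unused models (as in the sample below) frees slots before the next build.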

Sample: Manage models

This sample demonstrates how to manage (get, list, delete) the models stored in your account.

 AtomicReference<String> modelId = new AtomicReference<>();

 // First, we see how many models we have, and what our limit is
 ResourceDetails resourceDetails = documentModelAdminClient.getResourceDetails();
 System.out.printf("The resource has %s models, and we can have at most %s models%n",
     resourceDetails.getCustomDocumentModelCount(), resourceDetails.getCustomDocumentModelLimit());

 // Next, we get a paged list of all of our models
 PagedIterable<DocumentModelSummary> customDocumentModels = documentModelAdminClient.listDocumentModels();
 System.out.println("We have following models in the account:");
 customDocumentModels.forEach(documentModelSummary -> {
     System.out.printf("Model ID: %s%n", documentModelSummary.getModelId());
     modelId.set(documentModelSummary.getModelId());

     // get custom document analysis model info
     DocumentModelDetails documentModel = documentModelAdminClient.getDocumentModel(documentModelSummary.getModelId());
     System.out.printf("Model ID: %s%n", documentModel.getModelId());
     System.out.printf("Model Description: %s%n", documentModel.getDescription());
     System.out.printf("Model created on: %s%n", documentModel.getCreatedOn());
     documentModel.getDocumentTypes().forEach((key, documentTypeDetails) -> {
         documentTypeDetails.getFieldSchema().forEach((field, documentFieldSchema) -> {
             System.out.printf("Field: %s%n", field);
             System.out.printf("Field type: %s%n", documentFieldSchema.getType());
             System.out.printf("Field confidence: %.2f%n", documentTypeDetails.getFieldConfidence().get(field));
         });
     });
 });

 // Delete Model
 documentModelAdminClient.deleteDocumentModel(modelId.get());
 