Package com.azure.ai.formrecognizer.training
Azure Form Recognizer is a cloud-based service provided by Microsoft Azure that utilizes machine learning to extract information from various types of forms. It is designed to automate the process of form recognition, data extraction, and form understanding. Azure Form Recognizer can handle structured forms, such as invoices, receipts, and surveys, as well as unstructured form data, such as contracts, agreements, and financial reports.
The service uses advanced optical character recognition (OCR) technology to extract text and key-value pairs from custom forms, enabling organizations to automate data entry tasks that would otherwise require manual effort. It can recognize and extract information like dates, addresses, invoice numbers, line items, and other relevant data points from forms.
The Azure Form Recognizer client library allows Java developers to interact with the Azure Form Recognizer service. It provides a set of classes and methods that abstract the underlying RESTful API of Azure Form Recognizer, making it easier to integrate the service into Java applications.
The Azure Form Recognizer client library provides the following capabilities:
- Form Recognition: It allows you to submit forms to extract information like text, key-value pairs, tables, and form fields. You can analyze both structured and unstructured documents.
- Model Management: It enables you to train custom models by providing labeled training data. You can also list and delete existing models.
- Recognition Results: It provides methods to retrieve and interpret analysis results, including extracted text and field values, confidence scores, and form layout information.
- Polling and Callbacks: It includes mechanisms for polling the service to check the status of an analysis operation or registering callbacks to receive notifications when the analysis is complete.
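Polling amounts to a loop that checks an operation's status until it reaches a terminal state. The following is a minimal, stdlib-only sketch of that idea. It does not call the client library, which exposes polling through SyncPoller (as in the training sample) and PollerFlux on the asynchronous client; the OperationStatus enum, StatusSource interface, and pollUntilDone method are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;

public class PollingSketch {
    // Hypothetical status values for a long-running operation (not the SDK's types).
    enum OperationStatus { NOT_STARTED, RUNNING, SUCCEEDED, FAILED }

    // Hypothetical stand-in for a service call that reports the operation's current status.
    interface StatusSource {
        OperationStatus currentStatus();
    }

    // Polls until a terminal status is reached, or fails after maxAttempts checks.
    static OperationStatus pollUntilDone(StatusSource source, Duration interval, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            OperationStatus status = source.currentStatus();
            if (status == OperationStatus.SUCCEEDED || status == OperationStatus.FAILED) {
                return status; // terminal state: stop polling
            }
            try {
                Thread.sleep(interval.toMillis()); // wait before the next status check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Polling was interrupted", e);
            }
        }
        throw new IllegalStateException("Operation did not finish after " + maxAttempts + " checks");
    }

    public static void main(String[] args) {
        // Simulated operation: two RUNNING responses, then SUCCEEDED.
        Iterator<OperationStatus> responses = List.of(
            OperationStatus.RUNNING, OperationStatus.RUNNING, OperationStatus.SUCCEEDED).iterator();
        OperationStatus result = pollUntilDone(responses::next, Duration.ofMillis(10), 5);
        System.out.println(result); // prints SUCCEEDED
    }
}
```

A fixed interval keeps the sketch simple; a real poller may instead honor a retry interval suggested by the service.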
Getting Started
The Azure Form Recognizer library provides analysis clients like FormRecognizerAsyncClient and FormRecognizerClient, which connect to the Form Recognizer Azure Cognitive Service to analyze documents and extract their information into structured data. It also provides training clients like FormTrainingClient and FormTrainingAsyncClient to build and manage models from custom documents.
Note: This client only supports FormRecognizerServiceVersion.V2_1 and lower. We recommend using a newer service version through DocumentAnalysisClient and DocumentModelAdministrationClient. Refer to the Migration guide to use API versions 2022-08-31 and later.
Service clients are the point of interaction for developers to use Azure Form Recognizer. FormRecognizerClient is the synchronous service client and FormRecognizerAsyncClient is the asynchronous service client; for training, FormTrainingClient and FormTrainingAsyncClient play the same roles. The examples shown in this document use a credential object named DefaultAzureCredential for authentication, which is appropriate for most scenarios, including local development and production environments. Additionally, we recommend using managed identity for authentication in production environments. You can find more information on different ways of authenticating and their corresponding credential types in the Azure Identity documentation.
Sample: Construct a FormTrainingClient with DefaultAzureCredential
The following code sample demonstrates the creation of a FormTrainingClient, using the `DefaultAzureCredentialBuilder` to configure it.
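import com.azure.ai.formrecognizer.training.FormTrainingClient;
import com.azure.ai.formrecognizer.training.FormTrainingClientBuilder;
import com.azure.identity.DefaultAzureCredentialBuilder;

// Authenticate with DefaultAzureCredential (environment, managed identity, developer tools, ...)
FormTrainingClient client = new FormTrainingClientBuilder()
    .endpoint("{endpoint}")
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildClient();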
Alternatively, the following code sample uses an AzureKeyCredential to create the client.
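import com.azure.ai.formrecognizer.training.FormTrainingClient;
import com.azure.ai.formrecognizer.training.FormTrainingClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

// Authenticate with the resource's API key
FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();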
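Rather than hardcoding the "{endpoint}" and "{key}" placeholders, an application typically reads them from configuration. The following is a stdlib-only sketch under that assumption: the variable names FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY and the ClientSettingsSketch class are hypothetical, and the https check is a choice made for this sketch, not a rule enforced by the client library. The resolved values would then be passed to endpoint(...) and new AzureKeyCredential(...).

```java
import java.net.URI;
import java.util.Map;

public class ClientSettingsSketch {
    // Hypothetical environment variable names; use whatever names your deployment defines.
    static final String ENDPOINT_VAR = "FORM_RECOGNIZER_ENDPOINT";
    static final String KEY_VAR = "FORM_RECOGNIZER_KEY";

    // Returns the endpoint, rejecting missing values and anything that is not an absolute https URL.
    static String resolveEndpoint(Map<String, String> env) {
        String value = require(env, ENDPOINT_VAR);
        URI uri = URI.create(value); // throws IllegalArgumentException for malformed URIs
        if (!"https".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException(ENDPOINT_VAR + " must be an absolute https URL");
        }
        return value;
    }

    // Returns the API key, rejecting missing or blank values.
    static String resolveKey(Map<String, String> env) {
        return require(env, KEY_VAR);
    }

    // Looks up a variable and trims surrounding whitespace.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " is not set");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // In an application this map would be System.getenv(); sample values are used here.
        Map<String, String> env = Map.of(
            ENDPOINT_VAR, "https://my-resource.cognitiveservices.azure.com/",
            KEY_VAR, "my-key");
        System.out.println(resolveEndpoint(env)); // prints https://my-resource.cognitiveservices.azure.com/
    }
}
```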
Let's take a look at the training client scenarios and their respective usage below.
Train a model
A trained model can output structured data that includes the relationships in the original form document. For instructions on setting up forms for training in an Azure Blob Storage Container, see
You can train custom models to recognize specific fields and values you specify by labeling your custom forms.
Sample: Train a model with your own data
The following code sample demonstrates how to train a model with your own data.
import com.azure.ai.formrecognizer.models.FormRecognizerOperationResult;
import com.azure.ai.formrecognizer.training.models.CustomFormModel;
import com.azure.ai.formrecognizer.training.models.TrainingOptions;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;

String trainingFilesUrl = "{SAS_URL_of_your_container_in_blob_storage}";
// useTrainingLabels = false: train on unlabeled forms
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller =
    formTrainingClient.beginTraining(trainingFilesUrl, false,
        new TrainingOptions().setModelName("my model trained without labels"), Context.NONE);
// Blocks until training finishes
CustomFormModel customFormModel = trainingPoller.getFinalResult();

// Model info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model name given by user: %s%n", customFormModel.getModelName());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());

System.out.println("Recognized Fields:");
// Loop through the submodels, which contain the fields they were trained on.
// Since the training documents are unlabeled, they are still grouped, but they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
    System.out.printf("Submodel Id: %s%n", customFormSubmodel.getModelId());
    // Since the training data is unlabeled, the service cannot return the accuracy of this model
    customFormSubmodel.getFields().forEach((field, customFormModelField) ->
        System.out.printf("Field: %s Field Label: %s%n", field, customFormModelField.getLabel()));
});
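The training call expects a SAS URL for the blob container, and passing a bare container URL is an easy mistake. The following stdlib-only sketch is a hypothetical pre-flight check, not part of the client library: it only confirms that the URL uses https and carries a non-empty "sig" query parameter (the signature field of a SAS token). It does not check permissions or expiry, which the service still validates.

```java
import java.net.URI;

public class SasUrlCheckSketch {
    // Returns true if the URL uses https and its query string has a non-empty "sig" parameter.
    static boolean looksLikeSasUrl(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI (e.g. an unreplaced placeholder)
        }
        if (!"https".equalsIgnoreCase(uri.getScheme()) || uri.getRawQuery() == null) {
            return false;
        }
        for (String param : uri.getRawQuery().split("&")) {
            if (param.startsWith("sig=") && param.length() > "sig=".length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical sample URL; a real SAS token has more parameters and a longer signature.
        String url = "https://myaccount.blob.core.windows.net/training?sv=2020-08-04&sp=rl&sig=abc123";
        System.out.println(looksLikeSasUrl(url)); // prints true
        System.out.println(looksLikeSasUrl("{SAS_URL_of_your_container_in_blob_storage}")); // prints false
    }
}
```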
Please note that models can also be trained using a graphical user interface such as the Form Recognizer Labeling Tool.
Manage custom models
Operations that can be executed are:
- Check the number of models in the Form Recognizer resource account, and the maximum number of models that can be stored.
- List the models currently stored in the resource account.
- Get a specific model using the model's Id.
- Delete a model from the resource account.
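Because an account can store only a limited number of custom models, these operations are often combined: compare the model count with the limit, then delete models to make room. The following stdlib-only sketch shows one hypothetical cleanup policy (delete the oldest models first), assuming you have already collected each model's id and training-completion timestamp, for example from getTrainingCompletedOn(), for every model in the account. The class and method names are illustrative only.

```java
import java.time.OffsetDateTime;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ModelCleanupSketch {
    // Hypothetical policy: pick the oldest models to delete so that slotsNeeded new models
    // fit under modelLimit. Assumes the map holds every model currently in the account.
    static List<String> modelsToDelete(Map<String, OffsetDateTime> completedOnById,
                                       int modelLimit, int slotsNeeded) {
        int excess = completedOnById.size() + slotsNeeded - modelLimit;
        if (excess <= 0) {
            return Collections.emptyList(); // enough free slots already
        }
        return completedOnById.entrySet().stream()
            .sorted(Map.Entry.comparingByValue()) // oldest completion time first
            .limit(excess)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data: three models in an account whose limit is three models.
        Map<String, OffsetDateTime> models = Map.of(
            "model-a", OffsetDateTime.parse("2021-01-01T00:00:00Z"),
            "model-b", OffsetDateTime.parse("2021-03-01T00:00:00Z"),
            "model-c", OffsetDateTime.parse("2021-02-01T00:00:00Z"));
        System.out.println(modelsToDelete(models, 3, 1)); // prints [model-a]
        System.out.println(modelsToDelete(models, 3, 2)); // prints [model-a, model-c]
    }
}
```

Each returned id could then be passed to deleteModel.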
Sample: Manage custom models stored in your account
This sample demonstrates how to manage the custom models stored in your account.
import com.azure.ai.formrecognizer.training.models.AccountProperties;
import com.azure.ai.formrecognizer.training.models.CustomFormModel;
import com.azure.ai.formrecognizer.training.models.CustomFormModelInfo;
import com.azure.core.http.rest.PagedIterable;

// First, see how many custom models the account has, and what the limit is
AccountProperties accountProperties = formTrainingClient.getAccountProperties();
System.out.printf("The account has %d custom models, and we can have at most %d custom models%n",
    accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());

// Next, get a paged list of all custom models
PagedIterable<CustomFormModelInfo> customModels = formTrainingClient.listCustomModels();
System.out.println("We have the following models in the account:");
customModels.forEach(customFormModelInfo -> {
    System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
    // Get the full details of a specific custom model
    CustomFormModel customModel = formTrainingClient.getCustomModel(customFormModelInfo.getModelId());
    System.out.printf("Model Status: %s%n", customModel.getModelStatus());
    System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
    System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
    customModel.getSubmodels().forEach(customFormSubmodel -> {
        System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
        System.out.printf("Custom Model Accuracy: %f%n", customFormSubmodel.getAccuracy());
        if (customFormSubmodel.getFields() != null) {
            customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
                System.out.printf("Field Text: %s%n", fieldText);
                System.out.printf("Field Accuracy: %f%n", customFormModelField.getAccuracy());
            });
        }
    });
});

// Delete a custom model
formTrainingClient.deleteModel("{modelId}");
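The per-field accuracies printed above can drive a simple review step: fields whose accuracy is low, or unavailable (as with models trained without labels), are candidates for more or better-labeled training data. The following stdlib-only sketch assumes the accuracies have already been copied into a map of field name to Float, with null meaning "unknown"; the threshold value and the AccuracyReviewSketch name are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AccuracyReviewSketch {
    // Returns, sorted by name, the fields whose accuracy is unknown (null) or below the threshold.
    static List<String> fieldsNeedingReview(Map<String, Float> accuracyByField, float threshold) {
        return new TreeMap<>(accuracyByField).entrySet().stream()
            .filter(entry -> entry.getValue() == null || entry.getValue() < threshold)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample values; HashMap is used because it allows null (unknown) accuracies.
        Map<String, Float> accuracyByField = new HashMap<>();
        accuracyByField.put("InvoiceId", 0.98f);
        accuracyByField.put("Total", 0.72f);
        accuracyByField.put("VendorName", null);
        System.out.println(fieldsNeedingReview(accuracyByField, 0.8f)); // prints [Total, VendorName]
    }
}
```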
For a suggested approach to extracting information from custom forms with known fields, see strongly-typing a recognized form.
Classes
- FormTrainingAsyncClient: This class provides an asynchronous client to connect to the Form Recognizer Azure Cognitive Service.
- FormTrainingClient: This class provides a synchronous client to connect to the Form Recognizer Azure Cognitive Service.
- FormTrainingClientBuilder: This class provides a fluent builder API to help instantiation of FormTrainingClient and FormTrainingAsyncClient; call buildClient() and buildAsyncClient() respectively to construct an instance of the desired client.