Class DocumentAnalysisClient

A client for interacting with the Form Recognizer service's analysis features.

Examples:

The Form Recognizer service and clients support two means of authentication:

Azure Active Directory

import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { DefaultAzureCredential } from "@azure/identity";

const endpoint = "https://<resource name>.cognitiveservices.azure.com";
const credential = new DefaultAzureCredential();

const client = new DocumentAnalysisClient(endpoint, credential);

API Key (Subscription Key)

import { DocumentAnalysisClient, AzureKeyCredential } from "@azure/ai-form-recognizer";

const endpoint = "https://<resource name>.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("<api key>");

const client = new DocumentAnalysisClient(endpoint, credential);
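An application that supports both authentication methods has to pick one at startup. The following is a minimal sketch of that decision, not part of the SDK: the `chooseAuth` helper, the `AuthChoice` type, and the `FORM_RECOGNIZER_API_KEY` variable name are all assumptions made for illustration.

```typescript
// Hypothetical helper, not part of @azure/ai-form-recognizer.
type AuthChoice = { kind: "apiKey"; key: string } | { kind: "azureActiveDirectory" };

// FORM_RECOGNIZER_API_KEY is an assumed variable name chosen for this sketch.
function chooseAuth(env: Record<string, string | undefined>): AuthChoice {
  const key = env["FORM_RECOGNIZER_API_KEY"]?.trim();
  // Prefer the API key when one is configured; otherwise fall back to Azure Active Directory.
  return key ? { kind: "apiKey", key } : { kind: "azureActiveDirectory" };
}

const choice = chooseAuth({ FORM_RECOGNIZER_API_KEY: " abc123 " });
console.log(choice.kind); // "apiKey"
```

With an `apiKey` choice, the key would be wrapped in `new AzureKeyCredential(choice.key)`; otherwise `new DefaultAzureCredential()` would be used, as in the two examples above. In Node.js, `process.env` can be passed as the `env` argument.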

Hierarchy

  • DocumentAnalysisClient

Methods

beginAnalyzeDocuments

  • Extract data from an input using a model given by its unique ID.

    This operation supports custom as well as prebuilt models. For example, to use the prebuilt invoice model, provide the model ID "prebuilt-invoice", or to use the simpler prebuilt layout model, provide the model ID "prebuilt-layout".

    The fields produced in the AnalyzeResult depend on the model that is used for analysis, and the values in the fields of any extracted documents depend on the document types in the model (if any) and their corresponding field schemas.

    Examples

    This method supports both URLs (string) and streamable request bodies (FormRecognizerRequestBody) such as Node.js ReadableStream objects, browser Blobs, and ArrayBuffers.

    From URL

    The Form Recognizer service will attempt to download a file using the submitted URL, so the URL must be accessible from the public internet. For example, a SAS token can be used to grant read access to a blob in Azure Storage, and the service will use the SAS-encoded URL to request the file.

    // the URL must be publicly accessible
    const url = "<receipt document url>";
    
    // The model that is passed to the following function call determines the type of the eventual result. In the
    // example, we will use the prebuilt receipt model, but you could use a custom model ID/name instead.
    const poller = await client.beginAnalyzeDocuments("prebuilt-receipt", url);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities, // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
      documents // extracted documents (instances of one of the model's document types and its field schema)
    } = await poller.pollUntilDone();
    
    // Extract the fields of the first document. These fields constitute a receipt, because we used the receipt model
    const [{ fields: receipt }] = documents;
    
    // The fields correspond to the model's document types and their field schemas. Refer to the Form Recognizer
    // documentation for information about the document types and field schemas within a model, or use the `getModel`
    // operation to view this information programmatically.
    console.log("The type of this receipt is:", receipt?.["ReceiptType"]?.value);
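Because the service downloads the file itself, it can help to sanity-check a URL before submitting it. The following is a rough sketch under stated assumptions: `checkDocumentUrl` is a hypothetical helper, not an SDK function, and it only inspects the URL text (it cannot prove that the service can actually reach the file). It relies on the fact that Azure Storage SAS URLs carry their signature in a `sig` query parameter.

```typescript
// Hypothetical pre-flight check; it inspects the URL only and performs no network request.
interface UrlCheck {
  plausible: boolean; // false when the service could not possibly download from this URL
  hasSasSignature: boolean; // true when a "sig" query parameter (SAS signature) is present
  problem?: string;
}

function checkDocumentUrl(input: string): UrlCheck {
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    return { plausible: false, hasSasSignature: false, problem: "not an absolute URL" };
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return { plausible: false, hasSasSignature: false, problem: "scheme must be http or https" };
  }
  if (parsed.hostname === "localhost" || parsed.hostname === "127.0.0.1") {
    return { plausible: false, hasSasSignature: false, problem: "host is not publicly reachable" };
  }
  return { plausible: true, hasSasSignature: parsed.searchParams.has("sig") };
}

console.log(checkDocumentUrl("http://localhost/receipt.pdf").problem); // "host is not publicly reachable"
```

A URL that passes this check without a SAS signature still has to be anonymously readable for the download to succeed.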

    From Request Body

    Alternatively, if the file is local (or in memory in the browser), a binary object can be uploaded. The following example uses the Node.js filesystem API.

    import * as fs from "fs";
    
    const file = fs.createReadStream("path/to/receipt.pdf");
    
    // The model that is passed to the following function call determines the type of the eventual result. In the
    // example, we will use the prebuilt receipt model, but you could use a custom model ID/name instead.
    const poller = await client.beginAnalyzeDocuments("prebuilt-receipt", file);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities, // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
      documents // extracted documents (instances of one of the model's document types and its field schema)
    } = await poller.pollUntilDone();
    
    // Extract the fields of the first document. These fields constitute a receipt, because we used the receipt model
    const [{ fields: receipt }] = documents;
    
    // The fields correspond to the model's document types and their field schemas. Refer to the Form Recognizer
    // documentation for information about the document types and field schemas within a model, or use the `getModel`
    // operation to view this information programmatically.
    console.log("The type of this receipt is:", receipt?.["ReceiptType"]?.value);

    Parameters

    Returns Promise<AnalysisPoller>

    a long-running operation (poller) that will eventually produce an AnalyzeResult
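To make the idea of a long-running operation more concrete, the following sketch shows roughly what "polling until done" means. It is an illustration only: `MiniPoller`, `pollUntilDone` (as a free function), and `fakePoller` are hypothetical stand-ins defined here, and the SDK's real poller may behave differently in its details.

```typescript
// Hypothetical, simplified poller shape; the SDK's AnalysisPoller is richer than this.
interface MiniPoller<T> {
  poll(): Promise<void>; // asks the service for the current status once
  isDone(): boolean; // true once the operation has finished
  getResult(): T | undefined; // the final result, available only when done
}

// Repeatedly polls, waiting between attempts, until the operation reports completion.
async function pollUntilDone<T>(poller: MiniPoller<T>, intervalMs = 10): Promise<T> {
  while (!poller.isDone()) {
    await poller.poll();
    if (!poller.isDone()) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  const result = poller.getResult();
  if (result === undefined) throw new Error("operation finished without a result");
  return result;
}

// In-memory fake that completes after a fixed number of polls, standing in for the service.
function fakePoller<T>(pollsNeeded: number, result: T): MiniPoller<T> & { pollCount(): number } {
  let polls = 0;
  return {
    poll: async () => { polls += 1; },
    isDone: () => polls >= pollsNeeded,
    getResult: () => (polls >= pollsNeeded ? result : undefined),
    pollCount: () => polls,
  };
}
```

In application code, the poller returned by `beginAnalyzeDocuments` already provides its own `pollUntilDone` method, as the examples above show, so a loop like this should not need to be written by hand.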

  • Extract data from an input using a model that has a known, strongly-typed document schema (a DocumentModel). It is not currently possible to define a custom DocumentModel instance in the SDK, so only the models that are provided as part of PrebuiltModels can be used with this method overload.

    See PrebuiltModels.

    The fields produced in the AnalyzeResult depend on the model that is used for analysis. In TypeScript, the type of the result for this method overload is inferred from the type of the input DocumentModel.

    Examples

    This method supports both URLs (string) and streamable request bodies (FormRecognizerRequestBody) such as Node.js ReadableStream objects, browser Blobs, and ArrayBuffers.

    From URL

    The Form Recognizer service will attempt to download a file using the submitted URL, so the URL must be accessible from the public internet. For example, a SAS token can be used to grant read access to a blob in Azure Storage, and the service will use the SAS-encoded URL to request the file.

    import { PrebuiltModels } from "@azure/ai-form-recognizer";
    
    // the URL must be publicly accessible
    const url = "<receipt document url>";
    
    // The model that is passed to the following function call determines the type of the eventual result. In the
    // example, we will use the prebuilt receipt model.
    const poller = await client.beginAnalyzeDocuments(PrebuiltModels.Receipt, url);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities, // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
      documents // extracted documents (instances of one of the model's document types and its field schema)
    } = await poller.pollUntilDone();
    
    // Extract the fields of the first document. These fields constitute a receipt, because we used the receipt model
    const [{ fields: receipt }] = documents;
    
    // Since we used the strongly-typed PrebuiltModels.Receipt object instead of the "prebuilt-receipt" model ID
    // string, the fields of the receipt are strongly-typed and have camelCase names (as opposed to PascalCase).
    console.log("The type of this receipt is:", receipt.receiptType?.value);

    From Request Body

    Alternatively, if the file is local (or in memory in the browser), a binary object can be uploaded. The following example uses the Node.js filesystem API.

    import * as fs from "fs";
    import { PrebuiltModels } from "@azure/ai-form-recognizer";
    
    const file = fs.createReadStream("path/to/receipt.pdf");
    
    // The model that is passed to the following function call determines the type of the eventual result. In the
    // example, we will use the prebuilt receipt model.
    const poller = await client.beginAnalyzeDocuments(PrebuiltModels.Receipt, file);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities, // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
      documents // extracted documents (instances of one of the model's document types and its field schema)
    } = await poller.pollUntilDone();
    
    // Extract the fields of the first document. These fields constitute a receipt, because we used the receipt model
    const [{ fields: receipt }] = documents;
    
    // Since we used the strongly-typed PrebuiltModels.Receipt object instead of the "prebuilt-receipt" model ID
    // string, the fields of the receipt are strongly-typed and have camelCase names (as opposed to PascalCase).
    console.log("The type of this receipt is:", receipt.receiptType?.value);

    Type parameters

    • Document

    Parameters

    Returns Promise<AnalysisPoller<AnalyzeResult<Document>>>

    a long-running operation (poller) that will eventually produce an AnalyzeResult with documents that have the result type associated with the input model
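The difference between the two overloads (a model ID string yielding loosely typed PascalCase fields, a model object yielding strongly typed camelCase fields) can be illustrated with TypeScript function overloads. This is a self-contained sketch, not the SDK's implementation: `TypedModel`, `ReceiptFields`, `receiptModel`, `pascalToCamel`, and `analyzeFields` are hypothetical names, and the real DocumentModel type may carry different members.

```typescript
// Hypothetical stand-in for a strongly typed model description.
interface TypedModel<Doc> {
  modelId: string;
  // Converts a raw PascalCase field bag into the typed, camelCase document shape.
  transform(raw: Record<string, unknown>): Doc;
}

// Illustrative subset of receipt fields, assumed for this sketch.
interface ReceiptFields {
  receiptType?: string;
  total?: number;
}

const pascalToCamel = (name: string): string => name.charAt(0).toLowerCase() + name.slice(1);

const receiptModel: TypedModel<ReceiptFields> = {
  modelId: "prebuilt-receipt",
  transform(raw) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(raw)) out[pascalToCamel(key)] = raw[key];
    return out as ReceiptFields;
  },
};

// Overload 1: a model ID string gives back the untyped field bag.
function analyzeFields(model: string, raw: Record<string, unknown>): Record<string, unknown>;
// Overload 2: a typed model gives back its document type, inferred from the argument.
function analyzeFields<Doc>(model: TypedModel<Doc>, raw: Record<string, unknown>): Doc;
function analyzeFields<Doc>(
  model: string | TypedModel<Doc>,
  raw: Record<string, unknown>
): Doc | Record<string, unknown> {
  return typeof model === "string" ? raw : model.transform(raw);
}

const sampleFields = { ReceiptType: "Itemized", Total: 12.5 };
console.log(analyzeFields("prebuilt-receipt", sampleFields)["ReceiptType"]); // "Itemized"
console.log(analyzeFields(receiptModel, sampleFields).receiptType); // "Itemized"
```

The second call compiles with property access (`.receiptType`) because the compiler infers `ReceiptFields` from the model argument, which mirrors how the result type of this overload follows the input model.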

beginExtractGenericDocument

  • Extracts generic document information from an input file. The generic document result includes the information from layout analysis (pages, tables, and styles) as well as extracted key-value pairs and entities.

    Examples

    This method supports both URLs (string) and streamable request bodies (FormRecognizerRequestBody) such as Node.js ReadableStream objects, browser Blobs, and ArrayBuffers.

    From URL

    The Form Recognizer service will attempt to download a file using the submitted URL, so the URL must be accessible from the public internet. For example, a SAS token can be used to grant read access to a blob in Azure Storage, and the service will use the SAS-encoded URL to request the file.

    // the URL must be publicly accessible
    const url = "<document url>";
    
    const poller = await client.beginExtractGenericDocument(url);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      // the operation produces the fields from the layout operation
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
    
      // it also produces the following additional fields
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
    } = await poller.pollUntilDone();

    From Request Body

    Alternatively, if the file is local (or in memory in the browser), a binary object can be uploaded. The following example uses the Node.js filesystem API.

    import * as fs from "fs";
    
    const file = fs.createReadStream("path/to/file.pdf");
    
    const poller = await client.beginExtractGenericDocument(file);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      // the operation produces the fields from the layout operation
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles, // text styles (e.g. handwriting) that were observed in the document
    
      // it also produces the following additional fields
      keyValuePairs, // extracted pairs of elements (directed associations from one element in the input to another)
      entities // extracted entities in the input's content, which are categorized (e.g. "Location" or "Organization")
    } = await poller.pollUntilDone();

    Parameters

    Returns Promise<AnalysisPoller<GenericDocumentResult>>

    a long-running operation (poller) that will eventually produce a generic document result or an error
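Since extracted entities are categorized, a common next step is to group them by category. The following is a minimal sketch of that post-processing step. The `EntityLike` shape (a `category` and a `content` string) is an assumption made for this example; consult the SDK's entity type for the actual members.

```typescript
// Assumed minimal shape of an extracted entity for this sketch.
interface EntityLike {
  category: string; // e.g. "Location" or "Organization"
  content: string; // the text of the entity as it appears in the input
}

// Groups entity contents by category, preserving the order in which they were extracted.
function groupEntitiesByCategory(entities: EntityLike[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { category, content } of entities) {
    const bucket = groups.get(category) ?? [];
    bucket.push(content);
    groups.set(category, bucket);
  }
  return groups;
}

const grouped = groupEntitiesByCategory([
  { category: "Location", content: "Seattle" },
  { category: "Organization", content: "Contoso" },
  { category: "Location", content: "Redmond" },
]);
console.log(grouped.get("Location")); // [ 'Seattle', 'Redmond' ]
```

The sample entities are made up; in practice the input would be the `entities` array from the generic document result.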

beginExtractLayout

  • Extracts only the layout (basic OCR information) from an input file. The layout result includes information about the pages and their text contents, extracted tables, and identified text styles.

    Examples

    This method supports both URLs (string) and streamable request bodies (FormRecognizerRequestBody) such as Node.js ReadableStream objects, browser Blobs, and ArrayBuffers.

    From URL

    The Form Recognizer service will attempt to download a file using the submitted URL, so the URL must be accessible from the public internet. For example, a SAS token can be used to grant read access to a blob in Azure Storage, and the service will use the SAS-encoded URL to request the file.

    // the URL must be publicly accessible
    const url = "<document url>";
    
    const poller = await client.beginExtractLayout(url);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles // text styles (e.g. handwriting) that were observed in the document
    } = await poller.pollUntilDone();

    From Request Body

    Alternatively, if the file is local (or in memory in the browser), a binary object can be uploaded. The following example uses the Node.js filesystem API.

    import * as fs from "fs";
    
    const file = fs.createReadStream("path/to/file.pdf");
    
    const poller = await client.beginExtractLayout(file);
    
    // The result is a long-running operation (poller), which must itself be polled until the operation completes
    const {
      pages, // pages extracted from the document, which contain lines and words
      tables, // extracted tables, organized into cells that contain their contents
      styles // text styles (e.g. handwriting) that were observed in the document
    } = await poller.pollUntilDone();

    Parameters

    Returns Promise<AnalysisPoller<LayoutResult>>

    a long-running operation (poller) that will eventually produce a layout result or an error
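Extracted tables are organized into cells, so turning one into a row-by-column grid is a typical way to consume the layout result. The sketch below shows one way to do that. The `CellLike` and `TableLike` shapes (`rowIndex`, `columnIndex`, `content`, `rowCount`, `columnCount`) are assumptions made for illustration and should be checked against the SDK's table types.

```typescript
// Assumed minimal shapes of a table and its cells for this sketch.
interface CellLike {
  rowIndex: number;
  columnIndex: number;
  content: string;
}

interface TableLike {
  rowCount: number;
  columnCount: number;
  cells: CellLike[];
}

// Builds a rowCount x columnCount grid of strings; positions without a cell stay empty.
function tableToGrid(table: TableLike): string[][] {
  const grid: string[][] = [];
  for (let r = 0; r < table.rowCount; r++) {
    grid.push(new Array<string>(table.columnCount).fill(""));
  }
  for (const cell of table.cells) {
    const row = grid[cell.rowIndex];
    // Ignore cells whose indices fall outside the declared table size.
    if (row !== undefined && cell.columnIndex >= 0 && cell.columnIndex < table.columnCount) {
      row[cell.columnIndex] = cell.content;
    }
  }
  return grid;
}

const grid = tableToGrid({
  rowCount: 2,
  columnCount: 2,
  cells: [
    { rowIndex: 0, columnIndex: 0, content: "Item" },
    { rowIndex: 0, columnIndex: 1, content: "Price" },
    { rowIndex: 1, columnIndex: 0, content: "Coffee" },
  ],
});
console.log(JSON.stringify(grid)); // [["Item","Price"],["Coffee",""]]
```

This sketch does not account for cells that span several rows or columns; such cells would only fill their top-left position.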
