Class DocumentPage


  • public final class DocumentPage
    extends Object
    Content and layout elements extracted from a single page of the input document.
    • Constructor Detail

      • DocumentPage

        public DocumentPage()
    • Method Detail

      • getPageNumber

        public int getPageNumber()
        Get the pageNumber property: 1-based page number in the input document.
        Returns:
        the pageNumber value.
      • getAngle

        public float getAngle()
        Get the angle property: The general orientation of the content, measured clockwise in degrees in the range (-180, 180].
        Returns:
        the angle value.
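
The half-open range (-180, 180] can be illustrated with a small normalization helper. This is only a sketch: `PageAngles`, `normalize`, and `uprightCorrection` are hypothetical names, not part of the library, and treating the negated angle as the rotation that brings content upright is an assumption drawn from the description above.

```java
// Hypothetical helper for illustration only; not part of the library.
final class PageAngles {
    private PageAngles() { }

    /** Maps any angle in degrees onto the half-open range (-180, 180]. */
    static float normalize(float degrees) {
        // Java's % keeps the sign of the dividend, so a lies in (-360, 360).
        float a = degrees % 360f;
        if (a > 180f) {
            a -= 360f;
        } else if (a <= -180f) {
            a += 360f;
        }
        return a;
    }

    /**
     * Clockwise rotation that would undo the reported orientation.
     * Assumption: content rotated clockwise by the page angle is brought
     * upright by rotating it by the negated angle.
     */
    static float uprightCorrection(float pageAngle) {
        return normalize(-pageAngle);
    }
}
```

With a page in hand, a call such as `PageAngles.uprightCorrection(page.getAngle())` would give the rotation to apply before rendering.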
      • getWidth

        public float getWidth()
        Get the width property: The width of the page, in pixels for images and in inches for PDFs.
        Returns:
        the width value.
      • getHeight

        public float getHeight()
        Get the height property: The height of the page, in pixels for images and in inches for PDFs.
        Returns:
        the height value.
      • getUnit

        public LengthUnit getUnit()
        Get the unit property: The unit used by the width, height, and boundingBox properties. For images, the unit is "pixel". For PDF, the unit is "inch".
        Returns:
        the unit value.
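
Because the unit depends on the input type, code that needs pixel dimensions has to branch on it. The sketch below shows one way to do that. `PageSize`, its `Unit` enum, and the `dpi` parameter are hypothetical: the enum is a plain stand-in for `LengthUnit` covering only the two units named above, and the resolution used to convert inches is an assumption the caller must supply.

```java
// Hypothetical helper for illustration only; not part of the library.
final class PageSize {
    private PageSize() { }

    /** Stand-in for LengthUnit, limited to the two units the text mentions. */
    enum Unit { PIXEL, INCH }

    /**
     * Converts a width or height to whole pixels.
     * Pixel values are rounded as-is; inch values are scaled by the
     * caller-supplied resolution (dots per inch) and then rounded.
     */
    static int toPixels(float value, Unit unit, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return unit == Unit.INCH ? Math.round(value * dpi) : Math.round(value);
    }
}
```

For example, a PDF page reported as 8.5 by 11 inches would map to 2550 by 3300 pixels at an assumed 300 dpi.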
      • getSpans

        public List<DocumentSpan> getSpans()
        Get the spans property: Location of the page in the reading-order concatenated content.
        Returns:
        the spans value.
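
A span locates a slice of the concatenated content, so the text of one page can be recovered by cutting those slices out of the full content string. The sketch below assumes each span carries a character offset and a length, and that the concatenated content is available as a `String` from the analysis result; `PageText` and its nested `Span` class are hypothetical stand-ins, not library types.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of the library.
final class PageText {
    private PageText() { }

    /** Stand-in for DocumentSpan: an offset and a length into the content. */
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Concatenates the slices of content that the spans point at, in order. */
    static String textOf(String content, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        for (Span s : spans) {
            // append(CharSequence, start, end) copies content[start, end).
            sb.append(content, s.offset, s.offset + s.length);
        }
        return sb.toString();
    }
}
```

With real pages, the spans from `getSpans()` would first be mapped to offset and length pairs; the content string is whatever the analysis result exposes as the full reading-order text.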
      • getWords

        public List<DocumentWord> getWords()
        Get the words property: Extracted words from the page.
        Returns:
        the words value.
      • getSelectionMarks

        public List<DocumentSelectionMark> getSelectionMarks()
        Get the selectionMarks property: Extracted selection marks from the page.
        Returns:
        the selectionMarks value.
      • getLines

        public List<DocumentLine> getLines()
        Get the lines property: Extracted lines from the page, potentially containing both textual and visual elements.
        Returns:
        the lines value.