litellm: [Bug]: Azure Document Intelligence `prebuilt-layout` via LiteLLM OCR does not return layout-aware Markdown

litellm • 2026-04-14 09:31:10


BerriAI/litellm#25687•Fetched 2026-04-16 06:37:15


Author

whitetrevally



Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using the LiteLLM Proxy /ocr endpoint with Azure Document Intelligence (LiteLLM Proxy running in Docker, version 1.82.3), the azure_ai/doc-intelligence/prebuilt-layout model does not appear to return layout-aware Markdown.

For the same input document, the output from prebuilt-layout is very similar to prebuilt-read and looks mostly flattened into plain text. In contrast, vertex_ai/mistral-ocr-2505 returns clearly structured Markdown with headings, tables, and image placeholders.

I expected prebuilt-layout to preserve document structure better than prebuilt-read, or at least produce meaningfully more structured Markdown in pages[].markdown.

Steps to Reproduce

1. Run LiteLLM Proxy in Docker, version 1.82.3.
2. Configure these OCR models:
   - azure_ai/doc-intelligence/prebuilt-read
   - azure_ai/doc-intelligence/prebuilt-layout
   - vertex_ai/mistral-ocr-2505
3. Send the same OCR request to all three models using this sample document: https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true
4. Compare the returned pages[].markdown output.

Example model config:

## Azure Document Intelligence
- model_name: azure-doc-intel-read
  litellm_params:
    model: azure_ai/doc-intelligence/prebuilt-read
    api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
    api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
  model_info:
    mode: ocr

- model_name: azure-doc-intel-layout
  litellm_params:
    model: azure_ai/doc-intelligence/prebuilt-layout
    api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
    api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
  model_info:
    mode: ocr

## VertexAI / Mistral
- model_name: mistral-ocr-2505
  litellm_params:
    model: vertex_ai/mistral-ocr-2505
    vertex_project: "project"
    vertex_location: "location"
    vertex_credentials: "/app/credentials/credentials.json"
  model_info:
    mode: ocr

Example request for azure-doc-intel-read:

curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "azure-doc-intel-read",
  "user": "test"
}'

Example request for azure-doc-intel-layout:

curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "azure-doc-intel-layout",
  "user": "test"
}'

Example request for mistral-ocr-2505:

curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "mistral-ocr-2505",
  "user": "test"
}'
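The three requests above differ only in the `model` field, so the comparison can be scripted. The following is a minimal sketch, assuming the proxy is reachable at a base URL such as `http://localhost:4000` and that the model names match the config above. `structure_signals` and `compare_models` are hypothetical helpers written for this illustration, not part of LiteLLM; the markers they count are rough heuristics.

```python
import json
import re
import urllib.request

DOCUMENT_URL = (
    "https://github.com/Azure-Samples/document-intelligence-code-samples/"
    "blob/main/Data/layout/layout-pageobject.png?raw=true"
)
MODELS = ["azure-doc-intel-read", "azure-doc-intel-layout", "mistral-ocr-2505"]


def build_ocr_payload(model: str, document_url: str = DOCUMENT_URL) -> dict:
    """Build the same JSON body as the curl requests in the issue."""
    return {
        "document": {"type": "document_url", "document_url": document_url},
        "model": model,
        "user": "test",
    }


def structure_signals(markdown: str) -> dict:
    """Count rough layout markers (hypothetical helper, not a LiteLLM API)."""
    pipe_rows = re.findall(r"^[ \t]*\|.*\|[ \t]*$", markdown, flags=re.MULTILINE)
    return {
        "headings": len(re.findall(r"^#{1,6} ", markdown, flags=re.MULTILINE)),
        "table_rows": len(pipe_rows) + markdown.count("<tr>"),
        "figures": markdown.count("<figure>") + markdown.count("!["),
    }


def run_ocr(base_url: str, api_key: str, model: str) -> dict:
    """POST one OCR request to the proxy and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=json.dumps(build_ocr_payload(model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compare_models(base_url: str, api_key: str) -> dict:
    """Return {model: layout signals of page 0} for all three models."""
    results = {}
    for model in MODELS:
        page = run_ocr(base_url, api_key, model)["pages"][0]
        results[model] = structure_signals(page["markdown"])
    return results
```

Calling `compare_models("http://localhost:4000", "<API_KEY>")` returns one small dictionary per model, which makes it easy to see whether `azure-doc-intel-layout` carries any headings or table rows that `azure-doc-intel-read` lacks.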

Notes:

I am not sure whether this is a LiteLLM OCR formatting issue, a limitation of the Azure prebuilt-layout integration, or expected behavior from the upstream Azure response being normalized into LiteLLM's pages[].markdown.
However, from a user perspective, prebuilt-layout currently does not appear to provide layout-aware Markdown output through the LiteLLM Proxy OCR endpoint.
For comparison, running the same image directly in Azure Document Intelligence Studio with the prebuilt-layout model appears to return structured Markdown/HTML-like layout content in the upstream result.
In Azure Document Intelligence Studio, the response includes contentFormat: "markdown" and the analyzeResult.content contains clearly layout-aware structure such as Markdown headings (#, ##, ###) and HTML-like tags including <table>, <caption>, <figure>, and <figcaption>.
This suggests the upstream Azure Layout model may already be producing richer structured output, and that the flattened result seen through LiteLLM Proxy may be caused by response transformation, normalization, or loss of structure in the OCR endpoint output mapping.

Below is the output for the same image, produced by Azure Document Intelligence Studio with the Layout model.

{
  "status": "succeeded",
  "createdDateTime": "2026-04-14T09:13:46Z",
  "lastUpdatedDateTime": "2026-04-14T09:13:46Z",
  "analyzeResult": {
    "apiVersion": "2024-11-30",
    "modelId": "prebuilt-layout",
    "stringIndexType": "utf16CodeUnit",
    "content": "\n\n\n# This is title\n\n\n## 1. Text\n\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n\n\n## 2. Page Objects\n\n\n### 2.1 Table\n\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\n\n\n<table>\n<caption>Table 1: This is a dummy table</caption>\n<tr>\n<th>Name</th>\n<th>Corp</th>\n<th>Remark</th>\n</tr>\n<tr>\n<td>Foo</td>\n<td></td>\n<td></td>\n</tr>\n<tr>\n<td>Bar</td>\n<td>Microsoft</td>\n<td>Dummy</td>\n</tr>\n</table>\n\n\n### 2.2. Figure\n\n\n<figure>\n<figcaption>Figure 1: Here is a figure with text</figcaption>\n\nValues\n\n500\n\n450\n\n400\n\n400\n\n350\n\n300\n\n300\n\n250\n\n200\n\n200\n\n100\n\n0\n\nJan\n\nFeb\n\nMar\n\nApr\n\nMay\n\nJun\n\nMonths\n\n</figure>\n\n\n## 3. Others\n\nAI Document Intelligence is an AI service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\n\n☒\nclear\n\n☒\nprecise\n\n☐\nvague\n\n☒\ncoherent\n\n☐\nIncomprehensible\n\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the AI\nDocument Intelligence studio or SDK.\n\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith AI Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAI model customization.\n\n\n\n",
    "pages": [
      ...
    ],
    "tables": [
      ...
    ],
    "paragraphs": [
      ...
    ],
    "contentFormat": "markdown",
    "sections": [
      ...
    ],
    "figures": [
      ...
    ]
  }
}
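To check the upstream behavior outside the Studio, the same document can be sent straight to the Azure Document Intelligence REST API. The sketch below assumes the `2024-11-30` API version shown in the response above and passes `outputContentFormat=markdown` explicitly, since Azure returns plain text unless Markdown is requested. The endpoint and key values are placeholders, and `has_layout_markdown` is a hypothetical helper for this illustration.

```python
import json
import re
import time
import urllib.parse
import urllib.request


def build_analyze_url(
    endpoint: str,
    model_id: str = "prebuilt-layout",
    api_version: str = "2024-11-30",
    output_format: str = "markdown",
) -> str:
    """Build the analyze URL, asking Azure for Markdown output explicitly."""
    query = urllib.parse.urlencode(
        {"api-version": api_version, "outputContentFormat": output_format}
    )
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?{query}"
    )


def has_layout_markdown(result: dict) -> bool:
    """True if the result is flagged as Markdown and contains a heading."""
    analyze_result = result.get("analyzeResult", {})
    content = analyze_result.get("content", "")
    is_markdown = analyze_result.get("contentFormat") == "markdown"
    return is_markdown and bool(re.search(r"^#{1,6} ", content, flags=re.MULTILINE))


def analyze(endpoint: str, api_key: str, document_url: str,
            poll_seconds: float = 1.0) -> dict:
    """Submit the document, then poll the operation until it finishes."""
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    submit = urllib.request.Request(
        build_analyze_url(endpoint),
        data=json.dumps({"urlSource": document_url}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(submit) as response:
        # The analyze call is asynchronous; the result URL is in this header.
        operation_url = response.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(operation_url, headers=headers)
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)
```

If `has_layout_markdown(analyze(endpoint, key, document_url))` is true while the proxy output stays flat, the structure is being lost somewhere between Azure and `pages[].markdown`.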

### Relevant log output

```shell
Response excerpt for azure-doc-intel-read:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\nThis is title\n1. Text\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n2. Page Objects\n2.1 Table\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\nName\nCorp\nRemark\nFoo\nBar\nMicrosoft\nDummy\nTable 1: This is a dummy table\n2.2. Figure\nFigure 1: Here is a figure with text\n500\n450\n400\n400\n350\n250\n200\n200-\nFeb\nVisu\nMents\n3. Others\nAl Document Intelligence is an Al service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\nclear\nIprecise\nvague\ncoherent\nIncomprehensible\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the Al\nDocument Intelligence studio or SDK.\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith Al Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAl model customization.\nThis is the footer of the document.\n1 | Page",
            "images": null,
            "dimensions": {
                "dpi": 96,
                "height": 909,
                "width": 1200
            }
        }
    ],
    "model": "azure-doc-intel-read",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": null
    },
    "object": "ocr"
}



Response excerpt for azure-doc-intel-layout:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\nThis is title\n1. Text\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n2. Page Objects\n2.1 Table\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\nName\nCorp\nRemark\nFoo\nBar\nMicrosoft\nDummy\nTable 1: This is a dummy table\n2.2. Figure\nFigure 1: Here is a figure with text\nValues\n500\n450\n100\n400\n350\n300\n300\n250\n200\n200\n200-\nn\nJan\nFeb\nMar\nİçr\nMay\n2um\nMeness\n3. Others\nAl Document Intelligence is an Al service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\nclear\nprecise\nvague\ncoherent\nIncomprehensible\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the Al\nDocument Intelligence studio or SDK.\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith Al Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAl model customization.\nThis is the footer of the document.\n1 | Page",
            "images": null,
            "dimensions": {
                "dpi": 96,
                "height": 909,
                "width": 1200
            }
        }
    ],
    "model": "azure-doc-intel-layout",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": null
    },
    "object": "ocr"
}



Response excerpt for mistral-ocr-2505:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\n\n# This is title\n\n## 1. Text\n\nLatin refers to an ancient Italic language originating in the region of Latium in ancient Rome.\n\n## 2. Page Objects\n\n### 2.1 Table\n\nHere's a sample table below, designed to be simple for easy understand and quick reference.\n\n|  Name | Corp | Remark  |\n| --- | --- | --- |\n|  Foo |  |   |\n|  Bar | Microsoft | Dummy  |\n\n*Table 1: This is a dummy table*\n\n### 2.2. Figure\n\n*Figure 1: Here is a figure with text*\n\n![img-0.jpeg](img-0.jpeg)\n\n## 3. Others\n\nAl Document Intelligence is an AI service that applies advanced machine learning to extract text, key-value pairs, tables, and structures from documents automatically and accurately:\n\n- ☑ clear\n- ☑ precise\n- ☐ vague\n- ☑ coherent\n- ☐ Incomprehensible\n\nTurn documents into usable data and shift your focus to acting on information rather than compiling it. Start with prebuilt models or create custom models tailored to your documents both on premises and in the cloud with the AI Document Intelligence studio or SDK.\n\nLearn how to accelerate your business processes by automating text extraction with AI Document Intelligence. This webinar features hands-on demos for key use cases such as document processing, knowledge mining, and industry-specific AI model customization.",
            "images": [
                {
                    "image_base64": null,
                    "bbox": null,
                    "id": "img-0.jpeg",
                    "top_left_x": 330,
                    "top_left_y": 586,
                    "bottom_right_x": 594,
                    "bottom_right_y": 758,
                    "image_annotation": null
                }
            ],
            "dimensions": {
                "dpi": 200,
                "height": 909,
                "width": 1200
            },
            "tables": [],
            "hyperlinks": [],
            "header": null,
            "footer": null
        }
    ],
    "model": "mistral-ocr-2505",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": 151213
    },
    "object": "ocr"
}
```

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

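Through the LiteLLM Proxy /ocr endpoint, azure_ai/doc-intelligence/prebuilt-layout returns pages[].markdown that is nearly identical to the prebuilt-read output: flat lines with no headings, tables, or figure markup. The same document analyzed in Azure Document Intelligence Studio returns contentFormat: "markdown" with structured content, so the structure is most likely lost in the request LiteLLM sends or in how it maps the response, not in the Azure model itself.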

Guidance

Call the Azure Document Intelligence API directly and confirm that the response carries contentFormat: "markdown". Azure returns Markdown only when the analyze request asks for it (outputContentFormat=markdown); the default is plain text.
Compare analyzeResult.content from the Azure response with pages[].markdown from the LiteLLM Proxy response for the same document.
Check how LiteLLM builds the Azure request and maps the response into pages[].markdown. A missing output-format parameter, or a mapping that joins per-line text, could each explain the flattened result.
If layout-aware Markdown is needed now, try vertex_ai/mistral-ocr-2505 as a workaround: in the log output above it returns headings, a Markdown table, and an image placeholder.
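The comparison between the upstream content and the proxy output can be made mechanical. Below is a minimal sketch, assuming both strings have already been fetched; `lost_markers` is a hypothetical helper, and the marker list is taken from the tags and headings visible in the Azure Studio output quoted in the issue.

```python
import re

# Markers seen in the Azure Studio output quoted in the issue.
MARKERS = {
    "heading": r"^#{1,6} ",
    "table": r"<table>",
    "caption": r"<caption>",
    "figure": r"<figure>",
    "figcaption": r"<figcaption>",
}


def lost_markers(upstream_content: str, proxy_markdown: str) -> list:
    """List markers present in Azure's content but absent from the proxy's Markdown."""
    lost = []
    for name, pattern in MARKERS.items():
        in_upstream = re.search(pattern, upstream_content, flags=re.MULTILINE)
        in_proxy = re.search(pattern, proxy_markdown, flags=re.MULTILINE)
        if in_upstream and not in_proxy:
            lost.append(name)
    return lost
```

An empty list means the proxy preserved every marker the upstream response contained; a non-empty list names the kinds of structure that were dropped along the way.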


Notes

The behavior was observed only with azure_ai/doc-intelligence/prebuilt-layout on LiteLLM Proxy v1.82.3. Other versions were not tested, and the root cause has not been confirmed.

Recommendation

Apply a workaround by using the vertex_ai/mistral-ocr-2505 model, which appears to produce the expected layout-aware Markdown output, until the issue with the azure_ai/doc-intelligence/prebuilt-layout model is resolved.
