litellm - 💡(How to fix) Fix [Bug]: Azure Document Intelligence `prebuilt-layout` via LiteLLM OCR does not return layout-aware Markdown [1 participants]

GitHub stats: BerriAI/litellm#25687, fetched 2026-04-16 06:37:15 · Comments: 0 · Participants: 1 · Timeline: 3 (labeled ×3) · Reactions: 0


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using the LiteLLM Proxy /ocr endpoint (Docker, LiteLLM version 1.82.3) with Azure Document Intelligence, the azure_ai/doc-intelligence/prebuilt-layout model does not appear to return layout-aware Markdown.

For the same input document, the output from prebuilt-layout is very similar to prebuilt-read and looks mostly flattened into plain text. In contrast, vertex_ai/mistral-ocr-2505 returns clearly structured Markdown with headings, tables, and image placeholders.

I expected prebuilt-layout to preserve document structure better than prebuilt-read, or at least produce meaningfully more structured Markdown in pages[].markdown.

Steps to Reproduce

  1. Run LiteLLM Proxy in Docker, version 1.82.3.
  2. Configure these OCR models:
    • azure_ai/doc-intelligence/prebuilt-read
    • azure_ai/doc-intelligence/prebuilt-layout
    • vertex_ai/mistral-ocr-2505
  3. Send the same OCR request to all three models using this sample document: https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true
  4. Compare the returned pages[].markdown output.

Example model config:

```yaml
model_list:
  ## Azure Document Intelligence
  - model_name: azure-doc-intel-read
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-read
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr

  - model_name: azure-doc-intel-layout
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-layout
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr

  ## VertexAI / Mistral
  - model_name: mistral-ocr-2505
    litellm_params:
      model: vertex_ai/mistral-ocr-2505
      vertex_project: "project"
      vertex_location: "location"
      vertex_credentials: "/app/credentials/credentials.json"
    model_info:
      mode: ocr
```

Example request for azure-doc-intel-read:

```shell
curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "azure-doc-intel-read",
  "user": "test"
}'
```

Example request for azure-doc-intel-layout:

```shell
curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "azure-doc-intel-layout",
  "user": "test"
}'
```

Example request for mistral-ocr-2505:

```shell
curl --location 'http://localhost:4000/ocr' \
--header 'Authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
  "document": {
    "type": "document_url",
    "document_url": "https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Data/layout/layout-pageobject.png?raw=true"
  },
  "model": "mistral-ocr-2505",
  "user": "test"
}'
```
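The three requests above can also be scripted so that each model's pages[].markdown prints one after another for easy diffing. This is a minimal sketch using only the Python standard library, assuming the proxy runs at http://localhost:4000 and exposes the model aliases from the config above; build_ocr_body, call_ocr, and compare_models are hypothetical helper names, not LiteLLM APIs.

```python
import json
import urllib.request

# Assumed values, mirroring the curl examples in this issue.
PROXY_BASE = "http://localhost:4000"
SAMPLE_URL = (
    "https://github.com/Azure-Samples/document-intelligence-code-samples/"
    "blob/main/Data/layout/layout-pageobject.png?raw=true"
)
MODELS = ["azure-doc-intel-read", "azure-doc-intel-layout", "mistral-ocr-2505"]


def build_ocr_body(model: str, document_url: str, user: str = "test") -> dict:
    """Build the same JSON body that the curl examples send to /ocr."""
    return {
        "document": {"type": "document_url", "document_url": document_url},
        "model": model,
        "user": user,
    }


def call_ocr(base_url: str, api_key: str, body: dict) -> dict:
    """POST the body to the proxy's /ocr endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/ocr",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def compare_models(base_url: str, api_key: str) -> None:
    """Print pages[].markdown for every model alias, one page at a time."""
    for model in MODELS:
        result = call_ocr(base_url, api_key, build_ocr_body(model, SAMPLE_URL))
        for page in result.get("pages", []):
            print(f"===== {model} / page {page.get('index')} =====")
            print(page.get("markdown"))
```

For example, compare_models(PROXY_BASE, "<API_KEY>") sends the three requests; nothing is sent when the module is merely imported.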

Notes:

  • I am not sure whether this is a LiteLLM OCR formatting issue, a limitation of the Azure prebuilt-layout integration, or expected behavior from the upstream Azure response being normalized into LiteLLM's pages[].markdown.
  • However, from a user perspective, prebuilt-layout currently does not appear to provide layout-aware Markdown output through the LiteLLM Proxy OCR endpoint.
  • For comparison, running the same image directly in Azure Document Intelligence Studio with the prebuilt-layout model appears to return structured Markdown/HTML-like layout content in the upstream result.
  • In Azure Document Intelligence Studio, the response includes contentFormat: "markdown" and the analyzeResult.content contains clearly layout-aware structure such as Markdown headings (#, ##, ###) and HTML-like tags including <table>, <caption>, <figure>, and <figcaption>.
  • This suggests the upstream Azure Layout model may already be producing richer structured output, and that the flattened result seen through LiteLLM Proxy may be caused by response transformation, normalization, or loss of structure in the OCR endpoint output mapping.
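One way to check the upstream result outside Studio is to call the Document Intelligence REST API directly. To my understanding, the analyze operation returns plain-text content by default and returns Markdown (contentFormat: "markdown") only when the outputContentFormat=markdown query parameter is set, so it may be worth confirming whether the LiteLLM integration requests it; this issue does not show either way. The sketch assumes API version 2024-11-30 (as in the Studio result) and key-based auth via the Ocp-Apim-Subscription-Key header; build_analyze_url and analyze_url_source are hypothetical names.

```python
import json
import time
import urllib.parse
import urllib.request

API_VERSION = "2024-11-30"  # matches "apiVersion" in the Studio result


def build_analyze_url(endpoint: str, model_id: str, markdown: bool = True) -> str:
    """Build the analyze URL; outputContentFormat=markdown asks for Markdown content."""
    params = {"api-version": API_VERSION}
    if markdown:
        params["outputContentFormat"] = "markdown"
    base = endpoint.rstrip("/")
    return (
        f"{base}/documentintelligence/documentModels/{model_id}:analyze?"
        + urllib.parse.urlencode(params)
    )


def analyze_url_source(endpoint: str, key: str, model_id: str,
                       document_url: str, poll_seconds: float = 1.0) -> dict:
    """Submit a document by URL, then poll Operation-Location until it completes."""
    req = urllib.request.Request(
        build_analyze_url(endpoint, model_id),
        data=json.dumps({"urlSource": document_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The service answers 202 Accepted with the URL of the pending result.
        operation_url = resp.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key}
        )
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result.get("status") not in ("notStarted", "running"):
            return result  # "succeeded" or a terminal failure status
        time.sleep(poll_seconds)
```

Comparing analyze_url_source(..., "prebuilt-layout", ...)["analyzeResult"]["contentFormat"] with and without the Markdown parameter would show whether the flattened output matches Azure's plain-text mode.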

Below is the result for the same image from Azure Document Intelligence Studio using the Layout model (large arrays elided as `...`).

```json
{
  "status": "succeeded",
  "createdDateTime": "2026-04-14T09:13:46Z",
  "lastUpdatedDateTime": "2026-04-14T09:13:46Z",
  "analyzeResult": {
    "apiVersion": "2024-11-30",
    "modelId": "prebuilt-layout",
    "stringIndexType": "utf16CodeUnit",
    "content": "\n\n\n# This is title\n\n\n## 1. Text\n\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n\n\n## 2. Page Objects\n\n\n### 2.1 Table\n\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\n\n\n<table>\n<caption>Table 1: This is a dummy table</caption>\n<tr>\n<th>Name</th>\n<th>Corp</th>\n<th>Remark</th>\n</tr>\n<tr>\n<td>Foo</td>\n<td></td>\n<td></td>\n</tr>\n<tr>\n<td>Bar</td>\n<td>Microsoft</td>\n<td>Dummy</td>\n</tr>\n</table>\n\n\n### 2.2. Figure\n\n\n<figure>\n<figcaption>Figure 1: Here is a figure with text</figcaption>\n\nValues\n\n500\n\n450\n\n400\n\n400\n\n350\n\n300\n\n300\n\n250\n\n200\n\n200\n\n100\n\n0\n\nJan\n\nFeb\n\nMar\n\nApr\n\nMay\n\nJun\n\nMonths\n\n</figure>\n\n\n## 3. Others\n\nAI Document Intelligence is an AI service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\n\n☒\nclear\n\n☒\nprecise\n\n☐\nvague\n\n☒\ncoherent\n\n☐\nIncomprehensible\n\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the AI\nDocument Intelligence studio or SDK.\n\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith AI Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAI model customization.\n\n\n\n",
    "pages": [
      ...
    ],
    "tables": [
      ...
    ],
    "paragraphs": [
      ...
    ],
    "contentFormat": "markdown",
    "sections": [
      ...
    ],
    "figures": [
      ...
    ]
  }
}
```
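If Markdown content like the above reaches LiteLLM, per-page Markdown could in principle be cut out of analyzeResult.content using each page's spans. The sketch below is a hypothetical mapping, not LiteLLM's actual transformation code; it assumes each pages[] entry carries spans with offset and length, and that offsets are UTF-16 code units, as stringIndexType: "utf16CodeUnit" above indicates.

```python
from typing import Any, Dict, List


def _slice_utf16(content: str, offset: int, length: int) -> str:
    """Slice `content` by UTF-16 code-unit offsets (stringIndexType utf16CodeUnit)."""
    data = content.encode("utf-16-le")  # exactly 2 bytes per code unit, no BOM
    return data[offset * 2:(offset + length) * 2].decode("utf-16-le")


def pages_markdown(analyze_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical mapping from an Azure analyzeResult to [{index, markdown}, ...]."""
    content = analyze_result.get("content", "")
    pages = []
    for index, page in enumerate(analyze_result.get("pages", [])):
        # A page may own several non-contiguous spans of the document content.
        parts = [
            _slice_utf16(content, span["offset"], span["length"])
            for span in page.get("spans", [])
        ]
        pages.append({"index": index, "markdown": "".join(parts)})
    return pages
```

Slicing the Python string directly would drift by one position after every character outside the Basic Multilingual Plane (an emoji, for instance), because Python indexes by code point while UTF-16 counts such characters as two units.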

### Relevant log output

```shell
Response excerpt for azure-doc-intel-read:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\nThis is title\n1. Text\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n2. Page Objects\n2.1 Table\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\nName\nCorp\nRemark\nFoo\nBar\nMicrosoft\nDummy\nTable 1: This is a dummy table\n2.2. Figure\nFigure 1: Here is a figure with text\n500\n450\n400\n400\n350\n250\n200\n200-\nFeb\nVisu\nMents\n3. Others\nAl Document Intelligence is an Al service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\nclear\nIprecise\nvague\ncoherent\nIncomprehensible\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the Al\nDocument Intelligence studio or SDK.\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith Al Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAl model customization.\nThis is the footer of the document.\n1 | Page",
            "images": null,
            "dimensions": {
                "dpi": 96,
                "height": 909,
                "width": 1200
            }
        }
    ],
    "model": "azure-doc-intel-read",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": null
    },
    "object": "ocr"
}



Response excerpt for azure-doc-intel-layout:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\nThis is title\n1. Text\nLatin refers to an ancient Italic language\noriginating in the region of Latium in\nancient Rome.\n2. Page Objects\n2.1 Table\nHere's a sample table below, designed to\nbe simple for easy understand and quick\nreference.\nName\nCorp\nRemark\nFoo\nBar\nMicrosoft\nDummy\nTable 1: This is a dummy table\n2.2. Figure\nFigure 1: Here is a figure with text\nValues\n500\n450\n100\n400\n350\n300\n300\n250\n200\n200\n200-\nn\nJan\nFeb\nMar\nİçr\nMay\n2um\nMeness\n3. Others\nAl Document Intelligence is an Al service\nthat applies advanced machine learning\nto extract text, key-value pairs, tables,\nand structures from documents\nautomatically and accurately:\nclear\nprecise\nvague\ncoherent\nIncomprehensible\nTurn documents into usable data and\nshift your focus to acting on information\nrather than compiling it. Start with\nprebuilt models or create custom models\ntailored to your documents both on\npremises and in the cloud with the Al\nDocument Intelligence studio or SDK.\nLearn how to accelerate your business\nprocesses by automating text extraction\nwith Al Document Intelligence. This\nwebinar features hands-on demos for key\nuse cases such as document processing,\nknowledge mining, and industry-specific\nAl model customization.\nThis is the footer of the document.\n1 | Page",
            "images": null,
            "dimensions": {
                "dpi": 96,
                "height": 909,
                "width": 1200
            }
        }
    ],
    "model": "azure-doc-intel-layout",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": null
    },
    "object": "ocr"
}



Response excerpt for mistral-ocr-2505:

{
    "pages": [
        {
            "index": 0,
            "markdown": "This is the header of the document.\n\n# This is title\n\n## 1. Text\n\nLatin refers to an ancient Italic language originating in the region of Latium in ancient Rome.\n\n## 2. Page Objects\n\n### 2.1 Table\n\nHere's a sample table below, designed to be simple for easy understand and quick reference.\n\n|  Name | Corp | Remark  |\n| --- | --- | --- |\n|  Foo |  |   |\n|  Bar | Microsoft | Dummy  |\n\n*Table 1: This is a dummy table*\n\n### 2.2. Figure\n\n*Figure 1: Here is a figure with text*\n\n![img-0.jpeg](img-0.jpeg)\n\n## 3. Others\n\nAl Document Intelligence is an AI service that applies advanced machine learning to extract text, key-value pairs, tables, and structures from documents automatically and accurately:\n\n- ☑ clear\n- ☑ precise\n- ☐ vague\n- ☑ coherent\n- ☐ Incomprehensible\n\nTurn documents into usable data and shift your focus to acting on information rather than compiling it. Start with prebuilt models or create custom models tailored to your documents both on premises and in the cloud with the AI Document Intelligence studio or SDK.\n\nLearn how to accelerate your business processes by automating text extraction with AI Document Intelligence. This webinar features hands-on demos for key use cases such as document processing, knowledge mining, and industry-specific AI model customization.",
            "images": [
                {
                    "image_base64": null,
                    "bbox": null,
                    "id": "img-0.jpeg",
                    "top_left_x": 330,
                    "top_left_y": 586,
                    "bottom_right_x": 594,
                    "bottom_right_y": 758,
                    "image_annotation": null
                }
            ],
            "dimensions": {
                "dpi": 200,
                "height": 909,
                "width": 1200
            },
            "tables": [],
            "hyperlinks": [],
            "header": null,
            "footer": null
        }
    ],
    "model": "mistral-ocr-2505",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": 151213
    },
    "object": "ocr"
}
```

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

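Through LiteLLM Proxy, the azure_ai/doc-intelligence/prebuilt-layout model returns nearly the same flattened text in pages[].markdown as prebuilt-read, while the same model in Azure Document Intelligence Studio returns Markdown with headings, tables, and figures. The structure may be lost in response transformation or normalization.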

Guidance

  • Verify the response from the Azure Document Intelligence API to ensure it includes the expected layout-aware Markdown structure.
  • Compare the contentFormat and analyzeResult.content fields in the Azure API response to the pages[].markdown output in the LiteLLM Proxy response.
  • Check the LiteLLM Proxy configuration and code for any potential issues with response mapping or normalization that may be causing the loss of structure.
  • Test the vertex_ai/mistral-ocr-2505 model as a potential workaround, as it appears to produce the expected layout-aware Markdown output.
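For the comparison in the second point, a rough heuristic is to count structural markers in each output. This sketch assumes that Markdown headings, HTML or Markdown tables, and figure or image markers are enough to tell the layout-aware outputs in this issue from the flattened ones; the pattern set and function names are illustrative, not part of LiteLLM or the Azure SDK.

```python
import re

# Heuristic markers of layout-aware output (an assumption, tuned to this sample).
PATTERNS = {
    "headings": re.compile(r"^#{1,6} ", re.MULTILINE),
    "html_tables": re.compile(r"<table>"),
    "md_table_rows": re.compile(r"^\|.*\|\s*$", re.MULTILINE),
    "figures": re.compile(r"<figure>|!\[[^\]]*\]\([^)]*\)"),
}


def structure_report(text: str) -> dict:
    """Count each kind of structural marker in a Markdown string."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def looks_flattened(text: str) -> bool:
    """True when no heading, table, or figure marker is present at all."""
    return sum(structure_report(text).values()) == 0
```

Run against Azure's analyzeResult.content and LiteLLM's pages[].markdown, a non-zero report for the former and an all-zero report for the latter would confirm that structure is dropped between the two.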

Example

The model config and curl requests in the issue above are the relevant reproduction; the problem appears to lie in response mapping rather than in user code.

Notes

The issue may be specific to the azure_ai/doc-intelligence/prebuilt-layout model or the LiteLLM Proxy version 1.82.3. Further investigation is needed to determine the root cause.

Recommendation

Until the azure_ai/doc-intelligence/prebuilt-layout issue is resolved, use vertex_ai/mistral-ocr-2505 as a workaround; in the reproduction above it returns layout-aware Markdown with headings, a Markdown table, and an image placeholder.
