llamaIndex - ✅(Solved) Fix [Bug]: Ollama does not respect the client's initialization [1 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
run-llama/llama_index#21086Fetched 2026-04-08 01:03:36
View on GitHub
Comments
1
Participants
2
Timeline
6
Reactions
0
Author
Participants
Timeline (top)
labeled ×2closed ×1commented ×1cross-referenced ×1

Error Message

This error is 401, because it is not sending the headers

Traceback (most recent call last): File "test.py", line 12, in <module> resp = llm.complete("hi") ^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 447, in wrapped_llm_predict f_return_val = f(_self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 659, in complete return chat_to_completion_decorator(self.chat)(prompt, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/base/llms/generic_utils.py", line 184, in wrapper chat_response = func(messages, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 181, in wrapped_llm_chat f_return_val = f(_self, messages, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 404, in chat options=self._model_kwargs, ^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 214, in _model_kwargs "num_ctx": self.get_context_window(), ^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 224, in get_context_window info = self.client.show(self.model).modelinfo ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 637, in show return self._request( ^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 190, in _request return cls(**self._request_raw(*args, **kwargs).json()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 134, in _request_raw raise ResponseError(e.response.text, e.response.status_code) from None ollama._types.ResponseError: Unauthorized (status code: 401)

Root Cause

This error is 401, because it is not sending the headers

Fix Action

Fix / Workaround

Traceback (most recent call last): File "test.py", line 12, in <module> resp = llm.complete("hi") ^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 447, in wrapped_llm_predict f_return_val = f(_self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 659, in complete return chat_to_completion_decorator(self.chat)(prompt, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/base/llms/generic_utils.py", line 184, in wrapper chat_response = func(messages, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 181, in wrapped_llm_chat f_return_val = f(_self, messages, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 404, in chat options=self._model_kwargs, ^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 214, in _model_kwargs "num_ctx": self.get_context_window(), ^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 224, in get_context_window info = self.client.show(self.model).modelinfo ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 637, in show return self._request( ^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 190, in _request return cls(**self._request_raw(*args, **kwargs).json()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 134, in _request_raw raise ResponseError(e.response.text, e.response.status_code) from None ollama._types.ResponseError: Unauthorized (status code: 401)

PR fix notes

PR #21091: fix(ollama): pass custom headers to auto-created clients

Description (problem / solution / changelog)

Description

When users need authentication headers (e.g. Authorization: Bearer) for remote Ollama instances, they currently have to manually construct Client / AsyncClient objects and pass them to the Ollama constructor. This is unintuitive because the auto-created fallback clients silently discard any auth context.

This PR adds a headers parameter to the Ollama class. When set, auto-created sync and async clients inherit these headers. Explicitly passed client / async_client objects still take precedence (existing behavior preserved).

Changes

  • Add headers: Optional[Dict[str, str]] field to Ollama class
  • Pass headers to __init__ and through to super().__init__()
  • Update client and async_client properties to pass headers when creating fallback clients

Usage

from llama_index.llms.ollama import Ollama

# Before: had to manually construct Client with headers
# After: just pass headers directly
llm = Ollama(
    model="llama3",
    base_url="https://my-ollama-server.com",
    headers={"Authorization": "Bearer MY_API_KEY"},
)

resp = llm.complete("Hello")  # headers automatically included

Backward Compatibility

  • headers defaults to None — no behavior change for existing users
  • Explicitly passed client / async_client still take full precedence
  • No new dependencies

Fixes #21086

Changed files

  • llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py (modified, +14/-2)
  • llama-index-integrations/llms/llama-index-llms-ollama/pyproject.toml (modified, +1/-1)

Code Example

from llama_index.llms.ollama import Ollama
from ollama import AsyncClient, Client

host=MY_CVUSTOM_HOST
headers={"Authorization": "Bearer MY_API_KEY}
model="MY_MODEL"

c = AsyncClient(host=host, headers=headers)
llm = Ollama(async_client=c, base_url=host, model=model)

resp = llm.complete("hi")
print(resp)

---

from ollama import Client
host=MY_CVUSTOM_HOST
headers={"Authorization": "Bearer MY_API_KEY}
model="MY_MODEL"
client = Client(host=host, headers=headers)
messages = [{ 'role': 'user',  'content': 'Why is the sky blue?', },]
for part in client.chat(model, messages=messages, stream=True):
    print(part.message.content, end='', flush=True)

---

This error is 401, because it is not sending the headers


Traceback (most recent call last):
  File "test.py", line 12, in <module>
    resp = llm.complete("hi")
           ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 447, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 659, in complete
    return chat_to_completion_decorator(self.chat)(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/base/llms/generic_utils.py", line 184, in wrapper
    chat_response = func(messages, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 181, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 404, in chat
    options=self._model_kwargs,
            ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 214, in _model_kwargs
    "num_ctx": self.get_context_window(),
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 224, in get_context_window
    info = self.client.show(self.model).modelinfo
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 637, in show
    return self._request(
           ^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 190, in _request
    return cls(**self._request_raw(*args, **kwargs).json())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 134, in _request_raw
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: Unauthorized (status code: 401)
RAW_BUFFERClick to expand / collapse

Bug Description

I was experimenting with llama_index, and decided to use Ollama, but my network Ollama instance has a api_key (Authorization headers with Bearer).

It seems that Ollama in Llama-Index not respect **kwargs passed or it is no using Client sent at all. Also I need to send twice the host.

Version

llama-index-llms-ollama==0.10.0

Steps to Reproduce

from llama_index.llms.ollama import Ollama
from ollama import AsyncClient, Client

host=MY_CVUSTOM_HOST
headers={"Authorization": "Bearer MY_API_KEY}
model="MY_MODEL"

c = AsyncClient(host=host, headers=headers)
llm = Ollama(async_client=c, base_url=host, model=model)

resp = llm.complete("hi")
print(resp)

If i use directly de Ollama python client it works ok:

from ollama import Client
host=MY_CVUSTOM_HOST
headers={"Authorization": "Bearer MY_API_KEY}
model="MY_MODEL"
client = Client(host=host, headers=headers)
messages = [{ 'role': 'user',  'content': 'Why is the sky blue?', },]
for part in client.chat(model, messages=messages, stream=True):
    print(part.message.content, end='', flush=True)

Relevant Logs/Tracbacks

This error is 401, because it is not sending the headers


Traceback (most recent call last):
  File "test.py", line 12, in <module>
    resp = llm.complete("hi")
           ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 447, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 659, in complete
    return chat_to_completion_decorator(self.chat)(prompt, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/base/llms/generic_utils.py", line 184, in wrapper
    chat_response = func(messages, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 181, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 404, in chat
    options=self._model_kwargs,
            ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 214, in _model_kwargs
    "num_ctx": self.get_context_window(),
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 224, in get_context_window
    info = self.client.show(self.model).modelinfo
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 637, in show
    return self._request(
           ^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 190, in _request
    return cls(**self._request_raw(*args, **kwargs).json())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ollama/_client.py", line 134, in _request_raw
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: Unauthorized (status code: 401)

extent analysis

Fix Plan

To fix the issue, you need to pass the headers to the Ollama instance. However, it seems like the Ollama class does not directly accept headers as a parameter.

Instead, you can pass the async_client with the headers set. Here's how you can modify your code:

from llama_index.llms.ollama import Ollama
from ollama import AsyncClient

host = 'MY_CUSTOM_HOST'
headers = {"Authorization": "Bearer MY_API_KEY"}
model = "MY_MODEL"

c = AsyncClient(host=host, headers=headers)
llm = Ollama(async_client=c, base_url=host, model=model)

resp = llm.complete("hi")
print(resp)

However, since the Ollama class is not using the async_client correctly, we need to modify the Ollama class itself to use the async_client.

Here's an example of how you can modify the Ollama class:

from llama_index.llms.ollama import Ollama
from ollama import AsyncClient

class CustomOllama(Ollama):
    def __init__(self, async_client, base_url, model):
        self.async_client = async_client
        self.base_url = base_url
        self.model = model

    def _request(self, method, path, **kwargs):
        return self.async_client._request_raw(method, path, **kwargs)

# Usage
host = 'MY_CUSTOM_HOST'
headers = {"Authorization": "Bearer MY_API_KEY"}
model = "MY_MODEL"

c = AsyncClient(host=host, headers=headers)
llm = CustomOllama(async_client=c, base_url=host, model=model)

resp = llm.complete("hi")
print(resp)

Verification

To verify that the fix worked, you can check the response status code. If the status code is 200, it means the request was successful.

resp = llm.complete("hi")
print(resp.status_code)  # Should print 200

Extra Tips

Make sure to handle any exceptions that may occur during the request. You can do this by wrapping the request in a try-except block.

try:
    resp = llm.complete("hi")
    print(resp)
except Exception as e:
    print(f"An error occurred: {e}")

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING