llamaIndex - ✅(Solved) Fix [Bug]: run stream llm request will report std::runtime_error using ov genai 2026.0.0.0 [1 pull requests, 1 comments, 1 participants]

jilongW · 2026-02-26T07:03:08Z


run-llama/llama_index#20802•Fetched 2026-04-08 00:30:52

Author: jilongW
Participants: jilongW
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

Fix Action

Fixed

Fixed by PR: fix run stream llm request using ov genai will report std::runtime_error(ISSUE 20802) (https://github.com/run-llama/llama_index/pull/20803)

PR fix notes

PR #20803: fix run stream llm request using ov genai will report std::runtime_error(ISSUE 20802)

Repository: run-llama/llama_index
Author: jilongW
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/20803

Description (problem / solution / changelog)

Description

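OpenVINO GenAI 2026.0.0.0 removed several deprecated APIs, including the Streamer put method, the bool return value for callbacks, and ChunkStreamer.

This PR adds the write function, which streaming requests now call.

Fixes #20802 (https://github.com/run-llama/llama_index/issues/20802)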

root@3957c43f1690:/home/user# python3 
Python 3.11.14 (main, Feb 24 2026, 19:44:43) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_index.llms.openvino_genai import OpenVINOGenAILLM
>>> ov_llm = OpenVINOGenAILLM(model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",device="CPU",)
>>> ov_llm.config.max_new_tokens = 100
>>> response = ov_llm.stream_complete("What is the meaning of life?")
/home/user/.local/lib/python3.11/site-packages/llama_index/core/schema.py:116: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='config', input_value=<openvino_genai.py_openvi...bject at 0x7b78406f0bb0>, input_type=GenerationConfig])
  data = handler(self)
>>> terminate called after throwing an instance of 'std::runtime_error'
  what():  Tried to call pure virtual function "StreamerBase::write"
Aborted (core dumped)

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

- [x] Yes
- [ ] No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

- [x] Yes
- [ ] No

Type of Change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

- [x] I added new unit tests to cover this change
- [ ] I believe this change is already covered by existing unit tests

Suggested Checklist:

- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added Google Colab support for the newly added notebooks.
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] I ran uv run make format; uv run make lint to appease the lint gods

Changed files

llama-index-integrations/llms/llama-index-llms-openvino-genai/llama_index/llms/openvino_genai/base.py (modified, +46/-0)
llama-index-integrations/llms/llama-index-llms-openvino-genai/pyproject.toml (modified, +2/-1)
llama-index-integrations/llms/llama-index-llms-openvino-genai/tests/__init__.py (added, +0/-0)
llama-index-integrations/llms/llama-index-llms-openvino-genai/tests/test_llm_openvino_genai.py (added, +24/-0)

Bug Description

Running a streaming LLM request with OpenVINO GenAI 2026.0.0.0 aborts with std::runtime_error.

OpenVINO GenAI 2026.0.0.0 removed deprecated APIs, including the Streamer put method, the bool return value for callbacks, and ChunkStreamer. The streaming LLM request path still uses the ChunkStreamer put function.

Related commit in OpenVINO GenAI: https://github.com/openvinotoolkit/openvino.genai/commit/9d3d6e364c03d9bebc2d6a8df68468f75d95f560
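The failure mode described above can be illustrated with a small pure-Python analogy. The classes below are hypothetical stand-ins, not the OpenVINO GenAI or llama-index classes: a base class whose write hook must be overridden, a legacy subclass that only implements put, and a fixed subclass that adds write. In the real library the error is raised in native code and aborts the process; here it is an ordinary Python exception so the effect can be observed safely.

```python
class StreamerBase:
    """Stand-in for a native base class whose hooks must be overridden."""

    def write(self, token):
        raise RuntimeError('Tried to call pure virtual function "StreamerBase::write"')

    def end(self):
        raise RuntimeError('Tried to call pure virtual function "StreamerBase::end"')


class LegacyStreamer(StreamerBase):
    """Implements only the removed put() hook (hypothetical legacy streamer)."""

    def __init__(self):
        self.tokens = []

    def put(self, token):
        self.tokens.append(token)
        return False  # old-style bool return value

    def end(self):
        pass


class FixedStreamer(LegacyStreamer):
    """Adds write(), the hook the generation loop calls."""

    def write(self, token):
        self.tokens.append(token)


def run_pipeline(streamer, token_ids):
    """Mimics a generation loop that only knows about write() and end()."""
    for token_id in token_ids:
        streamer.write(token_id)
    streamer.end()
    return streamer.tokens
```

Passing a LegacyStreamer to run_pipeline raises the same "pure virtual function" message as in the report, while FixedStreamer receives every token.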

Version

0.14.13

Steps to Reproduce

root@3957c43f1690:/home/user# python3 
Python 3.11.14 (main, Feb 24 2026, 19:44:43) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_index.llms.openvino_genai import OpenVINOGenAILLM
>>> ov_llm = OpenVINOGenAILLM(model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",device="CPU",)
>>> ov_llm.config.max_new_tokens = 100
>>> response = ov_llm.stream_complete("What is the meaning of life?")
/home/user/.local/lib/python3.11/site-packages/llama_index/core/schema.py:116: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='config', input_value=<openvino_genai.py_openvi...bject at 0x7b78406f0bb0>, input_type=GenerationConfig])
  data = handler(self)
>>> terminate called after throwing an instance of 'std::runtime_error'
  what():  Tried to call pure virtual function "StreamerBase::write"
Aborted (core dumped)

Relevant Logs/Tracebacks

ENV:
llama-index-embeddings-openvino           0.6.1
llama-index-embeddings-openvino-genai     0.6.1
llama-index-llms-openvino                 0.5.1
llama-index-llms-openvino-genai           0.2.1
llama-index-postprocessor-openvino-rerank 0.5.1
openvino                                  2026.0.0
openvino-genai                            2026.0.0.0
openvino-telemetry                        2025.2.0
openvino-tokenizers                       2026.0.0.0
llama-cloud                               0.1.35
llama-cloud-services                      0.6.54
llama-index                               0.14.13
llama-index-cli                           0.5.3
llama-index-core                          0.14.15
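A listing like the one above can be gathered programmatically when comparing environments. This is a minimal sketch using only the standard library; the package list is an assumption and should be adjusted to the packages of interest.

```python
from importlib import metadata

# Assumed subset of the packages from the report; extend as needed.
PACKAGES = [
    "llama-index-llms-openvino-genai",
    "llama-index-core",
    "openvino",
    "openvino-genai",
    "openvino-tokenizers",
]


def installed_versions(names):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<42}{version or 'not installed'}")
```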

extent analysis

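Fix: make the integration's streamer implement write, the method that replaces the removed put.

The crash occurs because openvino-genai 2026.0.0.0 removed the deprecated put method, which the streamer in llama-index-llms-openvino-genai 0.2.1 still relies on. The runtime now calls StreamerBase::write, which that streamer does not override, so the process aborts with the pure virtual function error. PR #20803 adds the missing write function. It is listed above as open and not yet merged, so the steps below cover both upgrading once a fixed release exists and working around the problem in the meantime.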

1️⃣ Upgrade the llama‑index OpenVINO‑GenAI package

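pip install -U llama-index-llms-openvino-genai

The reported environment used 0.2.1, and the PR bumps the package version in pyproject.toml. Because the PR is listed as open and unmerged, check the package changelog to confirm that the release you install actually contains the fix. Until it does, installing an openvino-genai release older than 2026.0.0.0 (with matching openvino and openvino-tokenizers versions) should avoid the removed APIs:

pip install "openvino-genai<2026.0.0.0"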
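Before choosing between upgrading and pinning, it can help to check whether an installed openvino-genai version falls in the range where the deprecated streamer APIs were removed. The helper below is a hypothetical sketch: the function name is made up, and the 2026 threshold is taken from the issue report rather than from the OpenVINO GenAI release notes.

```python
def removes_legacy_streamer_api(genai_version: str) -> bool:
    """Return True for openvino-genai versions from 2026.0.0.0 onward.

    The threshold comes from the issue report; it only inspects the
    leading (year-like) component of the version string.
    """
    major = int(genai_version.split(".", 1)[0])
    return major >= 2026


if __name__ == "__main__":
    for version in ("2025.4.1.0", "2026.0.0.0"):
        status = "affected" if removes_legacy_streamer_api(version) else "not affected"
        print(f"openvino-genai {version}: {status}")
```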

2️⃣ Consume the stream with stream_complete

from llama_index.llms.openvino_genai import OpenVINOGenAILLM

llm = OpenVINOGenAILLM(
    model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",
    device="CPU",
)

llm.config.max_new_tokens = 100

# stream_complete yields response objects; .delta holds the newly generated text
for chunk in llm.stream_complete("What is the meaning of life?"):
    print(chunk.delta or "", end="", flush=True)
print()   # final newline

stream_complete, the call used in the bug report, takes the prompt string directly and remains the streaming entry point, so calling code does not need to change once the integration is fixed.
If you need a single-string response, join the deltas:

response = "".join(
    chunk.delta or "" for chunk in llm.stream_complete("What is the meaning of life?")
)
print(response)
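Collecting a stream into one string can be tried without loading a model. The sketch below uses a hypothetical FakeChunk stand-in and assumes each streamed item exposes its new text in a delta attribute that may be None, as LlamaIndex completion responses do.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed response item."""

    delta: Optional[str]


def fake_stream() -> Iterator[FakeChunk]:
    """Yields a few chunks, including one with no new text."""
    for piece in ["The ", "answer ", None, "is 42."]:
        yield FakeChunk(delta=piece)


def collect_text(stream: Iterable) -> str:
    """Join the delta of every chunk, treating a missing delta as empty."""
    return "".join(chunk.delta or "" for chunk in stream)


if __name__ == "__main__":
    print(collect_text(fake_stream()))
```

Guarding with `or ""` keeps the join from failing if a chunk arrives without new text.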

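3️⃣ (Optional) Stop-gap streamer for projects that cannot upgrade yet

import openvino_genai

class SimpleStreamer(openvino_genai.StreamerBase):
    """Collects generated token ids and decodes them on request."""

    def __init__(self, tokenizer):
        super().__init__()
        self._tokenizer = tokenizer
        self._token_ids = []

    def write(self, token):
        # write() may receive a single token id or a list of ids
        if isinstance(token, int):
            self._token_ids.append(token)
        else:
            self._token_ids.extend(token)
        return openvino_genai.StreamingStatus.RUNNING

    def end(self):
        # called once when generation finishes; nothing to clean up here
        pass

    def get_text(self):
        return self._tokenizer.decode(self._token_ids)

# Assumption: the integration looks up ChunkStreamer as a module attribute.
# If the class is defined elsewhere (for example inside __init__), this
# assignment has no effect and the integration source must be patched instead.
import llama_index.llms.openvino_genai.base as og_base
og_base.ChunkStreamer = SimpleStreamer

This supplies concrete implementations of the pure virtual write and end methods, which is what the runtime error complains about. Whether the assignment takes effect depends on how the installed integration version wires up its streamer, and the integration may expect additional behavior from its own streamer class, so verify the patch before relying on it and use it only as a temporary stop-gap.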
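The core of a write-based streamer is small: buffer incoming token ids, decode them in chunks, and flush what is left when generation ends. The following sketch shows that logic in plain Python with a hypothetical FakeTokenizer so it runs without OpenVINO. It is not the code from PR #20803, and real streamers typically need extra care for characters that span several tokens.

```python
class FakeTokenizer:
    """Hypothetical tokenizer mapping token ids to text for the demo."""

    VOCAB = {1: "Hello", 2: ",", 3: " world", 4: "!"}

    def decode(self, token_ids):
        return "".join(self.VOCAB[token_id] for token_id in token_ids)


class ChunkedTextStreamer:
    """Buffers token ids and emits decoded text every few tokens."""

    def __init__(self, tokenizer, tokens_per_chunk=2):
        self._tokenizer = tokenizer
        self._tokens_per_chunk = tokens_per_chunk
        self._pending = []
        self.chunks = []  # decoded text pieces, in emission order

    def write(self, token):
        # Accept either a single token id or a list of ids.
        if isinstance(token, int):
            self._pending.append(token)
        else:
            self._pending.extend(token)
        if len(self._pending) >= self._tokens_per_chunk:
            self._flush()

    def end(self):
        # Emit whatever is still buffered when generation finishes.
        if self._pending:
            self._flush()

    def _flush(self):
        self.chunks.append(self._tokenizer.decode(self._pending))
        self._pending = []


if __name__ == "__main__":
    streamer = ChunkedTextStreamer(FakeTokenizer(), tokens_per_chunk=2)
    for token_id in [1, 2, 3]:
        streamer.write(token_id)
    streamer.end()
    print(streamer.chunks)
```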

4️⃣ Verify the fix

from llama_index.llms.openvino_genai import OpenVINOGenAILLM

llm = OpenVINOGenAILLM(
    model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",
    device="CPU",
)

for chunk in llm.stream_complete("What is the meaning of life?"):
    print(chunk.delta or "", end="", flush=True)   # should print text without crashing
print("\n✅ streaming works")

Run the script; it should finish cleanly and output the generated text, with no std::runtime_error or "Aborted (core dumped)" message.
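Because the original failure ends with "Aborted (core dumped)", it kills the whole interpreter and cannot be caught with try/except. One way to check for it from a test or CI job is to run the verification in a child process and inspect the exit code. The sketch below uses only the standard library; run_isolated is a hypothetical helper name, and the snippet passed to it would be the verification script.

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def crashed(result: subprocess.CompletedProcess) -> bool:
    """A non-zero exit code covers both Python errors and native aborts."""
    return result.returncode != 0


if __name__ == "__main__":
    ok = run_isolated("print('streaming works')")
    print("crashed:", crashed(ok), "| output:", ok.stdout.strip())
```

A native abort in the child shows up as a non-zero return code in the parent, which keeps running and can report the failure.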

Extra Tips

Keep the whole llama-index stack aligned (major version bumps often require matching sub-packages).
Pin openvino-genai in your requirements.txt together with a llama-index-llms-openvino-genai release that is known to work with it, so that a new OpenVINO GenAI release cannot silently remove an API the integration depends on.
When a new OpenVINO GenAI release removes more APIs, check the release notes and update the llama-index adapters accordingly.
