llamaIndex - ✅(Solved) Fix [Bug]: run stream llm request will report std::runtime_error using ov genai 2026.0.0.0 [1 pull requests, 1 comments, 1 participants]

GitHub stats
run-llama/llama_index#20802 · Fetched 2026-04-08 00:30:52
Comments: 1 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #20803: fix run stream llm request using ov genai will report std::runtime_error(ISSUE 20802)

Description (problem / solution / changelog)

Description

openvino-genai (ov genai) 2026.0.0.0 removed deprecated APIs, including the Streamer put method, the bool return value for callbacks, and ChunkStreamer.

This PR adds the write function, which is used in stream requests.

Fixes # (issue)

root@3957c43f1690:/home/user# python3 
Python 3.11.14 (main, Feb 24 2026, 19:44:43) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_index.llms.openvino_genai import OpenVINOGenAILLM
>>> ov_llm = OpenVINOGenAILLM(model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",device="CPU",)
>>> ov_llm.config.max_new_tokens = 100
>>> response = ov_llm.stream_complete("What is the meaning of life?")
/home/user/.local/lib/python3.11/site-packages/llama_index/core/schema.py:116: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='config', input_value=<openvino_genai.py_openvi...bject at 0x7b78406f0bb0>, input_type=GenerationConfig])
  data = handler(self)
>>> terminate called after throwing an instance of 'std::runtime_error'
  what():  Tried to call pure virtual function "StreamerBase::write"
Aborted (core dumped)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Changed files

  • llama-index-integrations/llms/llama-index-llms-openvino-genai/llama_index/llms/openvino_genai/base.py (modified, +46/-0)
  • llama-index-integrations/llms/llama-index-llms-openvino-genai/pyproject.toml (modified, +2/-1)
  • llama-index-integrations/llms/llama-index-llms-openvino-genai/tests/__init__.py (added, +0/-0)
  • llama-index-integrations/llms/llama-index-llms-openvino-genai/tests/test_llm_openvino_genai.py (added, +24/-0)

Bug Description

Running a streaming LLM request reports std::runtime_error when using ov genai 2026.0.0.0.

ov genai 2026.0.0.0 removed deprecated APIs such as the Streamer put method, the bool return value for callbacks, and ChunkStreamer. The ChunkStreamer put function is used in stream LLM requests.

Related commit in ov genai: https://github.com/openvinotoolkit/openvino.genai/commit/9d3d6e364c03d9bebc2d6a8df68468f75d95f560

Version

0.14.13

Steps to Reproduce

root@3957c43f1690:/home/user# python3 
Python 3.11.14 (main, Feb 24 2026, 19:44:43) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_index.llms.openvino_genai import OpenVINOGenAILLM
>>> ov_llm = OpenVINOGenAILLM(model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",device="CPU",)
>>> ov_llm.config.max_new_tokens = 100
>>> response = ov_llm.stream_complete("What is the meaning of life?")
/home/user/.local/lib/python3.11/site-packages/llama_index/core/schema.py:116: UserWarning: Pydantic serializer warnings:
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='config', input_value=<openvino_genai.py_openvi...bject at 0x7b78406f0bb0>, input_type=GenerationConfig])
  data = handler(self)
>>> terminate called after throwing an instance of 'std::runtime_error'
  what():  Tried to call pure virtual function "StreamerBase::write"
Aborted (core dumped)

Relevant Logs/Tracebacks

ENV:
llama-index-embeddings-openvino           0.6.1
llama-index-embeddings-openvino-genai     0.6.1
llama-index-llms-openvino                 0.5.1
llama-index-llms-openvino-genai           0.2.1
llama-index-postprocessor-openvino-rerank 0.5.1
openvino                                  2026.0.0
openvino-genai                            2026.0.0.0
openvino-telemetry                        2025.2.0
openvino-tokenizers                       2026.0.0.0
llama-cloud                               0.1.35
llama-cloud-services                      0.6.54
llama-index                               0.14.13
llama-index-cli                           0.5.3
llama-index-core                          0.14.15
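
A package list like the one above can be regenerated on any machine with the standard library alone. The sketch below is an illustration rather than part of the issue: the helper name `installed_versions` is hypothetical, and the package names are taken from the ENV listing.

```python
from importlib import metadata

# Package names copied from the ENV listing in the issue.
PACKAGES = [
    "openvino",
    "openvino-genai",
    "openvino-tokenizers",
    "llama-index-core",
    "llama-index-llms-openvino-genai",
]


def installed_versions(names):
    """Return {package: version or None} for each requested distribution."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<40} {version or 'not installed'}")
```

Comparing this output before and after an upgrade shows whether openvino-genai or the llama-index integration actually changed.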

extent analysis

Fix: Install a build of llama-index-llms-openvino-genai that includes PR #20803, which adds the write method the streamer needs.

The crash occurs because openvino-genai 2026.0.0.0 removed the deprecated put method, while the streamer in llama-index-llms-openvino-genai 0.2.1 still relies on it. The runtime now calls StreamerBase::write, finds no Python override, and aborts the process. Since the PR is marked as a non-breaking bug fix, existing calls to stream_complete should not need to change.
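
The failure mode can be reproduced in miniature without OpenVINO installed. In the sketch below, `StreamerBase` and `StreamingStatus` are stand-ins written for illustration, not the real `openvino_genai` classes, and `run_pipeline` is a hypothetical driver. The stand-ins only mimic the behavior described above: the runtime calls `write`, and a subclass that implements only `put` fails.

```python
from enum import Enum


class StreamingStatus(Enum):
    """Stand-in for the runtime's status enum (illustration only)."""
    RUNNING = 0
    STOP = 1


class StreamerBase:
    """Stand-in base class: write() must be overridden by subclasses."""

    def write(self, token):
        raise RuntimeError('Tried to call pure virtual function "StreamerBase::write"')

    def end(self):
        pass


class LegacyStreamer(StreamerBase):
    """Old style: implements only put(), returning a bool stop flag."""

    def __init__(self):
        self.tokens = []

    def put(self, token_id):
        self.tokens.append(token_id)
        return False  # False meant "keep generating"


class MigratedStreamer(LegacyStreamer):
    """Adds write(), delegating to the existing put() logic."""

    def write(self, token):
        token_ids = token if isinstance(token, list) else [token]
        for token_id in token_ids:
            if self.put(token_id):
                return StreamingStatus.STOP
        return StreamingStatus.RUNNING


def run_pipeline(streamer, token_ids):
    """Hypothetical driver that feeds tokens the way a runtime might."""
    for token_id in token_ids:
        if streamer.write(token_id) is StreamingStatus.STOP:
            break
    streamer.end()
```

Passing a `LegacyStreamer` to `run_pipeline` raises the same message seen in the crash log, while `MigratedStreamer` collects the tokens. In the real library the failure happens on the C++ side, which is why it aborts the process instead of raising a Python exception.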


1️⃣ Upgrade the llama-index OpenVINO-GenAI package

pip install -U llama-index-llms-openvino-genai
# confirm the installed version is newer than the affected 0.2.1
pip show llama-index-llms-openvino-genai

Version 0.2.1 is the affected release. PR #20803 modifies pyproject.toml, but this page does not state which release first ships the fix, so check the package changelog before pinning a minimum version.


2️⃣ Keep using stream_complete

from llama_index.llms.openvino_genai import OpenVINOGenAILLM

llm = OpenVINOGenAILLM(
    model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",
    device="CPU",
)

llm.config.max_new_tokens = 100

# stream_complete returns a generator of CompletionResponse objects
for chunk in llm.stream_complete("What is the meaning of life?"):
    print(chunk.delta or "", end="", flush=True)
print()   # final newline
  • The fix lives inside the integration's streamer, so the call from the bug report should work unchanged once the patched package is installed.

  • If you need a single-string response, use the non-streaming call:

response = llm.complete("What is the meaning of life?")
print(response.text)
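
When consuming a LlamaIndex completion stream, each yielded response carries the cumulative `text` and the newest `delta`. The sketch below shows how the two relate without loading a model: `FakeCompletionResponse` and `fake_stream_complete` are stand-ins written for this example, not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeCompletionResponse:
    """Stand-in exposing the two fields used here: text and delta."""
    text: str
    delta: Optional[str] = None


def fake_stream_complete(pieces: Iterable[str]) -> Iterator[FakeCompletionResponse]:
    """Yield responses whose text grows by one piece per step."""
    text = ""
    for piece in pieces:
        text += piece
        yield FakeCompletionResponse(text=text, delta=piece)


def collect(stream: Iterable[FakeCompletionResponse]) -> str:
    """Join the deltas of a stream into one string, skipping empty deltas."""
    return "".join(chunk.delta or "" for chunk in stream)


if __name__ == "__main__":
    print(collect(fake_stream_complete(["The ", "answer ", "is 42."])))
```

The joined deltas equal the `text` of the final chunk, so either can be used to recover the full response.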

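3️⃣ (Optional) Quick monkey-patch for projects that cannot upgrade yet

# Sketch only: a streamer implementing the methods openvino-genai 2026 calls
import openvino_genai

class SimpleStreamer(openvino_genai.StreamerBase):
    def __init__(self, tokenizer):
        super().__init__()
        self._tokenizer = tokenizer
        self._token_ids = []

    def write(self, token):
        # write receives one token id or a list of token ids
        self._token_ids.extend(token if isinstance(token, list) else [token])
        return openvino_genai.StreamingStatus.RUNNING

    def end(self):
        pass  # called once when generation finishes

    def get_text(self):
        return self._tokenizer.decode(self._token_ids)

# ASSUMPTION: the integration exposes its streamer class as a module attribute
# named ChunkStreamer. Check base.py in your installed version before relying on this.
import llama_index.llms.openvino_genai.base as og
og.ChunkStreamer = SimpleStreamer

This supplies concrete write and end methods, which is what the error message reports as missing. The patch only takes effect if the integration looks up its streamer class through that module attribute, which this page does not confirm. Use only as a temporary stop-gap.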


4️⃣ Verify the fix

from llama_index.llms.openvino_genai import OpenVINOGenAILLM

llm = OpenVINOGenAILLM(
    model_path="/home/user/models/Qwen3-8B/INT4_compressed_weights",
    device="CPU",
)

for chunk in llm.stream_complete("What is the meaning of life?"):
    print(chunk.delta or "", end="")   # should print text without crashing
print("\n✅ streaming works")

Run the script; it should finish cleanly and output the generated text. No std::runtime_error should appear.
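
The original failure ends with "Aborted (core dumped)", which kills the interpreter and cannot be caught with try/except. A smoke test is therefore more reliable when it runs in a child process. The following is a minimal sketch: the helper names `run_isolated` and `describe` are hypothetical, and the snippet passed in would be your own verification script.

```python
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run a Python snippet in a child interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result):
    """Summarize how the child process ended."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        return f"killed by signal {-result.returncode}"
    return f"exited with code {result.returncode}"


if __name__ == "__main__":
    print(describe(run_isolated("print('streaming smoke test placeholder')")))
```

If the child aborts as it does in the bug report, the parent survives and can report the non-zero status, which makes the check usable in CI.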


Extra Tips

  • Keep the whole llama-index stack aligned (major version bumps often require matching sub‑packages).
  • Pin openvino-genai==2026.0.0.0 together with a llama-index-llms-openvino-genai release that includes PR #20803 in your requirements.txt.
  • When a new OpenVINO‑GenAI release drops more APIs, check the release notes and update the llama‑index adapters accordingly.
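
A small guard can flag environments where the removed put API matters. The sketch below assumes that every openvino-genai release from 2026.0.0.0 onward lacks put. The issue only confirms this for 2026.0.0.0 itself, and the function names are hypothetical.

```python
import re

# ASSUMPTION: 2026.0.0.0 is the first release without the deprecated put API,
# and later releases do not bring it back.
FIRST_RELEASE_WITHOUT_PUT = (2026, 0, 0, 0)


def parse_version(text, width=4):
    """Parse '2026.0.0.0' into a tuple of ints, padded with zeros to `width`."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    parts.extend([0] * (width - len(parts)))
    return tuple(parts)


def needs_write_streamer(openvino_genai_version):
    """True if this openvino-genai version requires a streamer with write()."""
    return parse_version(openvino_genai_version) >= FIRST_RELEASE_WITHOUT_PUT
```

Calling `needs_write_streamer` on the installed openvino-genai version at startup gives an early, readable error instead of a core dump when an unpatched integration is paired with a 2026 runtime.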

That’s all you need to get streaming LLM calls working again.
