LlamaIndex - ✅ (Solved) Fix [Bug]: Bedrock Converse streaming produces string `tool_kwargs` in `ToolCallBlock` instead of dict [1 pull request, 3 comments, 3 participants]

GitHub stats: run-llama/llama_index#21579 (fetched 2026-05-07 03:31:26): 3 comments, 3 participants, 11 timeline events (commented ×3, subscribed ×3, labeled ×2, mentioned ×2), 0 reactions.

Error Message

pydantic_core.ValidationError: 1 validation error for FunctionCall
args
  Input should be a valid dictionary [type=dict_type, input_value='{"document_id": "..."}', input_type=str]

Root Cause

The AWS Bedrock Converse streaming API (ConverseStream) delivers tool-use input via ToolUseBlockDelta.input as a string. The adapter correctly concatenates these partial string chunks:

# base.py, stream_chat / astream_chat
current_tool_call["input"] += tool_use_delta["input"]

But when constructing ToolCallBlock, it passes the accumulated string directly without parsing:

ToolCallBlock(
    tool_kwargs=tool_call.get("input", {}),  # ← still a JSON string, not a dict
    tool_name=tool_call.get("name", ""),
    tool_call_id=tool_call.get("toolUseId"),
)

This pattern appears at 6 locations in base.py (lines 593, 649, 693, 877, 933, 978 in v0.14.9), all within the streaming code paths.
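To make the type mismatch concrete, here is a minimal, self-contained sketch that replays the accumulation step with plain Python values. The fragment strings, the `toolUseId`/`name` values, and starting `input` at an empty string are illustrative assumptions, not captured Bedrock output.

```python
import json

# Hypothetical fragments standing in for successive ToolUseBlockDelta.input values:
# together they form one JSON object, but each piece is just a string.
deltas = ['{"docu', 'ment_id": ', '"abc-123"}']

current_tool_call = {"toolUseId": "tool-1", "name": "get_document", "input": ""}
for fragment in deltas:
    # Mirrors the adapter's accumulation step.
    current_tool_call["input"] += fragment

raw = current_tool_call.get("input", {})
print(type(raw).__name__)  # str: this is what ends up in ToolCallBlock.tool_kwargs
print(json.loads(raw))     # {'document_id': 'abc-123'}: the dict downstream adapters expect
```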

Fix Action

Fixed

PR fix notes

PR #21580: fix(bedrock-converse): parse streaming tool_kwargs from string to dict


Description

Fixes #21579

Bedrock ConverseStream delivers tool use input as partial JSON string chunks via ToolUseBlockDelta.input. The streaming code (stream_chat / astream_chat) correctly concatenates these chunks into a complete JSON string, but passes the raw string directly to ToolCallBlock.tool_kwargs instead of parsing it into a dict.

This breaks cross-provider workflows where chat history from a Bedrock streaming session is replayed through another provider (e.g., Google GenAI), which expects tool_kwargs to be a dict:

pydantic_core.ValidationError: 1 validation error for FunctionCall
args
  Input should be a valid dictionary [type=dict_type, input_value='{"document_id": "..."}', input_type=str]

The non-streaming path works fine because boto3 automatically deserializes ToolUseBlock.input to a Python dict.

Changes

  • Added _parse_tool_input() helper function that safely parses JSON strings to dicts (with fallback to {} for incomplete/malformed JSON during intermediate streaming yields)
  • Applied the helper at all 6 ToolCallBlock construction sites in stream_chat (3 sites) and astream_chat (3 sites)
  • Added 2 new unit tests (test_stream_chat_tool_kwargs_parsed_as_dict and test_astream_chat_tool_kwargs_parsed_as_dict) that mock the full Bedrock streaming tool use event sequence and assert tool_kwargs is a dict
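The PR description does not include the helper's body, so the following is only a guess at what a helper with the described behavior could look like. `parse_tool_input` is a hypothetical name, chosen so it is not mistaken for the merged `_parse_tool_input`; returning `{}` for JSON that decodes to a non-object is also an assumption.

```python
import json
from typing import Any, Dict


def parse_tool_input(raw: Any) -> Dict[str, Any]:
    """Hypothetical sketch of a safe tool-input parser (not the PR's actual code).

    - dict input (as on the non-streaming path) is returned unchanged
    - a complete JSON object string is decoded
    - empty, partial, or malformed JSON falls back to {}
    """
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str) and raw.strip():
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Typical for intermediate streaming yields: the JSON is not complete yet.
            return {}
        # Only a JSON object can serve as a kwargs mapping.
        return parsed if isinstance(parsed, dict) else {}
    return {}
```

Because dicts pass through unchanged, a helper shaped like this would also be harmless on the non-streaming path, where boto3 already returns a dict.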

Why this approach

The @dosu analysis suggested parsing at the contentBlockStop event. However, the current code has no contentBlockStop handler, and intermediate contentBlockDelta yields would still emit raw strings before that event fires.

Instead, wrapping at each construction site:

  • Normalizes every yield (intermediate and final)
  • Requires no new event handlers or structural changes
  • Matches the existing defensive pattern in get_tool_calls_from_response(), which already does isinstance(tool_call.tool_kwargs, str) → parse_partial_json
  • Minimal diff, easy to review
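The argument for wrapping every construction site can be illustrated with a toy generator that yields a normalized snapshot after each delta, roughly the way the streaming methods yield after each `contentBlockDelta`. `safe_kwargs` and `stream_tool_kwargs` are hypothetical names, and the fragment boundaries are made up.

```python
import json
from typing import Any, Dict, Iterator, List


def safe_kwargs(raw: str) -> Dict[str, Any]:
    # Hypothetical stand-in for the PR's helper: {} until the JSON object is complete.
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


def stream_tool_kwargs(fragments: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield normalized tool_kwargs after every delta, as each construction site would."""
    accumulated = ""
    for fragment in fragments:
        accumulated += fragment
        yield safe_kwargs(accumulated)


snapshots = list(stream_tool_kwargs(['{"document_id"', ': "abc', '-123"}']))
print(snapshots)  # [{}, {}, {'document_id': 'abc-123'}]
```

Every snapshot is a dict, so a consumer reading an intermediate yield never sees a string; the trade-off is that intermediate snapshots stay empty until the JSON object is complete.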

Test results

83 passed, 19 skipped (AWS integration tests)

All existing tests pass with zero regressions.

Changed files

  • llama-index-integrations/llms/llama-index-llms-bedrock-converse/llama_index/llms/bedrock_converse/base.py (modified, +22/-6)
  • llama-index-integrations/llms/llama-index-llms-bedrock-converse/tests/test_llms_bedrock_converse.py (modified, +179/-0)

Original Issue

Bug Description

The Bedrock Converse adapter's streaming methods (stream_chat and astream_chat) construct ToolCallBlock objects with tool_kwargs as a raw JSON string instead of a parsed dict. This breaks cross-provider workflows where chat history from a Bedrock streaming session is later consumed by another provider's adapter (e.g., Google GenAI), which expects tool_kwargs to be a dict.


Why the non-streaming path works

The non-streaming Converse API returns ToolUseBlock.input as a JSON value, which boto3 deserializes to a Python dict. So the non-streaming _get_content_and_tool_calls method (around line 430) works correctly: tool_usage.get("input", {}) is already a dict.

Internal inconsistency

The adapter's own get_tool_calls_from_response method already defensively handles string tool_kwargs:

if isinstance(tool_call.tool_kwargs, str):
    try:
        argument_dict = parse_partial_json(tool_call.tool_kwargs)
    ...

This suggests the developers were aware that strings could appear, but the fix was applied at the consumption site rather than at the source.
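For readers unfamiliar with partial-JSON parsing, the idea behind a function like `parse_partial_json` is to repair a truncated JSON prefix before decoding it. The sketch below is a deliberately simplified, hypothetical stand-in (`complete_partial_json`), not llama_index's implementation: it closes an unterminated string and any open brackets, then falls back to `{}` when the repaired prefix is still not valid JSON.

```python
import json
from typing import Any, Dict, List


def complete_partial_json(text: str) -> Dict[str, Any]:
    """Simplified, hypothetical partial-JSON repair (NOT llama_index's parse_partial_json).

    Trailing commas, dangling keys, or cut-off literals are not repaired and yield {}.
    """
    closers: List[str] = []  # expected closing brackets, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = text
    if in_string:
        if escaped:
            repaired = repaired[:-1]  # drop a dangling backslash
        repaired += '"'
    repaired += "".join(reversed(closers))

    try:
        value = json.loads(repaired)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


print(complete_partial_json('{"document_id": "ab'))  # {'document_id': 'ab'}
```

Unlike an all-or-nothing `json.loads`, this recovers the fields received so far, which is the kind of behavior a partial parser offers on the consumption side.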

Downstream Impact

Adapters that receive ToolCallBlock with string tool_kwargs may fail. For example, the Google GenAI adapter (llama_index/llms/google_genai/utils.py) constructs:

google.genai.types.FunctionCall(
    name=block.tool_name,
    args=cast(Dict[str, Any], block.tool_kwargs)  # ← Pydantic rejects strings
)

This produces:

pydantic_core.ValidationError: 1 validation error for FunctionCall
args
  Input should be a valid dictionary [type=dict_type, input_value='{"document_id": "..."}', input_type=str]

Any workflow that persists chat history from a Bedrock streaming session and replays it through a different provider will hit this.
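Histories persisted before a fix would presumably still contain string `tool_kwargs`, so a consumer-side guard may help when replaying them. This sketch uses a hypothetical `ToolCallRecord` dataclass rather than the real `ToolCallBlock`, and `strict_function_call_args` only imitates the dict-only check that `FunctionCall.args` enforces; it raises `TypeError`, not pydantic's `ValidationError`.

```python
import json
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class ToolCallRecord:
    # Hypothetical stand-in for a persisted ToolCallBlock (not the llama_index class).
    tool_name: str
    tool_call_id: str
    tool_kwargs: Union[str, Dict[str, Any]]


def strict_function_call_args(record: ToolCallRecord) -> Dict[str, Any]:
    # Mimics a consumer that, like FunctionCall.args, accepts only a dict.
    if not isinstance(record.tool_kwargs, dict):
        raise TypeError(f"args must be a dict, got {type(record.tool_kwargs).__name__}")
    return record.tool_kwargs


def normalize_history(records: List[ToolCallRecord]) -> List[ToolCallRecord]:
    """Return copies whose string tool_kwargs are decoded into dicts ({} if unparseable)."""
    fixed = []
    for record in records:
        if isinstance(record.tool_kwargs, str):
            try:
                parsed = json.loads(record.tool_kwargs) if record.tool_kwargs else {}
            except json.JSONDecodeError:
                parsed = {}
            record = replace(record, tool_kwargs=parsed if isinstance(parsed, dict) else {})
        fixed.append(record)
    return fixed


history = [ToolCallRecord("get_document", "tool-1", '{"document_id": "abc-123"}')]
try:
    strict_function_call_args(history[0])
except TypeError as exc:
    print("before:", exc)  # before: args must be a dict, got str
print("after:", strict_function_call_args(normalize_history(history)[0]))
```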

Suggested Fix

Parse the accumulated JSON string into a dict when constructing ToolCallBlock in the streaming code paths. A minimal fix at each of the 6 construction sites:

import json

# Replace:
tool_kwargs=tool_call.get("input", {})

# With:
tool_kwargs=json.loads(tool_call["input"]) if isinstance(tool_call.get("input"), str) else tool_call.get("input", {})
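One caveat with this one-liner: the streaming methods also build `ToolCallBlock`s for intermediate yields, when the accumulated string is still an incomplete JSON prefix, and `json.loads` raises on those (and on an empty string). The sketch wraps the suggested expression in a hypothetical `suggested_fix` function to show this; the snapshot strings are made up.

```python
import json
from typing import Any, Dict


def suggested_fix(tool_call: Dict[str, Any]) -> Any:
    # The suggested expression, unchanged, wrapped in a function for testing.
    return json.loads(tool_call["input"]) if isinstance(tool_call.get("input"), str) else tool_call.get("input", {})


def raises_mid_stream(snapshot: str) -> bool:
    """True if the suggested expression raises on this accumulated input string."""
    try:
        suggested_fix({"input": snapshot})
    except json.JSONDecodeError:
        return True
    return False


print(raises_mid_stream('{"document_id": "abc'))       # True: an intermediate yield would crash
print(raises_mid_stream('{"document_id": "abc-123"}'))  # False: the final value parses
```

This is the gap the merged PR closes by falling back to `{}` for incomplete or malformed JSON.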

Alternatively, the parsing could happen once at the contentBlockStop event (when the tool use block is complete), converting current_tool_call["input"] from string to dict at that point before any ToolCallBlock is constructed with the final value.
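The contentBlockStop alternative can be sketched as an event-driven accumulator that decodes each tool-use block once, when its stop event arrives. `collect_tool_calls` is hypothetical, and the event dictionaries are a simplified assumption about the ConverseStream shapes (`contentBlockStart.start.toolUse`, `contentBlockDelta.delta.toolUse.input`, `contentBlockStop.contentBlockIndex`).

```python
import json
from typing import Any, Dict, Iterable, List


def collect_tool_calls(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: accumulate tool-use input per block, decode it at contentBlockStop."""
    open_calls: Dict[int, Dict[str, Any]] = {}
    finished: List[Dict[str, Any]] = []
    for event in events:
        if "contentBlockStart" in event:
            start = event["contentBlockStart"]
            tool_use = start.get("start", {}).get("toolUse")
            if tool_use:
                open_calls[start["contentBlockIndex"]] = {**tool_use, "input": ""}
        elif "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]
            tool_delta = delta.get("delta", {}).get("toolUse")
            if tool_delta and delta["contentBlockIndex"] in open_calls:
                open_calls[delta["contentBlockIndex"]]["input"] += tool_delta.get("input", "")
        elif "contentBlockStop" in event:
            call = open_calls.pop(event["contentBlockStop"]["contentBlockIndex"], None)
            if call is not None:
                # The block is complete, so the string is expected to be whole JSON
                # (json.loads still raises if the model emitted malformed JSON).
                call["input"] = json.loads(call["input"]) if call["input"] else {}
                finished.append(call)
    return finished


events = [
    {"contentBlockStart": {"contentBlockIndex": 1,
                           "start": {"toolUse": {"toolUseId": "tool-1", "name": "get_document"}}}},
    {"contentBlockDelta": {"contentBlockIndex": 1, "delta": {"toolUse": {"input": '{"document_id": '}}}},
    {"contentBlockDelta": {"contentBlockIndex": 1, "delta": {"toolUse": {"input": '"abc-123"}'}}}},
    {"contentBlockStop": {"contentBlockIndex": 1}},
]
calls = collect_tool_calls(events)
print(calls)
```

As the PR discussion notes, decoding at stop only fixes the final value; any `ToolCallBlock` yielded from an earlier `contentBlockDelta` would still carry a string unless it is normalized too.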

Version

llama-index-llms-bedrock-converse 0.14.9 (also confirmed still present on main).
