autogen - ✅(Solved) Fix extra_body in OpenAIChatCompletionClient config is silently ignored when loaded via AutoGen Studio JSON [1 pull requests, 2 comments, 2 participants]

autogen2026-03-17 19:23:13

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

microsoft/autogen#7418•Fetched 2026-04-08 00:54:22

View on GitHub

Comments

Participants

Timeline

Reactions

Author

danmaxis

Participants

danmaxis

weiguangli-io

Timeline (top)

referenced ×3commented ×1cross-referenced ×1issue_type_added ×1

Error Message

openai.BadRequestError: Error code: 400 - { 'error': { 'code': 400, 'message': 'Assistant response prefill is incompatible with enable_thinking.', 'type': 'invalid_request_error' } }

Fix Action

Fix / Workaround

A workaround is to disable enable_thinking at the inference server level directly, or to insert <think>\n\n</think> at the start of the agent's system_message to trick the model's chat template into skipping the thinking block. Neither is ideal.

PR fix notes

PR #7421: Fix extra_body silently dropped during load_component()

Repository: microsoft/autogen
Author: joaquinhuigomez
State: closed | merged: False
Link: https://github.com/microsoft/autogen/pull/7421

Description (problem / solution / changelog)

Summary

Fixes #7418 — extra_body is silently dropped when loading OpenAI model client config via load_component().

The CreateArguments TypedDict and CreateArgumentsConfigModel Pydantic model both lacked an extra_body field, so any provider-specific parameters (e.g. extra_body: {"reasoning": {"effort": "high"}} for o3) were stripped during config validation.

This adds extra_body: Optional[Dict[str, Any]] to both models and threads it through to the API call.

Changed files

python/packages/autogen-ext/src/autogen_ext/models/openai/config/__init__.py (modified, +6/-1)
python/packages/autogen-ext/tests/models/test_openai_model_client.py (modified, +47/-0)

Code Example

{
  "provider": "autogen_ext.models.openai.OpenAIChatCompletionClient",
  "component_type": "model",
  "version": 1,
  "component_version": 1,
  "label": "OpenAIChatCompletionClient",
  "config": {
    "model": "Qwen3-30B-A3B",
    "api_key": "placeholder",
    "base_url": "http://localhost:8080/v1",
    "extra_body": {
      "enable_thinking": false
    },
    "model_info": {
      "vision": false,
      "function_calling": true,
      "json_output": true,
      "structured_output": true,
      "family": "unknown",
      "context_window": 32768
    }
  }
}

---

openai.BadRequestError: Error code: 400 - {
  'error': {
    'code': 400,
    'message': 'Assistant response prefill is incompatible with enable_thinking.',
    'type': 'invalid_request_error'
  }
}

---

File ".../autogen_agentchat/agents/_assistant_agent.py", line 955, in _call_llm
    model_result = await model_client.create(
File ".../autogen_ext/models/openai/_openai_client.py", line 624, in create
    result = await future
File ".../openai/resources/chat/completions/completions.py", line 2714, in create
    return await self._post(
...
openai.BadRequestError: Error code: 400 - {'error': {'code': 400,
  'message': 'Assistant response prefill is incompatible with enable_thinking.',
  'type': 'invalid_request_error'}}

---

# This works fine in Python — extra_body is respected
client = OpenAIChatCompletionClient(
    model="Qwen3-30B-A3B",
    base_url="http://localhost:8080/v1",
    api_key="placeholder",
    extra_body={"enable_thinking": False},
    model_info=ModelInfo(...)
)

RAW_BUFFERClick to expand / collapse

What happened?

`extra_body` in `OpenAIChatCompletionClient` config is silently ignored when loaded via AutoGen Studio JSON

Describe the bug

When configuring a model client through AutoGen Studio's JSON editor with an extra_body field (e.g., to pass enable_thinking: false to a Qwen3-compatible endpoint), the field appears to be silently dropped during deserialization. The extra parameters never reach the underlying HTTP request, even though the same configuration works correctly when instantiated directly via Python.

I'm not 100% sure if this is a serialization issue in the Studio layer or a limitation in how OpenAIChatCompletionClient handles extra_body during load_component() — happy to be corrected if I'm misunderstanding the intended behavior.

To Reproduce

Spin up AutoGen Studio with a local OpenAI-compatible endpoint that serves a Qwen3 model with enable_thinking active by default (e.g., LM Studio, llama-server, vLLM).
Configure a model client via the Studio JSON editor with extra_body:

{
  "provider": "autogen_ext.models.openai.OpenAIChatCompletionClient",
  "component_type": "model",
  "version": 1,
  "component_version": 1,
  "label": "OpenAIChatCompletionClient",
  "config": {
    "model": "Qwen3-30B-A3B",
    "api_key": "placeholder",
    "base_url": "http://localhost:8080/v1",
    "extra_body": {
      "enable_thinking": false
    },
    "model_info": {
      "vision": false,
      "function_calling": true,
      "json_output": true,
      "structured_output": true,
      "family": "unknown",
      "context_window": 32768
    }
  }
}

Assign this model client to an AssistantAgent inside a RoundRobinGroupChat team.
Run any task in the Playground.

Observed error (from docker logs):

openai.BadRequestError: Error code: 400 - {
  'error': {
    'code': 400,
    'message': 'Assistant response prefill is incompatible with enable_thinking.',
    'type': 'invalid_request_error'
  }
}

The error confirms the endpoint is still receiving requests with enable_thinking active, meaning extra_body: { enable_thinking: false } was never forwarded.

Full traceback:

File ".../autogen_agentchat/agents/_assistant_agent.py", line 955, in _call_llm
    model_result = await model_client.create(
File ".../autogen_ext/models/openai/_openai_client.py", line 624, in create
    result = await future
File ".../openai/resources/chat/completions/completions.py", line 2714, in create
    return await self._post(
...
openai.BadRequestError: Error code: 400 - {'error': {'code': 400,
  'message': 'Assistant response prefill is incompatible with enable_thinking.',
  'type': 'invalid_request_error'}}

Expected behavior

The extra_body field should be forwarded as-is to the underlying openai client's create() / create_stream() calls, exactly as it would be when constructing OpenAIChatCompletionClient directly in Python:

# This works fine in Python — extra_body is respected
client = OpenAIChatCompletionClient(
    model="Qwen3-30B-A3B",
    base_url="http://localhost:8080/v1",
    api_key="placeholder",
    extra_body={"enable_thinking": False},
    model_info=ModelInfo(...)
)

The same behavior should be achievable via the Studio JSON config, since the field is supported by the underlying client.

Environment

autogenstudio version: latest via pip install autogenstudio (as of March 2026)
autogen-agentchat / autogen-ext: installed as dependencies
Running inside Docker (python:3.11-slim base image)
Local model server: LM Studio / llama-server (OpenAI-compatible endpoint)
Model: Qwen3 family (any variant with enable_thinking support)

Additional context

This affects any use case requiring vendor-specific parameters that go beyond the standard OpenAI API spec — enable_thinking for Qwen3 being a common one, but the same issue would surface with other extra_body fields used by vLLM, TabbyAPI, or similar servers.

If extra_body deserialization is intentionally unsupported in the component config, it would be helpful to document this limitation and suggest the recommended alternative.

Thanks for the great project — really appreciate the work going into AutoGen Studio!

Which packages was the bug in?

AutoGen Studio (autogensudio)

AutoGen library version.

Python dev (main branch)

Other library version.

No response

Model used

Qwen3.5-35B-A3B-Q4_K_M

Model provider

LlamaCpp

Other model provider

No response

Python version

3.11

.NET version

None

Operating system

Other

extent analysis

Fix Plan

To fix the issue of extra_body being silently ignored when loading OpenAIChatCompletionClient via AutoGen Studio JSON, we need to modify the deserialization process in the OpenAIChatCompletionClient component.

Here are the steps:

Update the OpenAIChatCompletionClient component to properly handle the extra_body field during deserialization.
Ensure that the extra_body field is correctly forwarded to the underlying openai client's create() / create_stream() calls.

Code Changes

# In OpenAIChatCompletionClient component
import json

class OpenAIChatCompletionClient:
    # ...

    def __init__(self, config):
        # ...
        self.extra_body = config.get('extra_body', {})

    def create(self, **kwargs):
        # ...
        if self.extra_body:
            kwargs.update(self.extra_body)
        # ...

    def create_stream(self, **kwargs):
        # ...
        if self.extra_body:
            kwargs.update(self.extra_body)
        # ...

Verification

To verify that the fix worked, you can:

Configure the OpenAIChatCompletionClient via the Studio JSON editor with the extra_body field.
Run a task in the Playground and check the docker logs for the openai client requests.
Verify that the extra_body field is correctly forwarded to the underlying openai client's create() / create_stream() calls.

Extra Tips

Make sure to update the documentation to reflect the correct usage of the extra_body field in the OpenAIChatCompletionClient component.
Consider adding additional logging or debugging statements to help diagnose any future issues with the extra_body field.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #agent execution #callback error #memory management #API rate limit

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.