openclaw - ✅(Solved) Fix [Feature]: /v1/responses drops every built-in tool call from `output`; add opt-in flag to surface them [2 pull requests, 4 comments, 4 participants]

glow1128 · 2026-04-30T12:20:28Z



GitHub stats

openclaw/openclaw#75074•Fetched 2026-05-01 05:38:24


Timeline (top)

commented ×4, referenced ×4, cross-referenced ×2, mentioned ×1

Root Cause

Default-on would change the response shape for every existing caller. That's a larger conversation; this issue is asking for the minimum change to make the use cases above possible at all. A boolean is sufficient because the only orthogonal behavior — "delegate built-ins to the caller for execution, like client tools" — is a different contract that lifts PI_RESERVED_TOOL_NAMES and rewires admission. That belongs in a separate issue once this one is in.

Fix Action

Fixed

Fixed by PR: feat(gateway): surface built-in tool calls as function_call output items on /v1/responses (https://github.com/openclaw/openclaw/pull/75075)
Fixed by PR: feat(gateway/responses): opt-in flag to surface built-in tool calls in /v1/responses (https://github.com/openclaw/openclaw/pull/75107)

PR fix notes

PR #75075: feat(gateway): surface built-in tool calls as function_call output items on /v1/responses

Repository: openclaw/openclaw
Author: glow1128
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75075

Description (problem / solution / changelog)

Summary

Adds opt-in gateway.http.endpoints.responses.exposeBuiltInToolCalls flag.
When enabled, /v1/responses appends a function_call output item for every built-in tool the agent invokes (bash, edit, find, grep, ls, read, write, apply_patch, plus bundled plugin tools), in both JSON and SSE responses, in arrival order at output_index 1+.
Default off; no PI runtime, plugin SDK, or client-tool path changes.

Closes #75074. The issue has the user stories (eval pipelines, audit/replay, "what did the agent do?" UIs) and the before/after JSON.

Note on the parallel PR #75107

@mrinalgaur2005 opened #75107 ~2 hours after this one with the same approach, and credited me in their CHANGELOG. Their version made several real improvements over my original — I've folded them in here in commit 75bed22698 so the two PRs are at quality parity. Specifically: a dedupe regression test that exercises the client-tool-name filter (which I had implemented but not tested), status: "in_progress" on streaming output_item.added, dropping events with missing toolCallId instead of synthesizing one, expanded JSDoc, ReadonlySet typing, and a structured semantics bullet list in the docs. Both authors are credited in the CHANGELOG.

Why a refactor was in this PR before the polish commit

The first commit on this branch opened a parallel onAgentEvent subscription just for tool capture. It worked but was ugly: two listeners per stream request, non-stream and stream output orders diverged. The second commit folds the capture into the streaming path's existing main listener and aligns both paths on the same output order (assistant at index 0, audit items 1+).

Implementation

src/config/{types.gateway,zod-schema,schema.base.generated}.ts: adds exposeBuiltInToolCalls?: boolean to GatewayHttpResponsesConfig. The generated JSON-schema baseline is regenerated and committed.
src/gateway/openresponses-http.ts:
- Pure helper tryCaptureBuiltInToolCall({ evt, runId, clientToolNames }) returns a CapturedBuiltInToolCall or null for one event. No internal state, no listeners, no dispose. Drops events from a different run, non-tool streams, non-start phases, names that match a caller-provided client tool, and events with empty/missing toolCallId.
- Streaming path: tool capture is an additional branch in the existing unsubscribe = onAgentEvent(...) listener. One subscription per request. Captures are pushed into streamBuiltInItems and emitted via response.output_item.added (status: "in_progress") / done (status: "completed") immediately, with a nextStreamOutputIndex counter so audit items, the optional client-tool function_call, and the assistant message never collide on output_index.
- Non-streaming path: small inline subscription accumulates into a local array. Items are appended after the assistant message.
src/gateway/openresponses-http.test.ts: four new e2e tests — default-off does not surface, JSON output sequence is asserted as [message, function_call, function_call] with correct args, SSE emits response.output_item.added for every capture and the final response.completed includes them, and the dedupe test for client-tool-name conflicts.
docs/gateway/openresponses-http-api.md: documents the flag with a JSON5 config example and a structured semantics list.
CHANGELOG.md: entry under Unreleased / Changes, links #75074, credits both contributors.

The PI runtime, the clientTools admission path, the pendingToolCalls delegate path, and PI_RESERVED_TOOL_NAMES are all untouched.
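
The filtering rules listed for the pure helper can be sketched roughly as below. This is not the PR's code: the `AgentEvent` and `CapturedBuiltInToolCall` shapes are assumptions inferred from the description (the real event type may nest fields differently), and dropping events with no tool name is an extra assumption of this sketch.

```typescript
// Hypothetical event shape, inferred from the description above.
type AgentEvent = {
  runId: string;
  stream: string; // e.g. "tool", "assistant", "lifecycle"
  data?: { phase?: string; name?: string; toolCallId?: string; args?: unknown };
};

// Hypothetical capture record; field names are illustrative only.
type CapturedBuiltInToolCall = {
  callId: string;
  name: string;
  arguments: string; // JSON-encoded args, matching function_call items
};

function tryCaptureBuiltInToolCall(params: {
  evt: AgentEvent;
  runId: string;
  clientToolNames: ReadonlySet<string>;
}): CapturedBuiltInToolCall | null {
  const { evt, runId, clientToolNames } = params;
  if (evt.runId !== runId) return null; // event belongs to a different run
  if (evt.stream !== "tool") return null; // not a tool stream
  const data = evt.data;
  if (!data || data.phase !== "start") return null; // only start phases are surfaced
  const name = data.name;
  if (!name) return null; // assumption: nothing useful to report without a name
  if (clientToolNames.has(name)) return null; // client tools go through the delegate path
  const callId = data.toolCallId;
  if (!callId) return null; // drop instead of synthesizing an id
  return { callId, name, arguments: JSON.stringify(data.args ?? {}) };
}
```

Because the helper holds no state and registers no listeners, the streaming and non-streaming paths can both call it from whatever subscription they already have.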

Test plan

pnpm test src/gateway/openresponses-http.test.ts — 19/19 passing (incl. 4 new tests; existing tests unchanged).
pnpm tsgo:core — clean.
pnpm config:docs:check — clean (baseline regenerated and committed).
pnpm exec oxfmt --check — clean on touched files.
pnpm exec oxlint (core profile) — 0 warnings, 0 errors on touched files.
Rebased onto current origin/main; mergeable: MERGEABLE.

Out of scope (deliberate)

Tool result capture — only phase: "start" is surfaced. Result capture is a different design conversation (size, redaction, schema-per-tool).
Streaming token-level deltas of tool args.
"Delegate mode" where built-ins become caller-executable. That lifts PI_RESERVED_TOOL_NAMES and rewires tool admission; it deserves its own issue.

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/config-baseline.sha256 (modified, +2/-2)
docs/gateway/openresponses-http-api.md (modified, +46/-0)
src/config/schema.base.generated.ts (modified, +3/-0)
src/config/types.gateway.ts (modified, +10/-0)
src/config/zod-schema.ts (modified, +1/-0)
src/gateway/openresponses-http.test.ts (modified, +202/-0)
src/gateway/openresponses-http.ts (modified, +151/-4)

PR #75107: feat(gateway/responses): opt-in flag to surface built-in tool calls in /v1/responses

Repository: openclaw/openclaw
Author: mrinalgaur2005
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/75107

Description (problem / solution / changelog)

Closes #75074.

Adds an opt-in gateway.http.endpoints.responses.exposeBuiltInToolCalls flag. With it on, POST /v1/responses appends a function_call item to output for every built-in agent tool invocation (bash, read, grep, ...) in both JSON and SSE responses. Default off, so existing callers see byte-identical responses.

Behavior

Audit-only. Only the phase: "start" event is captured from the agent stream. Tool results aren't surfaced, and there's no way to feed a function_call_output back for one of these, because the agent already executed them.
Streaming. Each capture emits its own response.output_item.added and response.output_item.done at the next free output_index. The same items appear in the final response.completed.response.output array. Hardcoded output_index: 1 for the client-tool delegate is replaced with a nextStreamOutputIndex counter so audit items, the delegate function_call, and the assistant message never collide.
Dedupe. When a built-in tool name matches a tool the caller registered via tools: [...], the audit capture is skipped so the same call never shows up as both an audit and delegate item.
Default off. Existing callers see exactly the same response bytes they did before. Locked in by a regression test.
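
The index allocation described under Streaming can be sketched as a small emitter around a counter. This is an illustration under stated assumptions, not the PR's code: `createAuditEmitter`, the `SseEvent` shape, the `call_` id prefix, and the default starting index of 1 (assistant message at 0) are all hypothetical.

```typescript
type FunctionCallItem = {
  type: "function_call";
  id: string;
  call_id: string;
  name: string;
  arguments: string;
  status: "in_progress" | "completed";
};

type SseEvent = {
  type: "response.output_item.added" | "response.output_item.done";
  output_index: number;
  item: FunctionCallItem;
};

// Hypothetical helper: hands out the next free output_index per capture.
function createAuditEmitter(send: (evt: SseEvent) => void, firstIndex = 1) {
  let nextStreamOutputIndex = firstIndex;
  const items: FunctionCallItem[] = [];
  return {
    emit(call: { callId: string; name: string; arguments: string }): number {
      const outputIndex = nextStreamOutputIndex++;
      const base = {
        type: "function_call" as const,
        id: `call_${call.callId}`, // placeholder id scheme, not the real one
        call_id: call.callId,
        name: call.name,
        arguments: call.arguments,
      };
      // "added" and "done" are sent back to back: only the start of the
      // tool call is captured, so there is nothing to wait for in between.
      send({ type: "response.output_item.added", output_index: outputIndex, item: { ...base, status: "in_progress" } });
      const done: FunctionCallItem = { ...base, status: "completed" };
      send({ type: "response.output_item.done", output_index: outputIndex, item: done });
      items.push(done);
      return outputIndex;
    },
    // Items to include in the final response.completed payload.
    items: (): FunctionCallItem[] => items.slice(),
  };
}
```

Any other item that needs an index (for example the delegate function_call) would draw from the same counter, which is what keeps indices from colliding.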

Implementation

src/config/types.gateway.ts + src/config/zod-schema.ts: new exposeBuiltInToolCalls?: boolean on GatewayHttpResponsesConfig.
src/config/schema.base.generated.ts + docs/.generated/config-baseline.sha256: regenerated via pnpm config:schema:gen + pnpm config:docs:gen. Only the expected three baseline hashes change (combined, core, plugin); channel hash is untouched.
src/gateway/openresponses-http.ts: pure tryCaptureBuiltInToolCall helper, audit branch in the streaming onAgentEvent listener, and an inline subscribe/finally-unsubscribe around the non-streaming runResponsesAgentCommand so we never leak listeners on early returns or throws.
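
The subscribe/finally-unsubscribe pattern mentioned for the non-streaming path might look like the sketch below. `createEventBus` is a hypothetical stand-in for the gateway's `onAgentEvent`, and `collectDuringRun` is an illustrative name; the only point is that the listener is removed whether the run resolves or throws.

```typescript
type Listener<E> = (evt: E) => void;

// Hypothetical stand-in for onAgentEvent: subscribing returns an unsubscribe function.
function createEventBus<E>() {
  const listeners = new Set<Listener<E>>();
  return {
    on(listener: Listener<E>): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    emit(evt: E): void {
      listeners.forEach((listener) => listener(evt));
    },
    listenerCount: (): number => listeners.size,
  };
}

// Collects every event seen while `run` is in flight, then always unsubscribes.
async function collectDuringRun<E, R>(
  subscribe: (listener: Listener<E>) => () => void,
  run: () => Promise<R>,
): Promise<{ result: R; events: E[] }> {
  const events: E[] = [];
  const unsubscribe = subscribe((evt) => {
    events.push(evt);
  });
  try {
    const result = await run();
    return { result, events };
  } finally {
    unsubscribe(); // runs on success, early return, and throw alike
  }
}
```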

Test plan

pnpm test src/gateway/openresponses-http.test.ts — 19/19 green (15 existing + 4 new):
- drops built-in tool calls from output by default (regression for #75074) — locks the byte-identical default-off shape.
- surfaces built-in tool calls as function_call output items when exposeBuiltInToolCalls is true — JSON, asserts [message, function_call, function_call] ordering and name/call_id/arguments round-trip.
- emits SSE output_item events for built-in tool calls at incrementing output_index — checks streaming events plus the final response.completed payload.
- does not emit a duplicate audit function_call when the tool name matches a caller-provided client tool — covers the dedupe path against pendingToolCalls.
pnpm tsgo:core — clean.
pnpm exec oxfmt --check and pnpm exec oxlint on touched files — clean.

Out of scope (deliberate, keeps the surface minimal)

Tool result capture (phase: "end") — needs a separate design pass for size, redaction, and per-tool result schemas.
Streaming token-level deltas of tool args.
A "delegate mode" that lifts PI_RESERVED_TOOL_NAMES so built-in tools become caller-executable.
Changing the default response shape — still byte-identical when the flag is absent or false.

Changed files

CHANGELOG.md (modified, +4/-0)
docs/.generated/config-baseline.sha256 (modified, +3/-3)
docs/gateway/openresponses-http-api.md (modified, +44/-0)
src/config/schema.base.generated.ts (modified, +4/-1)
src/config/types.gateway.ts (modified, +12/-0)
src/config/zod-schema.ts (modified, +1/-0)
src/gateway/openresponses-http.test.ts (modified, +269/-0)
src/gateway/openresponses-http.ts (modified, +182/-5)

Code Example

// POST /v1/responses, agent runs `bash`, then `read`, then replies
{
  "id": "resp_…",
  "status": "completed",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [{ "type": "output_text", "text": "Found 3 TODOs in src/." }] }
  ]
}

---

{
  "id": "resp_…",
  "status": "completed",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [{ "type": "output_text", "text": "Found 3 TODOs in src/." }] },
    { "type": "function_call", "id": "call_…", "call_id": "tc_1",
      "name": "bash", "arguments": "{\"command\":\"grep -rn TODO src/\"}",
      "status": "completed" },
    { "type": "function_call", "id": "call_…", "call_id": "tc_2",
      "name": "read", "arguments": "{\"path\":\"src/foo.ts\"}",
      "status": "completed" }
  ]
}


Three concrete things you cannot do today

Run an offline eval against /v1/responses that scores tool use. Sample prompt: "find every TODO in the repo and write a summary." We want to grade the agent on whether it called grep, find, and read in a sensible order — without scraping logs out-of-band. Today the response is { output: [{ type: "message", … }] }. There's no way for the eval harness to know the agent used any tools at all, let alone which ones. The whole class of "did the agent reason and act, or did it hallucinate the answer?" evals is closed off.
Ship a "what did the agent do?" UI on top of OpenClaw HTTP. Any thin client (Lambda, serverless function, internal automation gateway, an MCP-like router, a desktop app speaking only HTTP) wants to render: "Agent ran bash 'git status', then read CHANGELOG.md, then replied with X." Today the only path is to open a second connection to the Gateway WebSocket event stream, authenticate it separately, correlate by runId, and reassemble the trace. For HTTP-only or stateless callers that's not feasible. Even where it is, it's a second auth surface, a second concurrency model, and not part of any documented contract — docs/gateway/openresponses-http-api.md makes no promise that those WS event names are stable.
Audit and replay agent runs for security/compliance review. "Did the agent ever shell out to bash 'curl …'? Did it read any path under ~/.ssh?" The answer must be derivable from the run's saved response, otherwise the trail is incomplete. Right now you have to keep a parallel Gateway WS log to answer that, and if the WS subscription dropped a beat, the response itself is no help.

These are not edge cases. They are the three highest-value reasons to put an agent behind an HTTP API in the first place.

Before / after

Before, with the agent doing real work:

// POST /v1/responses, agent runs `bash`, then `read`, then replies
{
  "id": "resp_…",
  "status": "completed",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [{ "type": "output_text", "text": "Found 3 TODOs in src/." }] }
  ]
}

The two tool invocations are silently dropped from the response. The caller has no record they happened.

After, with gateway.http.endpoints.responses.exposeBuiltInToolCalls: true:

{
  "id": "resp_…",
  "status": "completed",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [{ "type": "output_text", "text": "Found 3 TODOs in src/." }] },
    { "type": "function_call", "id": "call_…", "call_id": "tc_1",
      "name": "bash", "arguments": "{\"command\":\"grep -rn TODO src/\"}",
      "status": "completed" },
    { "type": "function_call", "id": "call_…", "call_id": "tc_2",
      "name": "read", "arguments": "{\"path\":\"src/foo.ts\"}",
      "status": "completed" }
  ]
}

Same flag in streaming mode: each invocation arrives as response.output_item.added / response.output_item.done SSE events at incrementing output_index while the run is happening.
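
With the flag on, an eval harness or audit script could read the trace straight from the saved JSON response. The sketch below assumes only the item fields shown in the "after" example; `listToolCalls` and `usedToolsInOrder` are hypothetical helper names, not part of OpenClaw.

```typescript
type OutputItem =
  | { type: "message"; role: string; content: unknown[] }
  | { type: "function_call"; id: string; call_id: string; name: string; arguments: string; status: string };

type ResponseBody = { id: string; status: string; output: OutputItem[] };

type ToolCall = { callId: string; name: string; args: unknown };

// Returns the function_call items in output order, with arguments decoded.
function listToolCalls(response: ResponseBody): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const item of response.output) {
    if (item.type !== "function_call") continue;
    let args: unknown;
    try {
      args = JSON.parse(item.arguments);
    } catch {
      args = item.arguments; // keep the raw string if it is not valid JSON
    }
    calls.push({ callId: item.call_id, name: item.name, args });
  }
  return calls;
}

// True when the expected tool names appear in the trace in that relative order.
function usedToolsInOrder(response: ResponseBody, expected: string[]): boolean {
  let matched = 0;
  for (const call of listToolCalls(response)) {
    if (matched < expected.length && call.name === expected[matched]) matched++;
  }
  return matched === expected.length;
}
```

Because these items are audit-only, a caller would treat them as a read-only record; there is nothing to send back for them.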

Why this should land (and isn't a layering violation)

First-class output items are the entire point of the Responses surface. OpenAI's own Responses API returns function_call, web_search_call, code_interpreter_call, file_search_call, image_generation_call, etc. as items in the run's output array; that's the surface contract OpenClaw advertises compatibility with at https://www.open-responses.com/ . OpenClaw today returns only message items unless the caller registered a function tool. That's a strict subset of the contract.
The plumbing is already there. pi-embedded-runner emits stream: "tool", phase: "start", { name, toolCallId, args } events, scoped by runId, for every tool the agent runs (src/agents/pi-embedded-subscribe.handlers.tools.ts:630). The streaming path of /v1/responses already subscribes to onAgentEvent for lifecycle and assistant streams (src/gateway/openresponses-http.ts). The data is on the wire. We're choosing not to forward it.
It is opt-in and additive. New field gateway.http.endpoints.responses.exposeBuiltInToolCalls, default false. Existing callers see byte-identical responses. No PI runtime changes, no plugin SDK changes, no client-tool path changes, no new authentication surface, no new event channel.
It does not regress the client-tool delegate path. Caller-provided client tools still route through the existing pendingToolCalls → incomplete response path. The audit stream filters them out by name so the same tool never appears as both an audit item and a delegate item in one response.

Why opt-in (and why a boolean, not an enum)

Implementation

PR #75075 (glow1128:feat/expose-builtin-tool-calls).

src/config/types.gateway.ts + zod-schema.ts + schema.base.generated.ts: adds exposeBuiltInToolCalls?: boolean to GatewayHttpResponsesConfig.
src/gateway/openresponses-http.ts: adds a single pure helper tryCaptureBuiltInToolCall(evt, runId, clientToolNames) that returns a capture record or null. The streaming path folds it into the existing unsubscribe = onAgentEvent(...) listener (one subscription per request, not two). The non-streaming path adds a small inline subscription that accumulates into a local array. Output order is identical between paths: assistant message at index 0, audit items at 1+ in arrival order.
3 e2e tests in src/gateway/openresponses-http.test.ts covering default-off, JSON output ordering and args, and SSE output_item.added/done for each capture.
Docs (docs/gateway/openresponses-http-api.md) and CHANGELOG.md updated.

Final diff: ~290 lines, single concept, default-off.

Out of scope (please file separately if needed)

Tool result capture (only phase: "start" is surfaced).
Streaming token-level deltas of tool args.
"Delegate mode" where built-ins become caller-executable.
Changing existing response shape when the flag is off.

Environment

OpenClaw 2026.4.x
Branch: feat/expose-builtin-tool-calls on glow1128/openclaw

extent analysis

TL;DR

Enable the exposeBuiltInToolCalls flag in the Gateway HTTP responses configuration to include built-in tool calls in the response output.

Guidance

Set gateway.http.endpoints.responses.exposeBuiltInToolCalls to true in the configuration to opt-in to the new behavior.
Verify that the response output now includes function_call items for built-in tool invocations, such as bash and read.
Test the new behavior using the provided e2e tests in src/gateway/openresponses-http.test.ts.
Note that this change is additive and does not regress the client-tool delegate path.
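
As a rough illustration of where the flag sits, the dotted path maps to the nested object below. It is written as a TypeScript literal only for consistency with the codebase; the PR's docs describe a JSON5 config example, so the real file format may differ. `GatewayConfigSketch` and `shouldExposeBuiltInToolCalls` are hypothetical names.

```typescript
// Hypothetical minimal type covering only the path named in the issue.
type GatewayConfigSketch = {
  gateway?: {
    http?: {
      endpoints?: {
        responses?: { exposeBuiltInToolCalls?: boolean };
      };
    };
  };
};

const config: GatewayConfigSketch = {
  gateway: {
    http: {
      endpoints: {
        responses: { exposeBuiltInToolCalls: true }, // opt in; default is off
      },
    },
  },
};

// Absent or false both mean "off", matching the default-off behavior described above.
function shouldExposeBuiltInToolCalls(cfg: GatewayConfigSketch): boolean {
  return cfg.gateway?.http?.endpoints?.responses?.exposeBuiltInToolCalls === true;
}
```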

Example

{
  "id": "resp_…",
  "status": "completed",
  "output": [
    { "type": "message", "role": "assistant",
      "content": [{ "type": "output_text", "text": "Found 3 TODOs in src/." }] },
    { "type": "function_call", "id": "call_…", "call_id": "tc_1",
      "name": "bash", "arguments": "{\"command\":\"grep -rn TODO src/\"}",
      "status": "completed" },
    { "type": "function_call", "id": "call_…", "call_id": "tc_2",
      "name": "read", "arguments": "{\"path\":\"src/foo.ts\"}",
      "status": "completed" }
  ]
}

Notes

This change is specific to the OpenClaw 2026.4.x version and the feat/expose-builtin-tool-calls branch. The implementation is limited to surfacing phase: "start" events for built-in tool calls.

Recommendation

Enable the feature by setting exposeBuiltInToolCalls to true; per the issue, this is the minimum change required to make the desired use cases possible.


#api #prompt template #agent execution #callback error #memory management
