openclaw - ✅(Solved) Fix SDK follow-up: host-owned structured plugin inference beyond media-understanding [3 pull requests, 3 comments, 2 participants]

openclaw · 2026-05-10 08:47:12

GitHub stats

openclaw/openclaw#80188 • Fetched 2026-05-11 03:17:51

Author

100yenadmin

Participants

100yenadmin

clawsweeper[bot]

Timeline (top)

commented ×3, cross-referenced ×3

Error Message

controlled success and error envelopes

Fix Action

Fixed

Fixed by PR: [plugin sdk] Add structured extraction media runtime (https://github.com/openclaw/openclaw/pull/79334)
Fixed by PR: fix: make /plugins/gbrain/extract the only default OAuth route (https://github.com/electricsheephq/eva-brain/pull/72)
Fixed by PR: Plugin SDK: add host-owned structured runtime LLM (https://github.com/openclaw/openclaw/pull/80203)

PR fix notes

PR #79334: [plugin sdk] Add structured extraction media runtime

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/79334

Description (problem / solution / changelog)

Why this matters

OpenClaw plugins increasingly need to turn unstructured user content into safe, typed data: receipts into expense records, screenshots into support evidence, invoices into accounting fields, customer messages into CRM notes, PDFs into knowledge-base snippets, and product photos into searchable inventory metadata.

Today each plugin has to choose between two bad options:

- implement its own model/auth/runtime bridge, usually requiring another user-managed API key; or
- add product-specific extraction routes to core, which does not scale as the plugin ecosystem grows.

This PR adds the missing middle layer: a generic structured extraction capability in the media-understanding SDK. Product plugins keep owning their routes, schemas, storage, and UX, while OpenClaw owns the provider/runtime boundary, auth source, safety posture, and typed SDK contract.

What plugin authors can build with this

Examples this unlocks without adding plugin-specific logic to OpenClaw core:

Support plugins: extract error messages, stack traces, product names, issue category, severity, and reproduction steps from screenshots.
Knowledge-base plugins: convert documents or screenshots into normalized metadata and searchable evidence records.
CRM/sales plugins: extract companies, people, dates, action items, sentiment, and deal updates from inbound media plus short text context.
Finance/admin plugins: extract vendor, total, currency, tax, due date, and line-item hints from receipts or invoices.
Inventory/media plugins: extract labels, visible text, tags, object categories, and image summaries from uploaded photos.
Migration/import plugins: map arbitrary image inputs into a plugin-owned JSON schema before writing to the plugin's own database.

The important part: the plugin defines the schema and decides what to do with the result. OpenClaw only provides the generic, bounded extraction lane.

New SDK shape

This PR adds:

optional provider method: MediaUnderstandingProvider.extractStructured(...)
runtime helper: api.runtime.mediaUnderstanding.extractStructuredWithModel(...)
typed inputs for images plus optional supplemental text context
optional schemaName, jsonSchema, jsonMode, and timeoutMs
controlled result metadata: raw text, parsed JSON when JSON mode is enabled, model/provider, and content type

Example plugin call:

const result = await api.runtime.mediaUnderstanding.extractStructuredWithModel({
  provider: "codex",
  model: "gpt-5.5",
  input: [
    {
      type: "image",
      buffer: receiptImageBuffer,
      fileName: "receipt.png",
      mime: "image/png",
    },
    { type: "text", text: "Prefer the printed total over handwritten notes." },
  ],
  instructions: "Extract vendor, total, and searchable tags.",
  schemaName: "receipt.evidence",
  jsonSchema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      total: { type: "number" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["vendor", "total"],
  },
  cfg: api.config,
});
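
Once the call above returns, everything else is plugin-owned. The following is a minimal sketch of narrowing the parsed payload into a typed record before storage. It assumes the result carries `text`, `parsed`, `model`, `provider`, and `contentType`, as described in the list above; `StructuredExtractionResult`, `ReceiptEvidence`, and `toReceiptEvidence` are hypothetical names for illustration, not SDK exports.

```typescript
// Hypothetical result shape, modeled on the fields this PR describes
// (raw text, parsed JSON, model/provider, content type). Not the SDK's actual type.
interface StructuredExtractionResult {
  text: string;
  parsed?: unknown;
  model: string;
  provider: string;
  contentType: string;
}

// Plugin-owned record type matching the receipt.evidence schema in the example call.
interface ReceiptEvidence {
  vendor: string;
  total: number;
  tags: string[];
}

// Narrow the untyped parsed payload before writing it to plugin storage.
function toReceiptEvidence(result: StructuredExtractionResult): ReceiptEvidence {
  const parsed = result.parsed;
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Extraction did not return a JSON object.");
  }
  const record = parsed as Record<string, unknown>;
  if (typeof record.vendor !== "string" || typeof record.total !== "number") {
    throw new Error("Extraction result is missing vendor or total.");
  }
  // tags is optional in the schema, so default to an empty list and drop non-strings.
  const tags = Array.isArray(record.tags)
    ? record.tags.filter((tag): tag is string => typeof tag === "string")
    : [];
  return { vendor: record.vendor, total: record.total, tags };
}
```

Re-checking the required fields on the plugin side is a deliberate choice in this sketch: the schema is sent to the model as guidance, so the plugin still treats the output as untrusted input.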

Runtime architecture

flowchart LR
  Plugin["Plugin route, skill, or importer"] --> Runtime["api.runtime.mediaUnderstanding.extractStructuredWithModel"]
  Runtime --> Provider["MediaUnderstandingProvider.extractStructured"]
  Provider --> HostRuntime["Provider-owned host runtime"]
  HostRuntime --> Result["JSON result or controlled error"]
  Result --> Plugin
  Plugin --> Storage["Plugin-owned storage, tools, or user workflow"]

For the bundled Codex provider, this uses the existing Codex app-server/OAuth path rather than requiring a user-supplied model API key.

flowchart LR
  Plugin["Any OpenClaw plugin"] --> SDK["Structured extraction SDK"]
  SDK --> CodexProvider["Codex media-understanding provider"]
  CodexProvider --> AppServer["Codex app-server / OAuth runtime"]
  AppServer --> BoundedTurn["Ephemeral no-tools turn"]
  BoundedTurn --> JSON["Parsed JSON or controlled error"]

Safety and boundaries

The Codex implementation keeps the same bounded posture as image understanding:

ephemeral thread
read-only sandbox
no dynamic tools
approval policy set to on-request, with approval requests denied by the provider handler
timeout enforcement
model modality validation before turn start
JSON parsing failure returned as a controlled error
text-only extraction rejected at the runtime seam, keeping this image-first instead of turning it into a generic text completion lane
no product-specific route names, storage models, or schemas in OpenClaw core

This is intentionally a platform seam, not a feature-specific integration.
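
As an illustration of the image-first guardrail in the list above, here is a minimal sketch of a check that rejects text-only input before any provider is contacted. The error message is the one captured in the proof output later in this PR; the `ExtractionInput` type and the `assertHasImageInput` name are assumptions made for this sketch, not the actual runtime code.

```typescript
// Hypothetical input block types mirroring the example plugin call in this PR.
type ExtractionInput =
  | { type: "image"; buffer: Uint8Array; fileName?: string; mime?: string }
  | { type: "text"; text: string };

// Reject text-only requests at the runtime seam, before provider dispatch.
function assertHasImageInput(input: readonly ExtractionInput[]): void {
  const hasImage = input.some((block) => block.type === "image");
  if (!hasImage) {
    throw new Error("Structured extraction requires at least one image input.");
  }
}
```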

What changed

Adds structured extraction request/result types to the media-understanding SDK.
Adds extractStructuredWithModel(...) to the plugin runtime media-understanding facade.
Implements extractStructured(...) in the bundled Codex provider.
Preserves explicit config-provider image descriptions by keeping describeImageFileWithModel(...) on the full media-understanding registry instead of narrowing it to manifest-only plugin providers.
Forwards structured extraction auth-profile selection through the runtime helper so provider-owned OAuth/app-server runtimes can honor plugin-selected credentials.
Narrows the new seam to image-first extraction with optional supplemental text context instead of overlapping general text-only completion surfaces.
Adds tests for bounded Codex structured extraction, invalid JSON/schema handling, runtime routing, auth-profile forwarding, image-required guardrails, direct image-model registry routing, provider lookup failure, and runtime API exposure.
Documents the new runtime helper and the plugin/core ownership boundary.
Adds the required changelog entry for the new plugin SDK/runtime capability.

Relationship to existing LLM surfaces

OpenClaw already has api.runtime.llm.complete for trusted plugin text completions, and llm-task for workflow/tool-level JSON tasks. This PR is narrower and lower-level: a provider SDK/runtime media-understanding seam for schema-shaped extraction over image inputs with optional text context. That keeps extraction provider-owned and plugin-consumable without turning it into a general-purpose arbitrary Codex call API.

Non-goals

This does not add a product-specific extraction route to OpenClaw core.
This does not choose any plugin's storage model or JSON schema.
This does not replace existing image/audio/video media-understanding helpers.
This does not require plugins to use Codex; other providers can implement the same optional method.
This does not expand into generic text-only extraction; callers that want arbitrary text completions should keep using the existing LLM surfaces.

Background

This closes openclaw/openclaw#79321.

The immediate downstream need came from a GBrain/OpenClaw integration, but the implementation here is deliberately generic. GBrain, support, CRM, finance, inventory, migration, and knowledge-base plugins can all consume the same SDK seam while keeping their own product-specific routes and schemas outside OpenClaw core.

Real behavior proof

Behavior or issue addressed: The rebased branch exposes a typed plugin-runtime structured extraction seam that dispatches through a registered media-understanding provider, preserves the bounded Codex worker defaults, forwards the selected auth profile into the provider-owned runtime, and rejects text-only calls before provider dispatch.

Real environment tested: Local macOS OpenClaw checkout at /Users/lume/openclaw-review-worktrees/pr-79334-rebase, rebased head 78cfe4a76161fc7d3029beb4edcf7120a94a4d8b, using a standalone node --import tsx proof command outside Vitest. The proof registers the real bundled Codex media-understanding provider in the active plugin runtime registry with a stubbed app-server client, then calls createPluginRuntime().mediaUnderstanding.extractStructuredWithModel(...) once with image-plus-text input and once with text-only input.

Exact steps or command run after this patch:

cd /Users/lume/openclaw-review-worktrees/pr-79334-rebase
node --import tsx <<'EOF'
import { buildCodexMediaUnderstandingProvider } from './extensions/codex/media-understanding-provider.ts';
import { createPluginRuntime } from './src/plugins/runtime/index.ts';
import { createEmptyPluginRegistry } from './src/plugins/registry-empty.ts';
import { resetPluginRuntimeStateForTest, setActivePluginRegistry } from './src/plugins/runtime.ts';

function codexModel(inputModalities = ['text', 'image']) {
  return {
    id: 'gpt-5.4',
    model: 'gpt-5.4',
    upgrade: null,
    upgradeInfo: null,
    availabilityNux: null,
    displayName: 'gpt-5.4',
    description: 'GPT-5.4',
    hidden: false,
    supportedReasoningEfforts: [{ reasoningEffort: 'low', description: 'fast' }],
    defaultReasoningEffort: 'low',
    inputModalities,
    supportsPersonality: false,
    additionalSpeedTiers: [],
    isDefault: true,
  };
}

function threadStartResult() {
  return {
    thread: {
      id: 'thread-1',
      sessionId: 'session-1',
      forkedFromId: null,
      preview: '',
      ephemeral: true,
      modelProvider: 'openai',
      createdAt: 1,
      updatedAt: 1,
      status: { type: 'idle' },
      path: null,
      cwd: process.cwd(),
      cliVersion: '0.125.0',
      source: 'unknown',
      agentNickname: null,
      agentRole: null,
      gitInfo: null,
      name: null,
      turns: [],
    },
    model: 'gpt-5.4',
    modelProvider: 'openai',
    serviceTier: null,
    cwd: process.cwd(),
    instructionSources: [],
    approvalPolicy: 'on-request',
    approvalsReviewer: 'user',
    sandbox: { type: 'dangerFullAccess' },
    permissionProfile: null,
    reasoningEffort: null,
  };
}

function turnStartResult(status = 'inProgress', items = []) {
  return {
    turn: {
      id: 'turn-1',
      status,
      items,
      error: null,
      startedAt: null,
      completedAt: null,
      durationMs: null,
    },
  };
}

function createFakeClient(responseText) {
  const notifications = new Set();
  const requestHandlers = new Set();
  const requests = [];
  const request = async (method, params) => {
    requests.push({ method, params });
    if (method === 'model/list') return { data: [codexModel()], nextCursor: null };
    if (method === 'thread/start') return threadStartResult();
    if (method === 'turn/start') {
      for (const notify of notifications) {
        notify({ method: 'item/agentMessage/delta', params: { threadId: 'thread-1', turnId: 'turn-1', itemId: 'msg-1', delta: responseText } });
        notify({ method: 'turn/completed', params: { threadId: 'thread-1', turnId: 'turn-1', turn: turnStartResult('completed').turn } });
      }
      for (const handler of requestHandlers) handler({ method: 'item/permissions/requestApproval' });
      return turnStartResult();
    }
    return {};
  };
  return {
    client: {
      request,
      addNotificationHandler(handler) { notifications.add(handler); return () => notifications.delete(handler); },
      addRequestHandler(handler) { requestHandlers.add(handler); return () => requestHandlers.delete(handler); },
      close() {},
    },
    requests,
  };
}

const authProfileIds = [];
const { client, requests } = createFakeClient('{"summary":"red square","tags":["shape"]}');
const provider = buildCodexMediaUnderstandingProvider({
  clientFactory: async (_startOptions, authProfileId) => {
    authProfileIds.push(authProfileId ?? null);
    return client;
  },
});

const registry = createEmptyPluginRegistry();
registry.mediaUnderstandingProviders.push({
  pluginId: 'codex',
  pluginName: 'Codex',
  source: 'proof-script',
  provider,
});
setActivePluginRegistry(registry, 'proof-script', 'default', process.cwd());

const runtime = createPluginRuntime();
const success = await runtime.mediaUnderstanding.extractStructuredWithModel({
  provider: 'codex',
  model: 'gpt-5.4',
  input: [
    {
      type: 'image',
      buffer: Buffer.from('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+kX3sAAAAASUVORK5CYII=', 'base64'),
      fileName: 'red-square.png',
      mime: 'image/png',
    },
    { type: 'text', text: 'Return searchable evidence for the uploaded image.' },
  ],
  instructions: 'Return JSON with summary and tags.',
  schemaName: 'proof.red-square',
  jsonSchema: {
    type: 'object',
    properties: {
      summary: { type: 'string' },
      tags: { type: 'array', items: { type: 'string' } },
    },
    required: ['summary'],
  },
  profile: 'openai-codex:work',
  cfg: {},
  agentDir: process.cwd(),
});

let guardError = null;
try {
  await runtime.mediaUnderstanding.extractStructuredWithModel({
    provider: 'codex',
    model: 'gpt-5.4',
    input: [{ type: 'text', text: 'No image present.' }],
    instructions: 'Return JSON.',
    cfg: {},
    agentDir: process.cwd(),
  });
} catch (error) {
  guardError = error instanceof Error ? error.message : String(error);
}

console.log(JSON.stringify({
  success,
  authProfileIds,
  requestMethods: requests.map((entry) => entry.method),
  threadStart: requests.find((entry) => entry.method === 'thread/start')?.params,
  turnInput: requests.find((entry) => entry.method === 'turn/start')?.params?.input,
  guardError,
}, null, 2));

resetPluginRuntimeStateForTest();
EOF

Evidence after fix:

{
  "success": {
    "text": "{\"summary\":\"red square\",\"tags\":[\"shape\"]}",
    "model": "gpt-5.4",
    "provider": "codex",
    "contentType": "json",
    "parsed": {
      "summary": "red square",
      "tags": [
        "shape"
      ]
    }
  },
  "authProfileIds": [
    "openai-codex:work"
  ],
  "requestMethods": [
    "model/list",
    "thread/start",
    "turn/start"
  ],
  "threadStart": {
    "model": "gpt-5.4",
    "modelProvider": "openai",
    "cwd": "/Users/lume/openclaw-review-worktrees/pr-79334-rebase",
    "approvalPolicy": "on-request",
    "sandbox": "read-only",
    "serviceName": "OpenClaw",
    "developerInstructions": "You are OpenClaw's bounded structured-extraction worker. Return only the requested extraction. Do not call tools, edit files, ask follow-up questions, or include secrets.",
    "dynamicTools": [],
    "experimentalRawEvents": true,
    "persistExtendedHistory": false,
    "ephemeral": true
  },
  "turnInput": [
    {
      "type": "text",
      "text": "Return JSON with summary and tags.\n\nSchema name: proof.red-square\n\nJSON schema:\n{\"type\":\"object\",\"properties\":{\"summary\":{\"type\":\"string\"},\"tags\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}}},\"required\":[\"summary\"]}\n\nReturn valid JSON only. Do not wrap the JSON in Markdown fences.",
      "text_elements": []
    },
    {
      "type": "image",
      "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+kX3sAAAAASUVORK5CYII="
    },
    {
      "type": "text",
      "text": "Return searchable evidence for the uploaded image.",
      "text_elements": []
    }
  ],
  "guardError": "Structured extraction requires at least one image input."
}

Observed result after fix: The plugin runtime facade dispatched extractStructuredWithModel(...) through the registered Codex media-understanding provider, the provider returned parsed JSON on the bounded app-server path, the selected auth profile reached the provider-owned runtime, and the text-only call failed early with the intended image-required guard instead of widening this seam into general text extraction.
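
The first `turnInput` text block in the evidence above suggests how the provider assembles its prompt. The sketch below reproduces that captured string from `instructions`, `schemaName`, and `jsonSchema`. It is reconstructed from the output only; `buildExtractionPrompt` is a hypothetical name, and the real provider may order or gate these sections differently (for example, only appending the JSON-only line in JSON mode).

```typescript
interface PromptOptions {
  instructions: string;
  schemaName?: string;
  jsonSchema?: Record<string, unknown>;
}

// Join the caller's instructions with the optional schema hints,
// separated by blank lines, as seen in the captured turn input.
function buildExtractionPrompt(options: PromptOptions): string {
  const sections: string[] = [options.instructions];
  if (options.schemaName) {
    sections.push(`Schema name: ${options.schemaName}`);
  }
  if (options.jsonSchema) {
    sections.push(`JSON schema:\n${JSON.stringify(options.jsonSchema)}`);
  }
  sections.push("Return valid JSON only. Do not wrap the JSON in Markdown fences.");
  return sections.join("\n\n");
}
```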

What was not tested: This proof intentionally uses a stubbed app-server client so it can exercise the real runtime/provider dispatch path deterministically in a local checkout without requiring a desktop-bound live OAuth session. The PR does not include a credentialed live Codex desktop turn artifact because that would require shipping private local auth/session material into public review evidence.

Validation

pnpm install --frozen-lockfile
pnpm plugin-sdk:api:gen
pnpm plugin-sdk:api:check
pnpm test src/media-understanding/runtime.test.ts src/media-understanding/provider-registry.test.ts extensions/codex/media-understanding-provider.test.ts src/plugins/runtime/index.test.ts
pnpm check:changed

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
docs/plugins/architecture-internals.md (modified, +28/-0)
docs/plugins/sdk-runtime.md (modified, +27/-0)
docs/plugins/sdk-subpaths.md (modified, +1/-1)
extensions/codex/media-understanding-provider.test.ts (modified, +160/-2)
extensions/codex/media-understanding-provider.ts (modified, +198/-10)
src/media-understanding/runtime-types.ts (modified, +25/-0)
src/media-understanding/runtime.test.ts (modified, +155/-0)
src/media-understanding/runtime.ts (modified, +40/-1)
src/media-understanding/types.ts (modified, +41/-0)
src/plugin-sdk/media-understanding-runtime.ts (modified, +2/-0)
src/plugin-sdk/media-understanding.ts (modified, +5/-0)
src/plugin-sdk/test-helpers/plugin-runtime-mock.ts (modified, +2/-0)
src/plugins/runtime/index.test.ts (modified, +1/-0)
src/plugins/runtime/index.ts (modified, +3/-0)
src/plugins/runtime/types-core.ts (modified, +1/-0)

PR #72: fix: make /plugins/gbrain/extract the only default OAuth route

Repository: electricsheephq/eva-brain
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/electricsheephq/eva-brain/pull/72

Description (problem / solution / changelog)

Why

The live OpenClaw GBrain plugin already uses /plugins/gbrain/extract, but the repo-side Codex extraction client still silently defaulted generic gateway completion calls to /plugins/gbrain/complete.

That was a real drift between the product path we actually run and the fallback path the repo still implied. The important thing here is not removing flexibility; it is making the default honest.

With this change, the default story becomes:

GBrain uses /plugins/gbrain/extract for OAuth-backed host execution
the OpenClaw plugin owns queueing, limits, and normalization
GBRAIN_OPENCLAW_COMPLETION_COMMAND remains the text-only fallback for hosts that cannot accept file media
legacy /plugins/gbrain/complete survives only as an explicit opt-in override

Closes #71.

What changed

removed the silent default of /plugins/gbrain/complete from the gateway client
kept /plugins/gbrain/extract as the repo and live-install default route
made gateway completeText() / completeJson() fail clearly unless GBRAIN_OPENCLAW_COMPLETION_PATH is explicitly set for a legacy host
updated the media guide to document the new rule
added tests for:
- refusing generic gateway completion on the default extract route
- allowing the legacy completion bridge only when explicitly configured
- preserving the real extraction route and text-only command fallback behavior
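
A minimal sketch of the rule described above: extraction always uses the default route, while generic completion fails clearly unless the legacy path is explicitly configured. The environment variable and the route come from this PR; the function names and the error wording are hypothetical, and the actual client in src/core/ai/codex-extraction-client.ts may be structured differently.

```typescript
const DEFAULT_EXTRACT_PATH = "/plugins/gbrain/extract";

type Env = Record<string, string | undefined>;

// Extraction has a single default route and needs no opt-in.
function resolveExtractPath(): string {
  return DEFAULT_EXTRACT_PATH;
}

// Generic completion has no silent default: it needs an explicit legacy override.
function resolveCompletionPath(env: Env): string {
  const override = env.GBRAIN_OPENCLAW_COMPLETION_PATH?.trim();
  if (!override) {
    // Hypothetical message; the point is that the failure is explicit.
    throw new Error(
      "Generic gateway completion is disabled by default; set GBRAIN_OPENCLAW_COMPLETION_PATH to opt in to a legacy host route.",
    );
  }
  return override;
}
```

In this sketch a legacy host would opt in by setting `GBRAIN_OPENCLAW_COMPLETION_PATH=/plugins/gbrain/complete`; everything else gets the extract route or a clear error.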

Why this is the right boundary

This keeps the fork aligned with the production OAuth path we actually want:

OpenClaw core/plugin owns runtime auth and model execution
the GBrain plugin owns /plugins/gbrain/extract and gbrain.media-extraction.v1
GBrain core owns importing normalized evidence into searchable pages/chunks/files

It also lines up with the broader OpenClaw follow-up we just filed upstream in openclaw/openclaw#80188: bounded host-owned plugin inference should live in OpenClaw, while product plugins keep their own routes and schemas.

Validation

bun run verify
bun test test/codex-extraction-client.test.ts test/media-ingest-openclaw.serial.test.ts test/media-ingest.serial.test.ts

Changed files

docs/guides/content-media.md (modified, +3/-1)
src/core/ai/codex-extraction-client.ts (modified, +18/-9)
test/codex-extraction-client.test.ts (modified, +40/-0)

PR #80203: Plugin SDK: add host-owned structured runtime LLM

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80203

Description (problem / solution / changelog)

Why this should exist

OpenClaw already has two useful but incomplete lanes for plugin-side model work:

api.runtime.llm.complete(...) for trusted host-owned text completion
api.runtime.mediaUnderstanding.extractStructuredWithModel(...) in #79334 for provider-owned media-first structured extraction

What is still missing is the general middle lane for plugins that need bounded host-owned structured inference without:

- handling OAuth/API credentials directly,
- inventing bespoke shell bridges, or
- stretching media-understanding into every structured workload.

That gap shows up across a lot more than one plugin family:

knowledge-base plugins turning text or screenshots into normalized evidence
support plugins extracting issue summaries, repro steps, and error clues from screenshots plus notes
CRM plugins extracting companies, people, dates, action items, and sentiment from inbound content
finance/admin plugins extracting vendor, totals, tax, and due dates from receipts or invoices
migration/import plugins mapping arbitrary raw content into plugin-owned JSON before storage

The goal of this PR is to make that generic host-owned lane first-class.

What this adds

This PR adds:

api.runtime.llm.completeStructured(...)
typed structured input blocks for text and optional images
optional jsonMode, jsonSchema, schemaName, timeoutMs, and profile
parsed JSON results when JSON mode is requested
the same host-owned model/auth/runtime preparation path as api.runtime.llm.complete(...)
the same trust gating model for model and agent overrides, plus a new explicit gate for auth-profile overrides

Why this belongs under `runtime.llm`

This API is deliberately the generic agent-bound runtime lane.

It reuses the same host-owned completion path as complete(...), then adds the structured affordances plugin authors actually need:

prompt shaping
optional image inputs
JSON/schema validation
timeout control
controlled typed output

That makes it the right home for general structured plugin inference.

Boundary vs `#79334`

This PR is a sister, not a replacement.

runtime.llm.completeStructured(...) is the generic agent-bound structured inference lane
mediaUnderstanding.extractStructuredWithModel(...) in #79334 remains the narrower provider-owned media capability lane

That separation keeps both seams honest:

use runtime.llm.completeStructured(...) when the plugin wants host-owned structured inference against the active agent/runtime path
use mediaUnderstanding.extractStructuredWithModel(...) when the plugin wants explicit provider/model media-routing behavior

flowchart LR
  Plugin["Trusted plugin"] --> LLM["api.runtime.llm.completeStructured(...)\nagent-bound generic lane"]
  Plugin --> Media["api.runtime.mediaUnderstanding.extractStructuredWithModel(...)\nprovider-owned media lane"]
  LLM --> Host["host-owned simple completion runtime"]
  Media --> Provider["media-understanding provider registry"]

Safety and trust model

This stays host-owned end to end.

Plugins do not receive raw OAuth tokens, refresh tokens, or provider secrets.

The host still owns:

auth resolution
provider runtime preparation
model routing
timeout handling
agent binding
trust gating

This PR also closes a subtle trust gap by treating auth-profile selection as a real override. profile now requires explicit opt-in via:

plugins.entries.<id>.llm.allowProfileOverride: true

The full runtime trust picture is now:

model overrides: allowModelOverride + optional allowedModels
cross-agent calls: allowAgentIdOverride
auth-profile selection: allowProfileOverride

Embedded model-ref suffixes now flow through that same gate, and conflicting profile vs model@profile inputs fail closed.
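
The gate described above can be sketched as follows. The two error messages are the ones shown in the terminal capture later in this PR, and the conflict check runs first because the proof rejects the conflicting case with the conflict message even without `allowProfileOverride`. The helper names, the parameter shapes, and the choice to split on the last `@` are assumptions for illustration; the real normalization in runtime-llm.runtime.ts may differ.

```typescript
interface ProfileGateParams {
  model?: string;
  profile?: string;
}

interface ProfileGateAuthority {
  allowProfileOverride?: boolean;
}

// Split "provider/model@profile" into the model ref and an embedded profile, if any.
function splitModelRef(model: string | undefined): { model?: string; embeddedProfile?: string } {
  if (!model) return {};
  const at = model.lastIndexOf("@");
  if (at <= 0 || at === model.length - 1) return { model };
  return { model: model.slice(0, at), embeddedProfile: model.slice(at + 1) };
}

// Normalize both ways of selecting a profile, then apply one trust gate to the result.
function resolveProfileOverride(
  params: ProfileGateParams,
  authority: ProfileGateAuthority,
): { model?: string; profile?: string } {
  const { model, embeddedProfile } = splitModelRef(params.model);
  if (params.profile && embeddedProfile && params.profile !== embeddedProfile) {
    throw new Error(
      "Plugin LLM completion received conflicting auth profiles in model and profile fields.",
    );
  }
  const profile = params.profile ?? embeddedProfile;
  if (profile && !authority.allowProfileOverride) {
    throw new Error("Plugin LLM completion cannot override the auth profile.");
  }
  return { model, profile };
}
```

Normalizing the embedded suffix before policy evaluation is what closes the bypass: both spellings reach the same check, so neither can select credentials without explicit trust.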

Behavior details

A few implementation details here are deliberate and worth calling out:

JSON mode is explicit: parsed JSON is only returned when jsonMode: true or jsonSchema is provided
Image inputs respect host image fallback behavior: when the active model is text-only, structured image calls reuse the host image-model fallback path instead of failing early when a configured image model exists
Context-engine parity is preserved: session-bound context-engine runtime hooks now expose completeStructured(...) alongside complete(...)
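
To make the explicit JSON-mode rule concrete, here is a minimal sketch under stated assumptions: `shapeOutput`, the option and result shapes, the `"text"` content type, and the error wording are all hypothetical. Only the rule itself (parse when `jsonMode: true` or `jsonSchema` is provided, otherwise return raw text) comes from this PR.

```typescript
interface JsonModeOptions {
  jsonMode?: boolean;
  jsonSchema?: Record<string, unknown>;
}

interface StructuredOutput {
  text: string;
  contentType: "json" | "text";
  parsed?: unknown;
}

// Only attempt JSON parsing when the caller explicitly asked for it.
function shapeOutput(text: string, options: JsonModeOptions): StructuredOutput {
  const wantsJson = options.jsonMode === true || options.jsonSchema !== undefined;
  if (!wantsJson) {
    return { text, contentType: "text" };
  }
  try {
    return { text, contentType: "json", parsed: JSON.parse(text) };
  } catch {
    // Hypothetical message: surface a controlled failure instead of a raw SyntaxError.
    throw new Error("Model output was not valid JSON.");
  }
}
```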

Example

const result = await api.runtime.llm.completeStructured({
  instructions: "Extract vendor, total, and searchable tags.",
  input: [
    {
      type: "image",
      buffer: receiptBuffer,
      mimeType: "image/png",
      fileName: "receipt.png",
    },
    { type: "text", text: "Prefer the printed total over handwritten notes." },
  ],
  jsonSchema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      total: { type: "number" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["vendor", "total"],
  },
  purpose: "receipts.extract",
});

What changed

add LlmCompleteStructured* runtime types
add runtime.llm.completeStructured(...) to the plugin runtime facade
add trust gating for auth-profile override selection
extend the plugin config schema/docs with plugins.entries.<id>.llm.allowProfileOverride
reuse host image-model fallback routing for structured image calls
expose completeStructured(...) to context-engine runtime hooks
add runtime tests for JSON/schema validation, image inputs, profile trust, image fallback behavior, timeout behavior, and context-engine binding
add regression tests that block model@profile bypasses and reject conflicting explicit vs embedded auth-profile selection
regenerate the plugin SDK API baseline

Validation

pnpm test -- src/plugins/runtime/runtime-llm.runtime.test.ts src/plugins/runtime/index.test.ts src/plugins/config-state.test.ts src/config/schema.help.quality.test.ts
pnpm plugin-sdk:api:gen
pnpm plugin-sdk:api:check
pnpm check:changed

Real behavior proof

Behavior or issue addressed:
- Plugins could express auth-profile selection either as profile: "..." or as a trailing model-ref suffix like openai/gpt-5.5@work.
- Before this fix, the new trust gate only checked the explicit profile field, so embedded model@profile requests could steer credential selection without allowProfileOverride.

Real environment tested:
- Local OpenClaw checkout on codex/plugin-inference-followup using the real runtime implementation through node --import tsx on macOS, with no test mocks in the proof command.

Exact steps or command run after this patch:

node --import tsx -e 'import { createRuntimeLlm } from "./src/plugins/runtime/runtime-llm.runtime.ts"; const cfg = { agents: { defaults: { model: "openai/gpt-5.5" } } }; const llm = createRuntimeLlm({ getConfig: () => cfg, authority: { caller: { kind: "host", id: "proof" }, allowComplete: true, allowModelOverride: true, agentId: "ada" } }); for (const [label, params] of [["embedded-profile-blocked", { model: "openai/gpt-5.5@openai-codex:work", instructions: "Extract summary.", input: [{ type: "text", text: "Hello" }] }],["conflicting-profile-rejected", { model: "openai/gpt-5.5@openai-codex:work", profile: "openai-codex:other", instructions: "Extract summary.", input: [{ type: "text", text: "Hello" }] }]]) { try { await llm.completeStructured(params); console.log(label + ": UNEXPECTED_SUCCESS"); } catch (error) { console.log(label + ": " + (error instanceof Error ? error.message : String(error))); } }'

Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Terminal capture:

embedded-profile-blocked: Plugin LLM completion cannot override the auth profile.
conflicting-profile-rejected: Plugin LLM completion received conflicting auth profiles in model and profile fields.

Supplemental focused regression coverage:

✓ rejects structured auth-profile overrides without explicit trust
✓ rejects auth-profile suffixes in structured model refs without explicit trust
✓ treats auth-profile suffixes in structured model refs as profile overrides when trusted
✓ rejects conflicting explicit and embedded structured auth-profile overrides

Observed result after fix:
- The real runtime now fails closed on both bypass shapes before host auth preparation.
- Structured callers can no longer pick a credential profile through model@profile unless allowProfileOverride is explicitly trusted.
- Conflicting explicit and embedded profile selections are rejected instead of being resolved implicitly.

What was not tested:
- A live provider-backed completion after an allowed profile override was not exercised in this proof step.
- The trusted-path wiring is covered by the focused runtime tests above.

Before evidence (optional but encouraged):
- Before this patch, profile was gated but the embedded model@profile form was not normalized before auth-profile policy evaluation.

Non-goals

no raw OAuth/token exposure to plugins
no plugin-specific routes or schemas in core
no replacement of #79334
no tool-using long-running agent workflow lane

Closes #80188.

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
docs/gateway/configuration-reference.md (modified, +4/-3)
docs/plugins/sdk-runtime.md (modified, +42/-1)
src/agents/pi-embedded-runner/context-engine-capabilities.ts (modified, +16/-0)
src/config/schema.help.quality.test.ts (modified, +1/-0)
src/config/schema.help.ts (modified, +5/-3)
src/config/schema.labels.ts (modified, +1/-0)
src/config/types.plugins.ts (modified, +4/-2)
src/config/zod-schema.ts (modified, +1/-0)
src/context-engine/types.ts (modified, +3/-0)
src/plugin-sdk/test-helpers/plugin-runtime-mock.ts (modified, +1/-0)
src/plugins/config-normalization-shared.ts (modified, +8/-1)
src/plugins/config-state.test.ts (modified, +2/-0)
src/plugins/registry.ts (modified, +2/-0)
src/plugins/runtime/index.test.ts (modified, +10/-0)
src/plugins/runtime/index.ts (modified, +4/-0)
src/plugins/runtime/runtime-llm.runtime.test.ts (modified, +530/-6)
src/plugins/runtime/runtime-llm.runtime.ts (modified, +388/-54)
src/plugins/runtime/types-core.ts (modified, +48/-0)

RAW_BUFFER

Why this should exist

#79334 is the right narrow seam for image-first structured extraction, but the maintainer feedback there surfaced the broader platform need: many plugins need bounded host-owned inference, not raw OAuth credentials and not product-specific core routes.

Today plugin authors that need typed model output for knowledge-base enrichment, CRM extraction, finance ingestion, support triage, or import pipelines still face two awkward choices:

ship bespoke bridges that shell out into host runtime behavior
ask for product-specific SDK seams one plugin shape at a time

A generic host-owned inference surface would keep auth, provider routing, timeouts, and safety in OpenClaw while letting plugins request structured work without ever receiving tokens.

Proposal

Add a general runtime API under api.runtime.llm, something like:

api.runtime.llm.completeStructured(...)

That is a better long-term fit than broadening media-understanding further or exposing raw OAuth credentials to plugin code.

Desired behavior

text-only structured completion
image-plus-text structured completion
optional jsonMode and jsonSchema
host-controlled model and auth profile
bounded timeout
controlled success and error envelopes
provider and model metadata in the response
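
The behavior listed above could be sketched as follows. This is not the shipped SDK surface: the type names, field names, and error codes (`StructuredRequest`, `StructuredResult`, `toEnvelope`, `invalid_json`, `schema_mismatch`) are assumptions chosen for illustration. The sketch only shows how raw model output might be turned into a controlled success or error envelope that carries provider and model metadata.

```typescript
// Hypothetical request shape covering text-only and image-plus-text input.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; dataBase64: string };

interface StructuredRequest {
  input: ContentPart[];
  jsonMode?: boolean;
  jsonSchema?: Record<string, unknown>;
  timeoutMs?: number;
}

// Hypothetical envelope: callers branch on `ok` instead of catching exceptions.
type StructuredResult<T> =
  | { ok: true; data: T; provider: string; model: string }
  | { ok: false; error: { code: "invalid_json" | "schema_mismatch"; message: string } };

// Turn a raw model reply into an envelope, using a caller-supplied type guard.
function toEnvelope<T>(
  rawReply: string,
  isValid: (value: unknown) => value is T,
  meta: { provider: string; model: string },
): StructuredResult<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return { ok: false, error: { code: "invalid_json", message: "Model output was not valid JSON." } };
  }
  if (!isValid(parsed)) {
    return { ok: false, error: { code: "schema_mismatch", message: "Model output did not match the expected shape." } };
  }
  return { ok: true, data: parsed, provider: meta.provider, model: meta.model };
}

// Plugin-owned shape and guard (illustrative).
interface Receipt {
  vendor: string;
  total: number;
}

function isReceipt(value: unknown): value is Receipt {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["vendor"] === "string" && typeof record["total"] === "number";
}

// Example text-only request a plugin might build; the host would pick model and auth.
const exampleRequest: StructuredRequest = {
  input: [{ type: "text", text: "Extract vendor and total from this receipt." }],
  jsonMode: true,
  timeoutMs: 15000,
};
```

A discriminated union keeps the failure path explicit: a plugin that receives `ok: false` gets a code and a message, never a thrown provider exception.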

Existing building blocks

api.runtime.llm.complete(...) already exists for trusted plugin text completions
api.runtime.modelAuth.getRuntimeAuthForModel(...) already resolves runtime auth
#79334 proves a bounded provider-owned structured extraction lane for media inputs

This follow-up would generalize the pattern without requiring each plugin to invent its own bridge.

Example use cases

KB plugins: turn text or screenshots into normalized evidence records
CRM plugins: extract people, companies, dates, action items, and summaries from text plus images
Finance plugins: receipt and invoice field extraction
Support plugins: screenshot plus reproduction-note triage
Migration/import plugins: map raw content into plugin-owned JSON before storage

Boundary

OpenClaw core owns

auth resolution
provider runtime exchange
model/profile policy
bounded execution
typed generic inference API

Plugins own

routes
prompts
schemas
storage
follow-on behavior

Non-goals

no raw OAuth token or refresh token exposure to plugins
no product-specific routes or schemas in core
no tool-using long-running agent lane
no replacement of #79334; that PR should stay the media-understanding seam

Acceptance criteria

no user model API key required when the host runtime already has auth
text-only structured requests work
image-plus-text structured requests work when the chosen provider supports them
JSON/schema validation failures return controlled errors
host policy still gates model/profile/agent overrides
secrets are not returned or logged
existing api.runtime.llm.complete(...) behavior does not regress
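
One way to read the "secrets are not returned or logged" criterion is that any provider error is redacted before it reaches a plugin. The sketch below assumes the host knows which secret strings were used for the call. `ControlledError`, `redact`, and `toControlledError` are hypothetical names, and the `provider_error` code is an assumption.

```typescript
// Hypothetical error envelope; the code and field names are assumptions.
interface ControlledError {
  ok: false;
  error: { code: "provider_error"; message: string };
}

// Replace every occurrence of each known secret with a fixed marker.
function redact(message: string, secrets: readonly string[]): string {
  let out = message;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // splitting on "" would mangle the text
    out = out.split(secret).join("[redacted]");
  }
  return out;
}

// Map any thrown value into an envelope whose message carries no known secret.
function toControlledError(err: unknown, secrets: readonly string[]): ControlledError {
  const raw = err instanceof Error ? err.message : String(err);
  return { ok: false, error: { code: "provider_error", message: redact(raw, secrets) } };
}
```

Redaction by exact match only covers secrets the host already holds in memory. It does not detect token-like strings it has never seen, so it complements, rather than replaces, keeping credentials out of plugin-visible code paths.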

Relationship to `#79334`

That PR should remain intentionally narrow: image-first structured extraction via media-understanding. This issue tracks the broader host-owned plugin inference seam so future plugins do not need to keep stretching media-understanding to cover every structured workload.


