openclaw - ✅(Solved) Fix SDK follow-up: host-owned structured plugin inference beyond media-understanding [3 pull requests, 3 comments, 2 participants]

GitHub stats
openclaw/openclaw#80188 · Fetched 2026-05-11 03:17:51
Comments: 3 · Participants: 2 · Timeline: 6 · Reactions: 2
Timeline (top): commented ×3, cross-referenced ×3

Error Message

  • controlled success and error envelopes

Fix Action

Fixed

PR fix notes

PR #79334: [plugin sdk] Add structured extraction media runtime

Description (problem / solution / changelog)

Why this matters

OpenClaw plugins increasingly need to turn unstructured user content into safe, typed data: receipts into expense records, screenshots into support evidence, invoices into accounting fields, customer messages into CRM notes, PDFs into knowledge-base snippets, and product photos into searchable inventory metadata.

Today, each plugin has to choose between two bad options:

  • implement its own model/auth/runtime bridge, usually requiring another user-managed API key; or
  • add product-specific extraction routes to core, which does not scale as the plugin ecosystem grows.

This PR adds the missing middle layer: a generic structured extraction capability in the media-understanding SDK. Product plugins keep owning their routes, schemas, storage, and UX, while OpenClaw owns the provider/runtime boundary, auth source, safety posture, and typed SDK contract.

What plugin authors can build with this

Examples this unlocks without adding plugin-specific logic to OpenClaw core:

  • Support plugins: extract error messages, stack traces, product names, issue category, severity, and reproduction steps from screenshots.
  • Knowledge-base plugins: convert documents or screenshots into normalized metadata and searchable evidence records.
  • CRM/sales plugins: extract companies, people, dates, action items, sentiment, and deal updates from inbound media plus short text context.
  • Finance/admin plugins: extract vendor, total, currency, tax, due date, and line-item hints from receipts or invoices.
  • Inventory/media plugins: extract labels, visible text, tags, object categories, and image summaries from uploaded photos.
  • Migration/import plugins: map arbitrary image inputs into a plugin-owned JSON schema before writing to the plugin's own database.

The important part: the plugin defines the schema and decides what to do with the result. OpenClaw only provides the generic, bounded extraction lane.

New SDK shape

This PR adds:

  • optional provider method: MediaUnderstandingProvider.extractStructured(...)
  • runtime helper: api.runtime.mediaUnderstanding.extractStructuredWithModel(...)
  • typed inputs for images plus optional supplemental text context
  • optional schemaName, jsonSchema, jsonMode, and timeoutMs
  • controlled result metadata: raw text, parsed JSON when JSON mode is enabled, model/provider, and content type

Example plugin call:

const result = await api.runtime.mediaUnderstanding.extractStructuredWithModel({
  provider: "codex",
  model: "gpt-5.5",
  input: [
    {
      type: "image",
      buffer: receiptImageBuffer,
      fileName: "receipt.png",
      mime: "image/png",
    },
    { type: "text", text: "Prefer the printed total over handwritten notes." },
  ],
  instructions: "Extract vendor, total, and searchable tags.",
  schemaName: "receipt.evidence",
  jsonSchema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      total: { type: "number" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["vendor", "total"],
  },
  cfg: api.config,
});
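
A plugin would typically narrow the returned value before storing it. The sketch below assumes the result carries `text`, `model`, `provider`, `contentType`, and an optional `parsed` field, as the proof output in this PR suggests. `ExtractionResult`, `ReceiptRecord`, and `toReceiptRecord` are hypothetical names for illustration and are not part of the SDK.

```typescript
// Hypothetical result shape, inferred from the proof output in this PR.
interface ExtractionResult {
  text: string;
  model: string;
  provider: string;
  contentType: string;
  parsed?: unknown;
}

// Plugin-owned record type; not part of the OpenClaw SDK.
interface ReceiptRecord {
  vendor: string;
  total: number;
  tags: string[];
}

// Narrow the untyped `parsed` payload to the plugin's own record type.
// Returns null when the required fields are missing or have the wrong type.
function toReceiptRecord(result: ExtractionResult): ReceiptRecord | null {
  const value = result.parsed;
  if (typeof value !== "object" || value === null) return null;
  const { vendor, total, tags } = value as Record<string, unknown>;
  if (typeof vendor !== "string" || typeof total !== "number") return null;
  // `tags` is optional in the schema, so fall back to an empty list.
  const safeTags = Array.isArray(tags)
    ? tags.filter((tag): tag is string => typeof tag === "string")
    : [];
  return { vendor, total, tags: safeTags };
}
```

Checking the payload on the plugin side keeps the storage layer independent of how strictly a given provider enforces `jsonSchema`.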

Runtime architecture

flowchart LR
  Plugin["Plugin route, skill, or importer"] --> Runtime["api.runtime.mediaUnderstanding.extractStructuredWithModel"]
  Runtime --> Provider["MediaUnderstandingProvider.extractStructured"]
  Provider --> HostRuntime["Provider-owned host runtime"]
  HostRuntime --> Result["JSON result or controlled error"]
  Result --> Plugin
  Plugin --> Storage["Plugin-owned storage, tools, or user workflow"]

For the bundled Codex provider, this uses the existing Codex app-server/OAuth path rather than requiring a user-supplied model API key.

flowchart LR
  Plugin["Any OpenClaw plugin"] --> SDK["Structured extraction SDK"]
  SDK --> CodexProvider["Codex media-understanding provider"]
  CodexProvider --> AppServer["Codex app-server / OAuth runtime"]
  AppServer --> BoundedTurn["Ephemeral no-tools turn"]
  BoundedTurn --> JSON["Parsed JSON or controlled error"]

Safety and boundaries

The Codex implementation keeps the same bounded posture as image understanding:

  • ephemeral thread
  • read-only sandbox
  • no dynamic tools
  • approval policy set to on-request, with approval requests denied by the provider handler
  • timeout enforcement
  • model modality validation before turn start
  • JSON parsing failure returned as a controlled error
  • text-only extraction rejected at the runtime seam, keeping this image-first instead of turning it into a generic text completion lane
  • no product-specific route names, storage models, or schemas in OpenClaw core

This is intentionally a platform seam, not a feature-specific integration.
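
The image-required guard from the list above can be pictured roughly as follows. This is a minimal sketch, not the PR's actual code: `ExtractionInput` and `assertHasImage` are hypothetical names, and only the error message is taken from the proof output in this PR.

```typescript
// Hypothetical input block types mirroring the `input` array in the example call.
type ExtractionInput =
  | { type: "image"; buffer: Uint8Array; fileName?: string; mime?: string }
  | { type: "text"; text: string };

// Reject text-only requests before any provider is contacted,
// so the seam stays image-first.
function assertHasImage(input: ExtractionInput[]): void {
  const hasImage = input.some((block) => block.type === "image");
  if (!hasImage) {
    throw new Error("Structured extraction requires at least one image input.");
  }
}
```

Running the check at the runtime seam, rather than inside each provider, means every provider that implements the optional method inherits the same restriction.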

What changed

  • Adds structured extraction request/result types to the media-understanding SDK.
  • Adds extractStructuredWithModel(...) to the plugin runtime media-understanding facade.
  • Implements extractStructured(...) in the bundled Codex provider.
  • Preserves explicit config-provider image descriptions by keeping describeImageFileWithModel(...) on the full media-understanding registry instead of narrowing it to manifest-only plugin providers.
  • Forwards structured extraction auth-profile selection through the runtime helper so provider-owned OAuth/app-server runtimes can honor plugin-selected credentials.
  • Narrows the new seam to image-first extraction with optional supplemental text context instead of overlapping general text-only completion surfaces.
  • Adds tests for bounded Codex structured extraction, invalid JSON/schema handling, runtime routing, auth-profile forwarding, image-required guardrails, direct image-model registry routing, provider lookup failure, and runtime API exposure.
  • Documents the new runtime helper and the plugin/core ownership boundary.
  • Adds the required changelog entry for the new plugin SDK/runtime capability.

Relationship to existing LLM surfaces

OpenClaw already has api.runtime.llm.complete for trusted plugin text completions, and llm-task for workflow/tool-level JSON tasks. This PR is narrower and lower-level: a provider SDK/runtime media-understanding seam for schema-shaped extraction over image inputs with optional text context. That keeps extraction provider-owned and plugin-consumable without turning it into a general-purpose arbitrary Codex call API.

Non-goals

  • This does not add a product-specific extraction route to OpenClaw core.
  • This does not choose any plugin's storage model or JSON schema.
  • This does not replace existing image/audio/video media-understanding helpers.
  • This does not require plugins to use Codex; other providers can implement the same optional method.
  • This does not expand into generic text-only extraction; callers that want arbitrary text completions should keep using the existing LLM surfaces.

Background

This closes openclaw/openclaw#79321.

The immediate downstream need came from a GBrain/OpenClaw integration, but the implementation here is deliberately generic. GBrain, support, CRM, finance, inventory, migration, and knowledge-base plugins can all consume the same SDK seam while keeping their own product-specific routes and schemas outside OpenClaw core.

Real behavior proof

Behavior or issue addressed: The rebased branch exposes a typed plugin-runtime structured extraction seam that dispatches through a registered media-understanding provider, preserves the bounded Codex worker defaults, forwards the selected auth profile into the provider-owned runtime, and rejects text-only calls before provider dispatch.

Real environment tested: Local macOS OpenClaw checkout at /Users/lume/openclaw-review-worktrees/pr-79334-rebase, rebased head 78cfe4a76161fc7d3029beb4edcf7120a94a4d8b, using a standalone node --import tsx proof command outside Vitest. The proof registers the real bundled Codex media-understanding provider in the active plugin runtime registry with a stubbed app-server client, then calls createPluginRuntime().mediaUnderstanding.extractStructuredWithModel(...) once with image-plus-text input and once with text-only input.

Exact steps or command run after this patch:

cd /Users/lume/openclaw-review-worktrees/pr-79334-rebase
node --import tsx <<'EOF'
import { buildCodexMediaUnderstandingProvider } from './extensions/codex/media-understanding-provider.ts';
import { createPluginRuntime } from './src/plugins/runtime/index.ts';
import { createEmptyPluginRegistry } from './src/plugins/registry-empty.ts';
import { resetPluginRuntimeStateForTest, setActivePluginRegistry } from './src/plugins/runtime.ts';

function codexModel(inputModalities = ['text', 'image']) {
  return {
    id: 'gpt-5.4',
    model: 'gpt-5.4',
    upgrade: null,
    upgradeInfo: null,
    availabilityNux: null,
    displayName: 'gpt-5.4',
    description: 'GPT-5.4',
    hidden: false,
    supportedReasoningEfforts: [{ reasoningEffort: 'low', description: 'fast' }],
    defaultReasoningEffort: 'low',
    inputModalities,
    supportsPersonality: false,
    additionalSpeedTiers: [],
    isDefault: true,
  };
}

function threadStartResult() {
  return {
    thread: {
      id: 'thread-1',
      sessionId: 'session-1',
      forkedFromId: null,
      preview: '',
      ephemeral: true,
      modelProvider: 'openai',
      createdAt: 1,
      updatedAt: 1,
      status: { type: 'idle' },
      path: null,
      cwd: process.cwd(),
      cliVersion: '0.125.0',
      source: 'unknown',
      agentNickname: null,
      agentRole: null,
      gitInfo: null,
      name: null,
      turns: [],
    },
    model: 'gpt-5.4',
    modelProvider: 'openai',
    serviceTier: null,
    cwd: process.cwd(),
    instructionSources: [],
    approvalPolicy: 'on-request',
    approvalsReviewer: 'user',
    sandbox: { type: 'dangerFullAccess' },
    permissionProfile: null,
    reasoningEffort: null,
  };
}

function turnStartResult(status = 'inProgress', items = []) {
  return {
    turn: {
      id: 'turn-1',
      status,
      items,
      error: null,
      startedAt: null,
      completedAt: null,
      durationMs: null,
    },
  };
}

function createFakeClient(responseText) {
  const notifications = new Set();
  const requestHandlers = new Set();
  const requests = [];
  const request = async (method, params) => {
    requests.push({ method, params });
    if (method === 'model/list') return { data: [codexModel()], nextCursor: null };
    if (method === 'thread/start') return threadStartResult();
    if (method === 'turn/start') {
      for (const notify of notifications) {
        notify({ method: 'item/agentMessage/delta', params: { threadId: 'thread-1', turnId: 'turn-1', itemId: 'msg-1', delta: responseText } });
        notify({ method: 'turn/completed', params: { threadId: 'thread-1', turnId: 'turn-1', turn: turnStartResult('completed').turn } });
      }
      for (const handler of requestHandlers) handler({ method: 'item/permissions/requestApproval' });
      return turnStartResult();
    }
    return {};
  };
  return {
    client: {
      request,
      addNotificationHandler(handler) { notifications.add(handler); return () => notifications.delete(handler); },
      addRequestHandler(handler) { requestHandlers.add(handler); return () => requestHandlers.delete(handler); },
      close() {},
    },
    requests,
  };
}

const authProfileIds = [];
const { client, requests } = createFakeClient('{"summary":"red square","tags":["shape"]}');
const provider = buildCodexMediaUnderstandingProvider({
  clientFactory: async (_startOptions, authProfileId) => {
    authProfileIds.push(authProfileId ?? null);
    return client;
  },
});

const registry = createEmptyPluginRegistry();
registry.mediaUnderstandingProviders.push({
  pluginId: 'codex',
  pluginName: 'Codex',
  source: 'proof-script',
  provider,
});
setActivePluginRegistry(registry, 'proof-script', 'default', process.cwd());

const runtime = createPluginRuntime();
const success = await runtime.mediaUnderstanding.extractStructuredWithModel({
  provider: 'codex',
  model: 'gpt-5.4',
  input: [
    {
      type: 'image',
      buffer: Buffer.from('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+kX3sAAAAASUVORK5CYII=', 'base64'),
      fileName: 'red-square.png',
      mime: 'image/png',
    },
    { type: 'text', text: 'Return searchable evidence for the uploaded image.' },
  ],
  instructions: 'Return JSON with summary and tags.',
  schemaName: 'proof.red-square',
  jsonSchema: {
    type: 'object',
    properties: {
      summary: { type: 'string' },
      tags: { type: 'array', items: { type: 'string' } },
    },
    required: ['summary'],
  },
  profile: 'openai-codex:work',
  cfg: {},
  agentDir: process.cwd(),
});

let guardError = null;
try {
  await runtime.mediaUnderstanding.extractStructuredWithModel({
    provider: 'codex',
    model: 'gpt-5.4',
    input: [{ type: 'text', text: 'No image present.' }],
    instructions: 'Return JSON.',
    cfg: {},
    agentDir: process.cwd(),
  });
} catch (error) {
  guardError = error instanceof Error ? error.message : String(error);
}

console.log(JSON.stringify({
  success,
  authProfileIds,
  requestMethods: requests.map((entry) => entry.method),
  threadStart: requests.find((entry) => entry.method === 'thread/start')?.params,
  turnInput: requests.find((entry) => entry.method === 'turn/start')?.params?.input,
  guardError,
}, null, 2));

resetPluginRuntimeStateForTest();
EOF

Evidence after fix:

{
  "success": {
    "text": "{\"summary\":\"red square\",\"tags\":[\"shape\"]}",
    "model": "gpt-5.4",
    "provider": "codex",
    "contentType": "json",
    "parsed": {
      "summary": "red square",
      "tags": [
        "shape"
      ]
    }
  },
  "authProfileIds": [
    "openai-codex:work"
  ],
  "requestMethods": [
    "model/list",
    "thread/start",
    "turn/start"
  ],
  "threadStart": {
    "model": "gpt-5.4",
    "modelProvider": "openai",
    "cwd": "/Users/lume/openclaw-review-worktrees/pr-79334-rebase",
    "approvalPolicy": "on-request",
    "sandbox": "read-only",
    "serviceName": "OpenClaw",
    "developerInstructions": "You are OpenClaw's bounded structured-extraction worker. Return only the requested extraction. Do not call tools, edit files, ask follow-up questions, or include secrets.",
    "dynamicTools": [],
    "experimentalRawEvents": true,
    "persistExtendedHistory": false,
    "ephemeral": true
  },
  "turnInput": [
    {
      "type": "text",
      "text": "Return JSON with summary and tags.\n\nSchema name: proof.red-square\n\nJSON schema:\n{\"type\":\"object\",\"properties\":{\"summary\":{\"type\":\"string\"},\"tags\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}}},\"required\":[\"summary\"]}\n\nReturn valid JSON only. Do not wrap the JSON in Markdown fences.",
      "text_elements": []
    },
    {
      "type": "image",
      "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+kX3sAAAAASUVORK5CYII="
    },
    {
      "type": "text",
      "text": "Return searchable evidence for the uploaded image.",
      "text_elements": []
    }
  ],
  "guardError": "Structured extraction requires at least one image input."
}

Observed result after fix: The plugin runtime facade dispatched extractStructuredWithModel(...) through the registered Codex media-understanding provider, the provider returned parsed JSON on the bounded app-server path, the selected auth profile reached the provider-owned runtime, and the text-only call failed early with the intended image-required guard instead of widening this seam into general text extraction.
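
The first `turnInput` text block in the evidence shows how the instructions, schema name, and JSON schema end up in a single prompt. A minimal sketch that reproduces that layout is below. `buildExtractionPrompt` is a hypothetical helper; the real provider may assemble the prompt differently, for example by adding the closing sentence only in JSON mode.

```typescript
// Hypothetical helper that mirrors the prompt layout seen in the proof output.
function buildExtractionPrompt(
  instructions: string,
  schemaName?: string,
  jsonSchema?: object,
): string {
  const parts: string[] = [instructions];
  if (schemaName) parts.push(`Schema name: ${schemaName}`);
  if (jsonSchema) parts.push(`JSON schema:\n${JSON.stringify(jsonSchema)}`);
  // Assumption: this sentence is always appended in this sketch.
  parts.push("Return valid JSON only. Do not wrap the JSON in Markdown fences.");
  // Sections are separated by a blank line, as in the captured turn input.
  return parts.join("\n\n");
}

const prompt = buildExtractionPrompt(
  "Return JSON with summary and tags.",
  "proof.red-square",
  {
    type: "object",
    properties: {
      summary: { type: "string" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["summary"],
  },
);
console.log(prompt);
```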

What was not tested: This proof intentionally uses a stubbed app-server client so it can exercise the real runtime/provider dispatch path deterministically in a local checkout without requiring a desktop-bound live OAuth session. The PR does not include a credentialed live Codex desktop turn artifact because that would require shipping private local auth/session material into public review evidence.

Validation

  • pnpm install --frozen-lockfile
  • pnpm plugin-sdk:api:gen
  • pnpm plugin-sdk:api:check
  • pnpm test src/media-understanding/runtime.test.ts src/media-understanding/provider-registry.test.ts extensions/codex/media-understanding-provider.test.ts src/plugins/runtime/index.test.ts
  • pnpm check:changed

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • docs/plugins/architecture-internals.md (modified, +28/-0)
  • docs/plugins/sdk-runtime.md (modified, +27/-0)
  • docs/plugins/sdk-subpaths.md (modified, +1/-1)
  • extensions/codex/media-understanding-provider.test.ts (modified, +160/-2)
  • extensions/codex/media-understanding-provider.ts (modified, +198/-10)
  • src/media-understanding/runtime-types.ts (modified, +25/-0)
  • src/media-understanding/runtime.test.ts (modified, +155/-0)
  • src/media-understanding/runtime.ts (modified, +40/-1)
  • src/media-understanding/types.ts (modified, +41/-0)
  • src/plugin-sdk/media-understanding-runtime.ts (modified, +2/-0)
  • src/plugin-sdk/media-understanding.ts (modified, +5/-0)
  • src/plugin-sdk/test-helpers/plugin-runtime-mock.ts (modified, +2/-0)
  • src/plugins/runtime/index.test.ts (modified, +1/-0)
  • src/plugins/runtime/index.ts (modified, +3/-0)
  • src/plugins/runtime/types-core.ts (modified, +1/-0)

PR #72: fix: make /plugins/gbrain/extract the only default OAuth route

Description (problem / solution / changelog)

Why

The live OpenClaw GBrain plugin already uses /plugins/gbrain/extract, but the repo-side Codex extraction client still silently defaulted generic gateway completion calls to /plugins/gbrain/complete.

That was a real drift between the product path we actually run and the fallback path the repo still implied. The important thing here is not removing flexibility; it is making the default honest.

With this change, the default story becomes:

  • GBrain uses /plugins/gbrain/extract for OAuth-backed host execution
  • the OpenClaw plugin owns queueing, limits, and normalization
  • GBRAIN_OPENCLAW_COMPLETION_COMMAND remains the text-only fallback for hosts that cannot accept file media
  • legacy /plugins/gbrain/complete survives only as an explicit opt-in override

Closes #71.

What changed

  • removed the silent default of /plugins/gbrain/complete from the gateway client
  • kept /plugins/gbrain/extract as the repo and live-install default route
  • made gateway completeText() / completeJson() fail clearly unless GBRAIN_OPENCLAW_COMPLETION_PATH is explicitly set for a legacy host
  • updated the media guide to document the new rule
  • added tests for:
    • refusing generic gateway completion on the default extract route
    • allowing the legacy completion bridge only when explicitly configured
    • preserving the real extraction route and text-only command fallback behavior
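
The routing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the repo's client code: `resolveExtractPath`, `resolveCompletionPath`, and the error wording are hypothetical, while the route and the `GBRAIN_OPENCLAW_COMPLETION_PATH` variable come from the PR description.

```typescript
// Default OAuth-backed route named in the PR description.
const DEFAULT_EXTRACT_PATH = "/plugins/gbrain/extract";

// Extraction always uses the default route in this sketch.
function resolveExtractPath(): string {
  return DEFAULT_EXTRACT_PATH;
}

// Generic completion has no silent default: it works only when the
// legacy path is configured explicitly. The error text is hypothetical.
function resolveCompletionPath(env: Record<string, string | undefined>): string {
  const override = env["GBRAIN_OPENCLAW_COMPLETION_PATH"]?.trim();
  if (!override) {
    throw new Error(
      "Gateway completion is disabled by default; set GBRAIN_OPENCLAW_COMPLETION_PATH to opt in to a legacy host.",
    );
  }
  return override;
}
```

Failing loudly when the variable is unset is what makes the default "honest": a caller cannot fall back to `/plugins/gbrain/complete` without having asked for it.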

Why this is the right boundary

This keeps the fork aligned with the production OAuth path we actually want:

  • OpenClaw core/plugin owns runtime auth and model execution
  • the GBrain plugin owns /plugins/gbrain/extract and gbrain.media-extraction.v1
  • GBrain core owns importing normalized evidence into searchable pages/chunks/files

It also lines up with the broader OpenClaw follow-up we just filed upstream in openclaw/openclaw#80188: bounded host-owned plugin inference should live in OpenClaw, while product plugins keep their own routes and schemas.

Validation

bun run verify
bun test test/codex-extraction-client.test.ts test/media-ingest-openclaw.serial.test.ts test/media-ingest.serial.test.ts

Changed files

  • docs/guides/content-media.md (modified, +3/-1)
  • src/core/ai/codex-extraction-client.ts (modified, +18/-9)
  • test/codex-extraction-client.test.ts (modified, +40/-0)

PR #80203: Plugin SDK: add host-owned structured runtime LLM

Description (problem / solution / changelog)

Why this should exist

OpenClaw already has two useful but incomplete lanes for plugin-side model work:

  • api.runtime.llm.complete(...) for trusted host-owned text completion
  • api.runtime.mediaUnderstanding.extractStructuredWithModel(...) in #79334 for provider-owned media-first structured extraction

What is still missing is the general middle lane for plugins that need bounded host-owned structured inference without:

  • handling OAuth/API credentials directly,
  • inventing bespoke shell bridges, or
  • stretching media-understanding into every structured workload.

That gap shows up across a lot more than one plugin family:

  • knowledge-base plugins turning text or screenshots into normalized evidence
  • support plugins extracting issue summaries, repro steps, and error clues from screenshots plus notes
  • CRM plugins extracting companies, people, dates, action items, and sentiment from inbound content
  • finance/admin plugins extracting vendor, totals, tax, and due dates from receipts or invoices
  • migration/import plugins mapping arbitrary raw content into plugin-owned JSON before storage

The goal of this PR is to make that generic host-owned lane first-class.

What this adds

This PR adds:

  • api.runtime.llm.completeStructured(...)
  • typed structured input blocks for text and optional images
  • optional jsonMode, jsonSchema, schemaName, timeoutMs, and profile
  • parsed JSON results when JSON mode is requested
  • the same host-owned model/auth/runtime preparation path as api.runtime.llm.complete(...)
  • the same trust gating model for model and agent overrides, plus a new explicit gate for auth-profile overrides

Why this belongs under runtime.llm

This API is deliberately the generic agent-bound runtime lane.

It reuses the same host-owned completion path as complete(...), then adds the structured affordances plugin authors actually need:

  • prompt shaping
  • optional image inputs
  • JSON/schema validation
  • timeout control
  • controlled typed output

That makes it the right home for general structured plugin inference.

Boundary vs #79334

This PR is a sister, not a replacement.

  • runtime.llm.completeStructured(...) is the generic agent-bound structured inference lane
  • mediaUnderstanding.extractStructuredWithModel(...) in #79334 remains the narrower provider-owned media capability lane

That separation keeps both seams honest:

  • use runtime.llm.completeStructured(...) when the plugin wants host-owned structured inference against the active agent/runtime path
  • use mediaUnderstanding.extractStructuredWithModel(...) when the plugin wants explicit provider/model media-routing behavior

flowchart LR
  Plugin["Trusted plugin"] --> LLM["api.runtime.llm.completeStructured(...)\nagent-bound generic lane"]
  Plugin --> Media["api.runtime.mediaUnderstanding.extractStructuredWithModel(...)\nprovider-owned media lane"]
  LLM --> Host["host-owned simple completion runtime"]
  Media --> Provider["media-understanding provider registry"]

Safety and trust model

This stays host-owned end to end.

Plugins do not receive raw OAuth tokens, refresh tokens, or provider secrets.

The host still owns:

  • auth resolution
  • provider runtime preparation
  • model routing
  • timeout handling
  • agent binding
  • trust gating

This PR also closes a subtle trust gap by treating auth-profile selection as a real override. profile now requires explicit opt-in via:

  • plugins.entries.<id>.llm.allowProfileOverride: true

The full runtime trust picture is now:

  • model overrides: allowModelOverride + optional allowedModels
  • cross-agent calls: allowAgentIdOverride
  • auth-profile selection: allowProfileOverride

Embedded model-ref suffixes now flow through that same gate, and conflicting profile vs model@profile inputs fail closed.
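
A rough sketch of that gate is below, assuming the profile suffix is whatever follows the last `@` in the model ref. `resolveAuthProfile`, `ProfilePolicy`, and `ProfileRequest` are hypothetical names; the two error messages are copied from the terminal capture in this PR's proof section, and the check order (conflict first, then trust) matches what that capture implies.

```typescript
// Hypothetical policy and request shapes for illustration only.
interface ProfilePolicy {
  allowProfileOverride: boolean;
}
interface ProfileRequest {
  model?: string;
  profile?: string;
}
interface ResolvedSelection {
  model: string | undefined;
  profile: string | undefined;
}

function resolveAuthProfile(request: ProfileRequest, policy: ProfilePolicy): ResolvedSelection {
  let model = request.model;
  let embedded: string | undefined;
  // Split "provider/model@profile" into the bare model ref and the profile suffix.
  const at = model === undefined ? -1 : model.lastIndexOf("@");
  if (model !== undefined && at > 0) {
    embedded = model.slice(at + 1);
    model = model.slice(0, at);
  }
  // Conflicting explicit and embedded selections fail closed.
  if (embedded && request.profile && embedded !== request.profile) {
    throw new Error(
      "Plugin LLM completion received conflicting auth profiles in model and profile fields.",
    );
  }
  const profile = request.profile ?? (embedded || undefined);
  // Both forms pass through the same trust gate.
  if (profile && !policy.allowProfileOverride) {
    throw new Error("Plugin LLM completion cannot override the auth profile.");
  }
  return { model, profile };
}
```

Normalizing the embedded suffix before the policy check is the point of the fix: once both forms reduce to one `profile` value, there is only one place where trust has to be evaluated.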

Behavior details

A few implementation details here are deliberate and worth calling out:

  • JSON mode is explicit: parsed JSON is only returned when jsonMode: true or jsonSchema is provided
  • Image inputs respect host image fallback behavior: when the active model is text-only, structured image calls reuse the host image-model fallback path instead of failing early when a configured image model exists
  • Context-engine parity is preserved: session-bound context-engine runtime hooks now expose completeStructured(...) alongside complete(...)
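
The explicit JSON-mode rule can be sketched like this. It is a simplified illustration: `finishStructured` and the `StructuredResult` envelope are hypothetical, schema validation is left out, and the real runtime may surface failures by throwing rather than by returning an error value.

```typescript
// Hypothetical success/error envelope for illustration.
type StructuredResult =
  | { ok: true; text: string; parsed?: unknown }
  | { ok: false; error: string };

// Parse the model output only when the caller asked for JSON,
// either through `jsonMode: true` or by supplying a `jsonSchema`.
function finishStructured(
  text: string,
  opts: { jsonMode?: boolean; jsonSchema?: object },
): StructuredResult {
  const wantsJson = opts.jsonMode === true || opts.jsonSchema !== undefined;
  if (!wantsJson) return { ok: true, text };
  try {
    return { ok: true, text, parsed: JSON.parse(text) };
  } catch {
    // A parse failure becomes a controlled error instead of leaking a raw exception.
    return { ok: false, error: "Model output was not valid JSON." };
  }
}
```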

Example

const result = await api.runtime.llm.completeStructured({
  instructions: "Extract vendor, total, and searchable tags.",
  input: [
    {
      type: "image",
      buffer: receiptBuffer,
      mimeType: "image/png",
      fileName: "receipt.png",
    },
    { type: "text", text: "Prefer the printed total over handwritten notes." },
  ],
  jsonSchema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      total: { type: "number" },
      tags: { type: "array", items: { type: "string" } },
    },
    required: ["vendor", "total"],
  },
  purpose: "receipts.extract",
});

What changed

  • add LlmCompleteStructured* runtime types
  • add runtime.llm.completeStructured(...) to the plugin runtime facade
  • add trust gating for auth-profile override selection
  • extend the plugin config schema/docs with plugins.entries.<id>.llm.allowProfileOverride
  • reuse host image-model fallback routing for structured image calls
  • expose completeStructured(...) to context-engine runtime hooks
  • add runtime tests for JSON/schema validation, image inputs, profile trust, image fallback behavior, timeout behavior, and context-engine binding
  • add regression tests that block model@profile bypasses and reject conflicting explicit vs embedded auth-profile selection
  • regenerate the plugin SDK API baseline

Validation

  • pnpm test -- src/plugins/runtime/runtime-llm.runtime.test.ts src/plugins/runtime/index.test.ts src/plugins/config-state.test.ts src/config/schema.help.quality.test.ts
  • pnpm plugin-sdk:api:gen
  • pnpm plugin-sdk:api:check
  • pnpm check:changed

Real behavior proof

  • Behavior or issue addressed:

    • Plugins could express auth-profile selection either as profile: "..." or as a trailing model-ref suffix like openai/gpt-5.5@work.
    • Before this fix, the new trust gate only checked the explicit profile field, so embedded model@profile requests could steer credential selection without allowProfileOverride.
  • Real environment tested:

    • Local OpenClaw checkout on codex/plugin-inference-followup using the real runtime implementation through node --import tsx on macOS, with no test mocks in the proof command.
  • Exact steps or command run after this patch:

    node --import tsx -e '
      import { createRuntimeLlm } from "./src/plugins/runtime/runtime-llm.runtime.ts";
      const cfg = { agents: { defaults: { model: "openai/gpt-5.5" } } };
      const llm = createRuntimeLlm({
        getConfig: () => cfg,
        authority: { caller: { kind: "host", id: "proof" }, allowComplete: true, allowModelOverride: true, agentId: "ada" },
      });
      for (const [label, params] of [
        ["embedded-profile-blocked", { model: "openai/gpt-5.5@openai-codex:work", instructions: "Extract summary.", input: [{ type: "text", text: "Hello" }] }],
        ["conflicting-profile-rejected", { model: "openai/gpt-5.5@openai-codex:work", profile: "openai-codex:other", instructions: "Extract summary.", input: [{ type: "text", text: "Hello" }] }],
      ]) {
        try {
          await llm.completeStructured(params);
          console.log(label + ": UNEXPECTED_SUCCESS");
        } catch (error) {
          console.log(label + ": " + (error instanceof Error ? error.message : String(error)));
        }
      }
    '
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Terminal capture:

    embedded-profile-blocked: Plugin LLM completion cannot override the auth profile.
    conflicting-profile-rejected: Plugin LLM completion received conflicting auth profiles in model and profile fields.

    Supplemental focused regression coverage:

    ✓ rejects structured auth-profile overrides without explicit trust
    ✓ rejects auth-profile suffixes in structured model refs without explicit trust
    ✓ treats auth-profile suffixes in structured model refs as profile overrides when trusted
    ✓ rejects conflicting explicit and embedded structured auth-profile overrides
  • Observed result after fix:

    • The real runtime now fails closed on both bypass shapes before host auth preparation.
    • Structured callers can no longer pick a credential profile through model@profile unless allowProfileOverride is explicitly trusted.
    • Conflicting explicit and embedded profile selections are rejected instead of being resolved implicitly.
  • What was not tested:

    • A live provider-backed completion after an allowed profile override was not exercised in this proof step.
    • The trusted-path wiring is covered by the focused runtime tests above.
  • Before evidence (optional but encouraged):

    • Before this patch, profile was gated but the embedded model@profile form was not normalized before auth-profile policy evaluation.

Non-goals

  • no raw OAuth/token exposure to plugins
  • no plugin-specific routes or schemas in core
  • no replacement of #79334
  • no tool-using long-running agent workflow lane

Closes #80188.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • docs/gateway/configuration-reference.md (modified, +4/-3)
  • docs/plugins/sdk-runtime.md (modified, +42/-1)
  • src/agents/pi-embedded-runner/context-engine-capabilities.ts (modified, +16/-0)
  • src/config/schema.help.quality.test.ts (modified, +1/-0)
  • src/config/schema.help.ts (modified, +5/-3)
  • src/config/schema.labels.ts (modified, +1/-0)
  • src/config/types.plugins.ts (modified, +4/-2)
  • src/config/zod-schema.ts (modified, +1/-0)
  • src/context-engine/types.ts (modified, +3/-0)
  • src/plugin-sdk/test-helpers/plugin-runtime-mock.ts (modified, +1/-0)
  • src/plugins/config-normalization-shared.ts (modified, +8/-1)
  • src/plugins/config-state.test.ts (modified, +2/-0)
  • src/plugins/registry.ts (modified, +2/-0)
  • src/plugins/runtime/index.test.ts (modified, +10/-0)
  • src/plugins/runtime/index.ts (modified, +4/-0)
  • src/plugins/runtime/runtime-llm.runtime.test.ts (modified, +530/-6)
  • src/plugins/runtime/runtime-llm.runtime.ts (modified, +388/-54)
  • src/plugins/runtime/types-core.ts (modified, +48/-0)
Issue description

Why this should exist

#79334 is the right narrow seam for image-first structured extraction, but the maintainer feedback there surfaced the broader platform need: many plugins need bounded host-owned inference, not raw OAuth credentials and not product-specific core routes.

Today, plugin authors who need typed model output for knowledge-base enrichment, CRM extraction, finance ingestion, support triage, or import pipelines still face two awkward choices:

  • ship bespoke bridges that shell out into host runtime behavior
  • ask for product-specific SDK seams one plugin shape at a time

A generic host-owned inference surface would keep auth, provider routing, timeouts, and safety in OpenClaw while letting plugins request structured work without ever receiving tokens.

Proposal

Add a general runtime API under api.runtime.llm, something like:

  • api.runtime.llm.completeStructured(...)

That is a better long-term fit than broadening media-understanding further or exposing raw OAuth credentials to plugin code.

Desired behavior

  • text-only structured completion
  • image-plus-text structured completion
  • optional jsonMode and jsonSchema
  • host-controlled model and auth profile
  • bounded timeout
  • controlled success and error envelopes
  • provider and model metadata in the response

Existing building blocks

  • api.runtime.llm.complete(...) already exists for trusted plugin text completions
  • api.runtime.modelAuth.getRuntimeAuthForModel(...) already resolves runtime auth
  • #79334 proves a bounded provider-owned structured extraction lane for media inputs

This follow-up would generalize the pattern without requiring each plugin to invent its own bridge.

Example use cases

  • KB plugins: turn text or screenshots into normalized evidence records
  • CRM plugins: extract people, companies, dates, action items, and summaries from text plus images
  • Finance plugins: receipt and invoice field extraction
  • Support plugins: screenshot plus reproduction-note triage
  • Migration/import plugins: map raw content into plugin-owned JSON before storage

Boundary

OpenClaw core owns

  • auth resolution
  • provider runtime exchange
  • model/profile policy
  • bounded execution
  • typed generic inference API

Plugins own

  • routes
  • prompts
  • schemas
  • storage
  • follow-on behavior

Non-goals

  • no raw OAuth token or refresh token exposure to plugins
  • no product-specific routes or schemas in core
  • no tool-using long-running agent lane
  • no replacement of #79334; that PR should stay the media-understanding seam

Acceptance criteria

  • no user model API key required when the host runtime already has auth
  • text-only structured requests work
  • image-plus-text structured requests work when the chosen provider supports them
  • JSON/schema validation failures return controlled errors
  • host policy still gates model/profile/agent overrides
  • secrets are not returned or logged
  • existing api.runtime.llm.complete(...) behavior does not regress

Relationship to #79334

That PR should remain intentionally narrow: image-first structured extraction via media-understanding. This issue tracks the broader host-owned plugin inference seam so future plugins do not need to keep stretching media-understanding to cover every structured workload.
