openclaw - 💡(How to fix) Fix perf: per-request auth (5.5s) and tool bundling (8.9s) dominate gateway TTFT [1 comment, 2 participants]

GitHub stats: openclaw/openclaw#80131 (fetched 2026-05-11 03:18:25) · 1 comment · 2 participants · 2 timeline events (commented ×1, cross-referenced ×1) · 2 reactions

Profiling the embedded-run pipeline on a Mac mini M4 (24GB RAM, openclaw 2026.5.7, primary google/gemini-3.1-pro-preview via Vertex SA) produces the breakdown below. About 14 of the ~43 seconds of time-to-first-token (auth, core-plugin-tools and bundle-tools) are spent on work that doesn't change between requests:

| Stage | Avg | What it does |
|---|---|---|
| auth | 5.5s | Vertex SA token resolution per request |
| attempt-dispatch | 5.4s | First HTTP roundtrip (no keep-alive pool) |
| core-plugin-tools | 3.9s | Plugin tool module loading per request |
| bundle-tools | 5.0s | Tool definition serialization per request |
| system-prompt | 10.7s | Memory search + assembly |
| stream-setup | 10.4s | SSE stream open + tool-call hooks |
| Total TTFT | 42.9s | (38.6s best, 62.9s worst) |

Network RTT to all relevant endpoints is 9-32ms, so raw network latency accounts for almost none of the ~43s.

This issue covers the two areas where the work is most clearly redundant: auth, and tool loading/serialization (core-plugin-tools + bundle-tools).
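You can reproduce a per-stage breakdown like the one above by wrapping each pipeline step in a timer. The sketch below is a minimal illustration. `StageTimer` is a hypothetical helper, not an openclaw API. It uses only `performance.now()`, and the stage names in the usage note mirror the table.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helper: records wall-clock time per named stage of one request.
class StageTimer {
  private readonly durations = new Map<string, number>();

  // Runs `fn`, adds its elapsed time (ms) to `stage`, and returns its result.
  // Failed stages are still timed; the error is rethrown to the caller.
  async time<T>(stage: string, fn: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      const elapsed = performance.now() - start;
      this.durations.set(stage, (this.durations.get(stage) ?? 0) + elapsed);
    }
  }

  // Snapshot of per-stage durations in milliseconds.
  report(): Record<string, number> {
    return Object.fromEntries(this.durations);
  }

  // Sum over recorded stages; any gap vs. measured TTFT is unattributed time.
  total(): number {
    let sum = 0;
    for (const ms of this.durations.values()) sum += ms;
    return sum;
  }
}
```

Usage follows this pattern: `await timer.time('auth', () => getBearerToken())`, with `timer.report()` logged once the first token arrives.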


Problem 1: auth re-resolves Vertex SA token every request (5.5s)

Evidence:

  • /tmp/google-sa-token.cache is rarely refreshed, suggesting the underlying token IS being cached somewhere
  • BUT the auth stage still takes 5.5s on every request, including back-to-back requests
  • Stage breakdown is consistent across 20+ runs (min 5.0s, max 8.9s)

The most likely cause is that the google-auth-library-nodejs cache is instance-scoped: cachedCredential lives on the GoogleAuth instance. If openclaw constructs a new GoogleAuth(...) per request, the underlying token cache is discarded each time. See issue #390: "getClient ignores credentials when cached".

Compounding this is issue #49: getRequestMetadata() historically re-signed JWTs on every call. Use getAccessToken() instead.
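The suspected failure mode is easy to see in isolation. The sketch below deliberately avoids google-auth-library. `FakeAuth` is a hypothetical stand-in whose token cache lives on the instance, as described above for `cachedCredential`. If a new client is constructed per request, every request pays the expensive signing step. A module-scope instance pays it once.

```typescript
// Hypothetical stand-in for an auth client whose token cache is instance-scoped.
class FakeAuth {
  static signCount = 0; // counts the expensive JWT-sign / token-exchange step
  private cached: string | null = null;

  async getAccessToken(): Promise<string> {
    if (this.cached === null) {
      FakeAuth.signCount++; // slow path: simulated sign + token exchange
      this.cached = `token-${FakeAuth.signCount}`;
    }
    return this.cached;
  }
}

// Anti-pattern: a new client per request discards the cache every time.
async function perRequest(): Promise<string> {
  return new FakeAuth().getAccessToken();
}

// Fix: one module-scope client shared by all requests.
const shared = new FakeAuth();
async function singleton(): Promise<string> {
  return shared.getAccessToken();
}
```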

Proposal 1: Singleton GoogleAuth, eager-warmed at boot

// src/auth/google-singleton.ts
import { GoogleAuth, JWT } from 'google-auth-library';

// One GoogleAuth per process: its credential and token cache are instance-scoped,
// so they only survive across requests if the instance does.
const auth = new GoogleAuth({
  keyFilename: process.env.GOOGLE_APPLICATION_CREDENTIALS,
  scopes: ['https://www.googleapis.com/auth/cloud-platform'],
});

let clientPromise: Promise<JWT> | null = null;
export function getAuthClient(): Promise<JWT> {
  if (!clientPromise) {
    clientPromise = auth
      .getClient()
      .then((c) => {
        (c as any).eagerRefreshThresholdMillis = 5 * 60 * 1000; // refresh 5 min before expiry
        return c as JWT;
      })
      .catch((err) => {
        clientPromise = null; // don't cache a failed init; the next call retries
        throw err;
      });
  }
  return clientPromise;
}

export async function getBearerToken(): Promise<string> {
  const client = await getAuthClient();
  // getAccessToken() reuses the cached token and refreshes only near expiry.
  const { token } = await client.getAccessToken();
  if (!token) throw new Error('no token');
  return token;
}

Pre-warm at gateway boot so the first request doesn't pay the JWT-sign cost.
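The same singleton idea, plus pre-warming and eager refresh, can be sketched without any Google dependency. `createTokenCache`, `fetchToken` and the injectable clock are hypothetical names. google-auth-library already performs this expiry check itself (that is what `eagerRefreshThresholdMillis` controls), so the sketch only illustrates the mechanics. Concurrent callers share one in-flight fetch, and a token is refreshed once it falls within the threshold of expiry.

```typescript
interface Token {
  value: string;
  expiresAt: number; // epoch ms
}

// Hypothetical token cache: shares one in-flight fetch and refreshes early.
function createTokenCache(
  fetchToken: () => Promise<Token>,
  eagerRefreshMs = 5 * 60 * 1000,
  now: () => number = Date.now,
) {
  let current: Token | null = null;
  let inFlight: Promise<Token> | null = null;

  async function getToken(): Promise<string> {
    // Fast path: a token exists and is not yet inside the eager-refresh window.
    if (current && current.expiresAt - now() > eagerRefreshMs) return current.value;
    // Slow path: join the refresh already in progress, or start one.
    const pending =
      inFlight ??
      (inFlight = fetchToken()
        .then((t) => (current = t))
        .finally(() => {
          inFlight = null; // a failed fetch is not cached; the next call retries
        }));
    return (await pending).value;
  }

  // Call once at boot, before accepting traffic.
  const warm = async (): Promise<void> => {
    await getToken();
  };

  return { getToken, warm };
}
```

At gateway boot, `await cache.warm()` before serving, so the first request already hits the fast path.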

Expected impact: 5.5s → ~5ms for cache hits (≥99% of requests after warmup).

Problem 2: Tool bundling re-runs static work every request (8.9s)

core-plugin-tools (3.9s) + bundle-tools (5.0s) = 8.9s spent re-loading and re-serializing tool definitions that don't change between requests.

Likely contributors (in order of probability):

  1. Zod schema compilation per request: Zod v4 JIT-compiles schemas via new Function(), which makes schema creation 17x slower than in v3. With ~50 tools and nested schemas, rebuilding them can take seconds.
  2. require.cache invalidation for hot-reload: if the loader explicitly deletes require.cache[...] entries to support plugin reload, every request re-parses every plugin file.
  3. Description templating: agent-name / time injection into tool descriptions, followed by re-stringifying.

Other LLM frameworks have already moved away from this per-request pattern:

  • LangChain.js uses WeakMap for Zod-conversion memoization (per their docs)
  • OpenAI Agents SDK exposes cache_tools_list=True and invalidate_tools_cache(), with the guidance: "set to True only if you are confident that the tool definitions do not change frequently"
  • Vercel AI SDK discussions treat tool definitions as cacheable static prefixes
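The WeakMap approach attributed to LangChain.js above can be sketched generically. The converted JSON Schema is keyed by the schema object itself. Each schema is then converted at most once per process, and entries can be garbage-collected when a plugin reload drops the old schema objects. `expensiveConvert` is a hypothetical stand-in for a Zod-to-JSON-Schema conversion, so no Zod is needed to run this.

```typescript
type JsonSchema = Record<string, unknown>;

let conversions = 0; // counts how often the expensive path actually runs

// Hypothetical stand-in for a Zod -> JSON Schema conversion.
function expensiveConvert(schema: object): JsonSchema {
  conversions++;
  return { type: 'object', properties: { ...schema } };
}

// Memo keyed by schema identity; WeakMap lets dropped schemas be collected.
const memo = new WeakMap<object, JsonSchema>();

function toJsonSchema(schema: object): JsonSchema {
  let converted = memo.get(schema);
  if (converted === undefined) {
    converted = expensiveConvert(schema);
    memo.set(schema, converted);
  }
  return converted;
}
```

Keying by identity means a reloaded plugin with new schema objects is converted again automatically. The tradeoff is that equal but distinct schema objects are not deduplicated.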

Proposal 2: Module-scope tool memoization with explicit invalidation hook

// src/tools/bundle.ts
import stableStringify from 'fast-json-stable-stringify';
import { allPlugins } from '../plugins';
import { createHash } from 'node:crypto';

type ToolDef = { name: string; description: string; input_schema: unknown };

// Module-scope cache: built once, reused by every request until invalidated.
let cache: { tools: readonly ToolDef[]; json: string; hash: string } | null = null;

function build() {
  const tools: ToolDef[] = allPlugins.flatMap(p => p.tools.map(t => ({
    name: t.name,
    description: t.description,
    input_schema: t.jsonSchema,  // pre-converted from Zod once
  })));
  const json = stableStringify(tools); // key-order-stable, so bytes and hash are reproducible
  const hash = createHash('sha256').update(json).digest('hex');
  return { tools: Object.freeze(tools), json, hash };
}

export function getTools() {
  if (!cache) cache = build();
  return cache;
}

// SIGHUP-driven invalidation for plugin reload (off the request path)
export function invalidateTools() { cache = null; }
process.on('SIGHUP', invalidateTools);
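Below is a self-contained variant of Proposal 2 that runs without the real plugin registry. `Plugin`, `ToolSpec` and the `plugins` array are hypothetical. `JSON.stringify` stands in for `fast-json-stable-stringify`: it is deterministic for the same objects within one process, but unlike the original it does not normalize key order. The sketch shows the lifecycle. The first call builds the bundle, later calls return the same frozen object, and invalidation forces exactly one rebuild on the next call.

```typescript
import { createHash } from 'node:crypto';

interface ToolSpec { name: string; description: string; jsonSchema: unknown }
interface Plugin { tools: ToolSpec[] }

// Hypothetical plugin registry standing in for `allPlugins`.
const plugins: Plugin[] = [
  { tools: [{ name: 'search', description: 'Search memory', jsonSchema: { type: 'object' } }] },
];

interface Bundle { tools: readonly object[]; json: string; hash: string }
let cache: Bundle | null = null;
let builds = 0; // counts full rebuilds

function build(): Bundle {
  builds++;
  const tools = plugins.flatMap((p) =>
    p.tools.map((t) => ({ name: t.name, description: t.description, input_schema: t.jsonSchema })),
  );
  const json = JSON.stringify(tools); // stand-in for stableStringify
  const hash = createHash('sha256').update(json).digest('hex');
  return Object.freeze({ tools: Object.freeze(tools), json, hash });
}

function getTools(): Bundle {
  if (!cache) cache = build();
  return cache;
}

function invalidateTools(): void {
  cache = null;
}
```

In the gateway, `process.on('SIGHUP', invalidateTools)` keeps the rebuild off the request path, as in the original. Only the first request after the signal pays the build cost.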

Stable JSON output also unlocks Anthropic prompt caching (cache_control on the last tool entry → ~90% lower input-token cost for the cached tool prefix on cache hits) and Vertex context caching. The Anthropic prompt caching docs warn: "verify that the keys in your tool_use content blocks have stable ordering as some languages randomize key order during JSON conversion, breaking caches".
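Here is why key order matters for caching. Two structurally equal tool definitions with different key insertion order serialize differently under `JSON.stringify`, so their prefixes (and hashes) differ. A sorted-key serializer removes that difference. In this sketch, `stableStringify` is a minimal hand-rolled stand-in for `fast-json-stable-stringify`, for plain JSON data only. `withCacheBreakpoint` is a hypothetical helper that marks the last tool with `cache_control: { type: 'ephemeral' }`, the marker used by Anthropic's prompt caching.

```typescript
// Minimal sorted-key serializer for plain JSON data (stand-in for fast-json-stable-stringify).
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) {
    // JSON.stringify writes undefined array elements as null; mirror that.
    return `[${value.map((v) => (v === undefined ? 'null' : stableStringify(v))).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .filter((k) => obj[k] !== undefined) // JSON.stringify drops undefined properties
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${entries.join(',')}}`;
}

interface Tool {
  name: string;
  description: string;
  input_schema: unknown;
  cache_control?: { type: 'ephemeral' };
}

// Hypothetical helper: mark the end of the static tool prefix as a cache breakpoint.
function withCacheBreakpoint(tools: readonly Tool[]): Tool[] {
  return tools.map((t, i): Tool =>
    i === tools.length - 1 ? { ...t, cache_control: { type: 'ephemeral' } } : t,
  );
}
```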

Expected impact: 8.9s → <50ms after first build.

Combined potential

If both proposals land, gateway TTFT should drop from ~43s to ~28s on this workload (42.9s − 14.4s ≈ 28.5s), without functional changes. Connection keep-alive (undici global dispatcher with allowH2) would add a separate ~1s win, which I'm not raising here because it's a more invasive transport change.

Tradeoffs

  • Memoizing tools breaks runtime hot-reload of plugin definitions. Hence the invalidateTools() SIGHUP hook, which moves invalidation off the request path while preserving the feature for users who want it.
  • A singleton GoogleAuth still has to cope with credential rotation. The 5-min eagerRefreshThresholdMillis keeps tokens fresh; a revoked SA can keep working for up to ~55 min, which is the SDK's default behavior anyway.
  • Tool descriptions can no longer contain dynamic content (timestamps, agent name). If any built-in tool uses templated descriptions, that content needs to move from the tool definitions into the per-request system prompt, which is also where Anthropic recommends putting it for caching purposes.
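This sketch illustrates the third tradeoff: keep tool definitions byte-for-byte static and carry per-request values in the system prompt instead. `templatedTools`, `buildRequest`, the `remind` tool and the prompt wording are all hypothetical. The point is that the tool JSON, and therefore its hash, stops changing between requests, so memoization and prefix caching keep working.

```typescript
import { createHash } from 'node:crypto';

const sha256 = (s: string): string => createHash('sha256').update(s).digest('hex');

// Anti-pattern: per-request values templated into a tool description.
function templatedTools(agentName: string, now: Date) {
  return [{ name: 'remind', description: `Set a reminder for ${agentName}. Current time: ${now.toISOString()}` }];
}

// Static tool definitions, serialized once at module scope.
const STATIC_TOOLS = Object.freeze([
  Object.freeze({ name: 'remind', description: 'Set a reminder for the current agent.' }),
]);
const STATIC_TOOLS_JSON = JSON.stringify(STATIC_TOOLS);

// Per-request context moves into the system prompt instead.
function buildRequest(agentName: string, now: Date) {
  return {
    system: `You are ${agentName}. Current time: ${now.toISOString()}.`,
    toolsJson: STATIC_TOOLS_JSON,
  };
}
```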

Environment

  • macOS Sequoia, Apple M4 (10-core, 24GB unified RAM)
  • openclaw 2026.5.7
  • Node.js v22 (homebrew)
  • Primary model: google/gemini-3.1-pro-preview (Vertex SA via service-account file)
  • Heavy use case: Telegram bot, mem0+Qdrant memory search, ~50 plugin tools

Happy to test patches if anyone takes this on, or to open a PR myself if there's interest in the approach.
