openclaw - 💡(How to fix) Fix perf: per-request auth (5.5s) and tool bundling (8.9s) dominate gateway TTFT [1 comments, 2 participants]

openclaw • 2026-05-10 06:12:04

openclaw/openclaw#80131•Fetched 2026-05-11 03:18:25

Author

banna-commits

Participants

banna-commits

clawsweeper[bot]

Profiling the embedded-run pipeline on a Mac mini M4 (24GB RAM, openclaw 2026.5.7, primary google/gemini-3.1-pro-preview via Vertex SA) shows that ~14 of every ~43 seconds of time-to-first-token is spent on work that doesn't change between requests:

| Stage | Avg | What it does |
|---|---|---|
| `auth` | 5.5s | Vertex SA token resolution per request |
| `attempt-dispatch` | 5.4s | First HTTP roundtrip (no keep-alive pool) |
| `core-plugin-tools` | 3.9s | Plugin tool module loading per request |
| `bundle-tools` | 5.0s | Tool definition serialization per request |
| `system-prompt` | 10.7s | Memory search + assembly |
| `stream-setup` | 10.4s | SSE stream open + tool-call hooks |
| Total TTFT | 42.9s | (38.6s best, 62.9s worst) |

Network RTT to all relevant endpoints is 9-32ms — none of the 42s comes from raw network.

This issue covers the two stages where the work is most clearly redundant: auth and bundle-tools/core-plugin-tools.
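
Per-stage figures like the ones above can be gathered with a small timing wrapper around each pipeline step. The following is only a sketch of that idea, not openclaw's actual profiler: `StageTimer`, its methods, and the injectable clock are hypothetical names introduced for illustration.

```typescript
type Clock = () => number;

// Hypothetical helper: accumulates wall-clock milliseconds per named stage.
class StageTimer {
  private readonly totals = new Map<string, number>();
  private readonly now: Clock;

  constructor(now: Clock = Date.now) {
    this.now = now;
  }

  // Marks the start of a stage and returns a function that marks its end.
  start(stage: string): () => void {
    const begin = this.now();
    return () => {
      const elapsed = this.now() - begin;
      this.totals.set(stage, (this.totals.get(stage) ?? 0) + elapsed);
    };
  }

  // Returns the accumulated milliseconds for every stage seen so far.
  report(): Record<string, number> {
    const out: Record<string, number> = {};
    this.totals.forEach((ms, stage) => {
      out[stage] = ms;
    });
    return out;
  }
}
```

A request handler could call `const end = timer.start('auth')` before token resolution and `end()` afterwards, then log `timer.report()` once the first token has been streamed.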

Fix Action

Fix / Workaround

If both proposals land, gateway TTFT on this workload drops from ~43s to ~28s (42.9s minus the 5.5s auth stage and the 8.9s of tool loading and bundling leaves about 28.5s), with no functional changes. Connection keep-alive (an undici global dispatcher with allowH2) would be a separate ~1s win; I'm not raising it here since it's a more invasive transport change.

Problem 1: `auth` re-resolves Vertex SA token every request (5.5s)

Evidence:

/tmp/google-sa-token.cache is rarely refreshed, suggesting the underlying token IS being cached somewhere
BUT the auth stage still takes 5.5s on every request, including back-to-back requests
Stage breakdown is consistent across 20+ runs (min 5.0s, max 8.9s)

The most likely cause is the google-auth-library-nodejs cache being instance-scoped — cachedCredential lives on the GoogleAuth instance. If openclaw constructs a new GoogleAuth(...) per request, the underlying token cache is discarded each time. See issue #390: "getClient ignores credentials when cached".

Compounding this is issue #49: getRequestMetadata() historically re-signed JWTs on every call. Use getAccessToken() instead.
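
The suspected failure mode can be reproduced without the real library. In the sketch below, `FakeAuth` is a hypothetical stand-in for any client whose token cache lives on the instance; it is not google-auth-library's API, and `signCount` merely counts simulated expensive token exchanges.

```typescript
let signCount = 0; // number of simulated expensive token exchanges

// Hypothetical client: the cached token is a field on the instance.
class FakeAuth {
  private cachedToken: string | null = null;

  getToken(): string {
    if (this.cachedToken === null) {
      signCount += 1; // stands in for the slow JWT sign + exchange
      this.cachedToken = `token-${signCount}`;
    }
    return this.cachedToken;
  }
}

// Anti-pattern: a fresh instance per request starts with an empty cache.
function handlePerRequest(): string {
  return new FakeAuth().getToken();
}

// Shared instance: the cache survives across requests.
const shared = new FakeAuth();
function handleShared(): string {
  return shared.getToken();
}
```

Three calls to `handlePerRequest()` perform three simulated exchanges, while three calls to `handleShared()` perform one. If openclaw follows the first pattern, that would be consistent with a flat 5.5s on back-to-back requests.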

Proposal 1: Singleton GoogleAuth, eager-warmed at boot

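// src/auth/google-singleton.ts
import { GoogleAuth, JWT } from 'google-auth-library';

// One GoogleAuth instance per process, so its credential cache survives across requests.
const auth = new GoogleAuth({
  keyFilename: process.env.GOOGLE_APPLICATION_CREDENTIALS,
  scopes: ['https://www.googleapis.com/auth/cloud-platform'],
});

let clientPromise: Promise<JWT> | null = null;

// Resolves the shared client once; concurrent callers await the same promise.
export function getAuthClient(): Promise<JWT> {
  if (!clientPromise) {
    clientPromise = auth
      .getClient()
      .then((c) => {
        // Assumes a service-account key file, for which getClient() yields a JWT client.
        (c as any).eagerRefreshThresholdMillis = 5 * 60 * 1000; // 5-min eager refresh
        return c as JWT;
      })
      .catch((err) => {
        clientPromise = null; // do not cache a failed resolution; let the next call retry
        throw err;
      });
  }
  return clientPromise;
}

// Returns a bearer token, reusing the client's cached token until it nears expiry.
export async function getBearerToken(): Promise<string> {
  const client = await getAuthClient();
  const { token } = await client.getAccessToken();
  if (!token) throw new Error('no token');
  return token;
}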

Pre-warm at gateway boot so the first request doesn't pay the JWT-sign cost.
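
A boot-time warm-up could look roughly like the following. This is a sketch under assumptions: `prewarmAuth` and `TokenGetter` are hypothetical names, and treating a failed warm-up as non-fatal is a design choice made here for illustration, not something the issue specifies.

```typescript
type TokenGetter = () => Promise<string>;

// Hypothetical boot hook: fetch one token before accepting traffic so the
// cache is already populated when the first request arrives.
async function prewarmAuth(getToken: TokenGetter): Promise<boolean> {
  const startedAt = Date.now();
  try {
    await getToken();
    console.log(`auth pre-warm finished in ${Date.now() - startedAt} ms`);
    return true;
  } catch (err) {
    // A failed warm-up is logged, not fatal: the first request pays the cost instead.
    console.warn('auth pre-warm failed; the first request will resolve the token', err);
    return false;
  }
}
```

At startup this might be invoked as `await prewarmAuth(getBearerToken)` before the gateway begins listening.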

Expected impact: 5.5s → ~5ms for cache hits (≥99% of requests after warmup).

Problem 2: Tool bundling re-runs static work every request (8.9s)

core-plugin-tools (3.9s) + bundle-tools (5.0s) = 8.9s spent re-loading and re-serializing tool definitions that don't change between requests.

Likely contributors (in order of probability):

Zod schema compilation per request: Zod v4 JIT-compiles via new Function(), 17x slower at creation than v3. With ~50 tools and nested schemas, a per-request rebuild could plausibly take seconds.
require.cache invalidation for hot-reload: if the loader explicitly deletes require.cache[...] entries to support plugin reload, every request re-parses every plugin file.
Description templating: agent-name / time injection into tool descriptions, followed by re-stringification.

This is the same pattern other LLM gateways have moved away from:

LangChain.js uses WeakMap for Zod-conversion memoization (per their docs)
OpenAI Agents SDK exposes cache_tools_list=True + invalidate_tools_cache() — "set to True only if you are confident that the tool definitions do not change frequently"
Vercel AI SDK discussions treat tool definitions as cacheable static prefixes
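
The memoization pattern mentioned above can be sketched with a `WeakMap` keyed by schema object identity. This is an illustration only: `convertSchema` is a hypothetical stand-in for an expensive Zod-to-JSON-Schema conversion, and nothing here is taken from LangChain.js or openclaw source.

```typescript
type JsonSchema = Record<string, unknown>;

let conversions = 0; // counts how often the expensive path actually runs

// Hypothetical stand-in for an expensive schema conversion.
function convertSchema(schema: object): JsonSchema {
  conversions += 1;
  return { type: 'object', properties: Object.keys(schema) };
}

// Keyed by object identity; entries vanish when the schema is garbage-collected.
const memo = new WeakMap<object, JsonSchema>();

function convertOnce(schema: object): JsonSchema {
  let converted = memo.get(schema);
  if (converted === undefined) {
    converted = convertSchema(schema);
    memo.set(schema, converted);
  }
  return converted;
}
```

Because the key is the schema object itself, this only helps if plugin modules keep returning the same schema instances; a loader that re-evaluates plugin files on each request would defeat it.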

Proposal 2: Module-scope tool memoization with explicit invalidation hook

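// src/tools/bundle.ts
import stableStringify from 'fast-json-stable-stringify';
import { createHash } from 'node:crypto';
import { allPlugins } from '../plugins';

interface ToolDef {
  name: string;
  description: string;
  input_schema: unknown; // JSON Schema, pre-converted from Zod once
}

interface ToolBundle {
  tools: readonly ToolDef[];
  json: string; // key-order-stable serialization of tools
  hash: string; // SHA-256 of json, usable as a cache key
}

let cache: ToolBundle | null = null;

function build(): ToolBundle {
  const tools: ToolDef[] = allPlugins.flatMap((p) =>
    p.tools.map((t) => ({
      name: t.name,
      description: t.description,
      input_schema: t.jsonSchema,
    })),
  );
  const json = stableStringify(tools);
  const hash = createHash('sha256').update(json).digest('hex');
  // Object.freeze is shallow: it guards the array, not the tool objects inside it.
  return { tools: Object.freeze(tools), json, hash };
}

// Builds the bundle on first use and returns the cached copy afterwards.
export function getTools(): ToolBundle {
  if (!cache) cache = build();
  return cache;
}

// SIGHUP-driven invalidation for plugin reload (off the request path)
export function invalidateTools(): void {
  cache = null;
}
process.on('SIGHUP', invalidateTools);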

Stable JSON output also unlocks Anthropic prompt caching (cache_control on the last tool entry → 90% input-token cost reduction on the tool prefix) and Vertex context caching. See Anthropic Prompt Caching docs — "verify that the keys in your tool_use content blocks have stable ordering as some languages randomize key order during JSON conversion, breaking caches".
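
To make the key-ordering point concrete, here is a minimal stand-in for a stable stringifier. It is a simplified sketch, not the `fast-json-stable-stringify` implementation, and it assumes plain JSON-compatible input (no `undefined` array items, functions, or cycles). The tool objects are made up for the example.

```typescript
// Simplified stable stringifier: sorts object keys recursively.
function stableStringifySketch(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringifySketch(v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // mirror JSON.stringify: skip undefined values
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringifySketch(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Same tool definition, different key insertion order.
const toolA = { name: 'search', input_schema: { type: 'object', required: ['q'] } };
const toolB = { input_schema: { required: ['q'], type: 'object' }, name: 'search' };
```

Here `JSON.stringify(toolA)` and `JSON.stringify(toolB)` differ because they follow insertion order, whereas `stableStringifySketch` yields the same string for both, so a hash or prompt-cache prefix derived from it stays stable across rebuilds.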

Expected impact: 8.9s → <50ms after first build.

Combined potential

Tradeoffs

Memoizing tools breaks runtime hot-reload of plugin definitions. Hence the invalidateTools() SIGHUP hook, which moves invalidation off the request path while preserving the feature for users who want it.
Singleton GoogleAuth needs to handle credential rotation. The 5-min eagerRefreshThresholdMillis keeps tokens fresh; revoked SAs can keep working for ~55min, which is the SDK's default behavior anyway.
Tool descriptions can no longer contain dynamic content (timestamps, agent name). If any built-in tool uses templated descriptions, those need to move to per-request system prompt instead of tool definitions — which is also where Anthropic recommends putting them for caching purposes.
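
The last tradeoff, moving dynamic content out of tool descriptions, might be handled along these lines. This is a sketch with hypothetical names (`STATIC_TOOLS`, `RequestContext`, `buildSystemPrompt`) and a made-up tool; openclaw's real prompt assembly is not shown in this issue.

```typescript
interface StaticToolDef {
  readonly name: string;
  readonly description: string; // static text only: no timestamps or agent names
}

interface RequestContext {
  agentName: string;
  now: Date;
}

// Built once; safe to memoize and to use as a cacheable prefix.
const STATIC_TOOLS: readonly StaticToolDef[] = Object.freeze([
  { name: 'set_reminder', description: 'Schedule a reminder for the user.' },
]);

// Per-request values are appended to the system prompt instead.
function buildSystemPrompt(base: string, ctx: RequestContext): string {
  return `${base}\n\nAgent: ${ctx.agentName}\nCurrent time: ${ctx.now.toISOString()}`;
}
```

With this split, only the system prompt varies between requests, and the serialized tool list stays byte-identical.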

Environment

macOS Sequoia, Apple M4 (10-core, 24GB unified RAM)
openclaw 2026.5.7
Node.js v22 (homebrew)
Primary model: google/gemini-3.1-pro-preview (Vertex SA via service-account file)
Heavy use case: Telegram bot, mem0+Qdrant memory search, ~50 plugin tools

Happy to test patches if anyone takes this on, or to open a PR myself if there's interest in the approach.
