openclaw - 💡(How to fix) Fix Auth event log: append-only structured logging for auth decisions [1 comments, 1 participants]

GitHub stats
openclaw/openclaw#59277 · Fetched 2026-04-08 02:26:32
Comments: 1 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, locked ×1

Problem

OpenClaw's auth system has no historical logging. When an auth lane fails and the system falls back to another lane, the only trace is a single lastFailureAt timestamp in auth-profiles.json that gets overwritten on each failure. There's no append log, no fallback event trail, and no way to reconstruct auth behavior over time.

What happened

We downgraded a Claude subscription from $200 to $100. The OAuth token was invalidated but the system silently fell back to a secondary auth lane. We had no notification, no log trail, and no way to know how long the primary lane had been dead. When specialists started failing intermittently, we couldn't correlate those failures to auth lane issues because there was no connection between FailoverError: LLM request timed out and the underlying auth state.

Diagnosing required manually parsing JSON timestamps from auth-profiles.json - archaeology, not observability.

Impact

  • Silent auth lane degradation went undetected for ~24 hours
  • Specialist failures couldn't be attributed to auth vs capacity vs bugs
  • No way to retrospectively audit auth health over a time range
  • Operators can't answer: "when did lane X start failing and what happened?"

Proposed Solution

1. Append-only auth event log

A structured, timestamped log file (auth-events.jsonl or similar) that captures every auth-relevant event:

{"ts":"2026-04-01T15:31:00Z","event":"auth_failure","profile":"anthropic:oauth-mc","reason":"auth_permanent","model":"claude-opus-4-6","session":"agent:main:main"}
{"ts":"2026-04-01T15:31:00Z","event":"lane_fallback","from":"anthropic:oauth-mc","to":"anthropic:oauth-bc","reason":"auth_permanent"}
{"ts":"2026-04-01T15:31:00Z","event":"lane_disabled","profile":"anthropic:oauth-mc","reason":"auth_permanent","until":"2026-04-01T20:31:00Z"}
{"ts":"2026-04-01T17:36:00Z","event":"lane_restored","profile":"anthropic:oauth-mc","method":"manual_token_update"}
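A minimal Python sketch of such an append-only writer, for illustration only. The function name `append_auth_event` and the temporary file location are assumptions, not part of OpenClaw; the field names are taken from the sample lines above.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_auth_event(log_path, event, **fields):
    """Append one auth event as a single JSON line; earlier lines are never rewritten."""
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": event,
        **fields,
    }
    line = json.dumps(record, separators=(",", ":")) + "\n"
    # Mode "a" appends: each write lands at the end of the file.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return record


# Usage: write two events to a throwaway file and read them back.
demo_path = os.path.join(tempfile.mkdtemp(), "auth-events.jsonl")
append_auth_event(demo_path, "auth_failure", profile="anthropic:oauth-mc",
                  reason="auth_permanent", model="claude-opus-4-6",
                  session="agent:main:main")
append_auth_event(demo_path, "lane_disabled", profile="anthropic:oauth-mc",
                  reason="auth_permanent", until="2026-04-01T20:31:00Z")
with open(demo_path, encoding="utf-8") as fh:
    events = [json.loads(raw) for raw in fh]
```

Because every event is a separate line, the history survives later failures, unlike the overwritten lastFailureAt field.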

2. Lane switch events

When the system falls back from one auth lane to another, log it as a discrete event with the reason, rather than falling back silently.
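One way this could look, sketched in Python under stated assumptions: `pick_lane`, `is_healthy`, and `emit` are hypothetical names, and the real fallback logic lives in the gateway request lifecycle, which this sketch does not model. Timestamps are left to whatever writes the event.

```python
def pick_lane(lanes, is_healthy, emit):
    """Return the first healthy lane, emitting an event for every lane skipped."""
    for index, lane in enumerate(lanes):
        ok, reason = is_healthy(lane)
        if ok:
            return lane
        emit({"event": "auth_failure", "profile": lane, "reason": reason})
        if index + 1 < len(lanes):
            # The switch itself is recorded, so the fallback is never silent.
            emit({"event": "lane_fallback", "from": lane,
                  "to": lanes[index + 1], "reason": reason})
    return None  # every lane failed


# Usage: the primary lane is dead, the secondary is healthy.
health = {
    "anthropic:oauth-mc": (False, "auth_permanent"),
    "anthropic:oauth-bc": (True, None),
}
events = []
chosen = pick_lane(list(health), health.get, events.append)
```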

3. Error context on LLM requests

When an LLM request fails, the error should include which auth lane was attempted and why it failed - not just the generic timeout/failure. This lets operators correlate specialist failures to auth issues.
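As a rough illustration in Python, the error could carry the lane and reason as attributes and include them in its message. The `FailoverError` class here is a hypothetical stand-in for the error named above, and `call_llm`, `send`, and `lane_reason` are assumed names, not OpenClaw APIs.

```python
class FailoverError(Exception):
    """Hypothetical stand-in for the issue's FailoverError, with auth context attached."""

    def __init__(self, message, auth_lane=None, auth_reason=None):
        self.auth_lane = auth_lane
        self.auth_reason = auth_reason
        detail = message
        if auth_lane is not None:
            detail += f" (auth lane: {auth_lane}, reason: {auth_reason})"
        super().__init__(detail)


def call_llm(lane, send, lane_reason):
    """Run send(lane); on timeout, re-raise with the attempted lane and its known state."""
    try:
        return send(lane)
    except TimeoutError as exc:
        raise FailoverError("LLM request timed out", auth_lane=lane,
                            auth_reason=lane_reason(lane)) from exc


# Usage: a request that always times out on a lane known to be dead.
def always_times_out(lane):
    raise TimeoutError()


try:
    call_llm("anthropic:oauth-mc", always_times_out, lambda lane: "auth_permanent")
except FailoverError as err:
    message = str(err)
    caught = err
```

An operator reading the resulting message can tie the timeout to a specific lane without opening auth-profiles.json.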

4. CLI query support (nice-to-have)

openclaw auth log                    # recent auth events
openclaw auth log --profile oauth-mc # filter by lane
openclaw auth log --failures         # failures only
openclaw auth log --since 24h        # time range
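The filters behind these flags could be sketched in Python as follows. This is not the proposed CLI itself: `query_events`, the `FAILURE_EVENTS` set, and the rule that a short name such as oauth-mc matches a lane id ending in :oauth-mc are all assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: which event names a --failures style filter would include.
FAILURE_EVENTS = {"auth_failure", "lane_fallback", "lane_disabled"}


def parse_ts(value):
    """Parse the UTC timestamps used in the sample log lines."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lane_matches(lane, profile):
    """Match a full lane id or its short name after the provider prefix."""
    return lane is not None and (lane == profile or lane.endswith(":" + profile))


def query_events(lines, profile=None, failures_only=False, since=None, now=None):
    """Filter JSONL auth events by lane, failure status, and age."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - since if since is not None else None
    matched = []
    for raw in lines:
        if not raw.strip():
            continue
        event = json.loads(raw)
        lanes = [event.get("profile"), event.get("from"), event.get("to")]
        if profile is not None and not any(lane_matches(l, profile) for l in lanes):
            continue
        if failures_only and event["event"] not in FAILURE_EVENTS:
            continue
        if cutoff is not None and parse_ts(event["ts"]) < cutoff:
            continue
        matched.append(event)
    return matched


# Usage against the four sample events from the proposal.
SAMPLE = [
    '{"ts":"2026-04-01T15:31:00Z","event":"auth_failure","profile":"anthropic:oauth-mc","reason":"auth_permanent"}',
    '{"ts":"2026-04-01T15:31:00Z","event":"lane_fallback","from":"anthropic:oauth-mc","to":"anthropic:oauth-bc","reason":"auth_permanent"}',
    '{"ts":"2026-04-01T15:31:00Z","event":"lane_disabled","profile":"anthropic:oauth-mc","reason":"auth_permanent","until":"2026-04-01T20:31:00Z"}',
    '{"ts":"2026-04-01T17:36:00Z","event":"lane_restored","profile":"anthropic:oauth-mc","method":"manual_token_update"}',
]
NOW = datetime(2026, 4, 2, 16, 0, tzinfo=timezone.utc)
by_lane = query_events(SAMPLE, profile="oauth-mc")
failures = query_events(SAMPLE, failures_only=True)
recent = query_events(SAMPLE, since=timedelta(hours=24), now=NOW)
```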

Context

  • Current auth state storage: ~/.openclaw/agents/{agent}/agent/auth-profiles.json (usageStats)
  • Only stores: lastUsed, errorCount, lastFailureAt, failureCounts, cooldownUntil, disabledUntil, disabledReason
  • No historical data - each field is overwritten, not appended
  • Auth fallback decisions happen in the gateway request lifecycle

extent analysis

TL;DR

Implement an append-only auth event log to capture every auth-relevant event, including auth failures, lane fallbacks, and lane restorations.

Guidance

  • Introduce a new log file, such as auth-events.jsonl, to store timestamped auth events, allowing for historical tracking and auditing of auth health.
  • Modify the auth system to log discrete events when the system falls back from one auth lane to another, including the reason for the fallback.
  • Enhance error messages for LLM requests to include the attempted auth lane and failure reason, enabling operators to correlate specialist failures with auth issues.
  • Consider adding CLI query support for easy access to auth event logs, such as filtering by lane, failures, or time range.

Example

{"ts":"2026-04-01T15:31:00Z","event":"auth_failure","profile":"anthropic:oauth-mc","reason":"auth_permanent","model":"claude-opus-4-6","session":"agent:main:main"}

Notes

The proposed solution focuses on introducing an append-only log to capture historical auth data, which will help diagnose and audit auth issues. However, the implementation details, such as log rotation and storage, are not specified.

Recommendation

Apply the proposed solution to implement an append-only auth event log, as it will provide the necessary visibility into auth-related events and enable operators to diagnose and resolve issues more effectively.
