openclaw - 💡(How to fix) Fix [Bug]: Subagent spawn fails with "No API key found for bedrock" when using IAM Roles Anywhere (AWS_PROFILE + credential_process) [1 comments, 2 participants]

openclaw · 2026-04-26 20:06:32


GitHub stats

openclaw/openclaw#72349 · Fetched 2026-04-27 05:31:10

Author: sudheermanubolu

Participants: clawsweeper[bot], sudheermanubolu

Timeline (top): labeled ×2, closed ×1, commented ×1

sessions_spawn subagent runs to Bedrock-backed agents fail with FailoverError: No API key found for bedrock even though the same agent, same model, and same AWS credentials work perfectly from the interactive (main session) path. This is a close sibling to #53928 but manifests on a different surface (subagent dispatch) and a different auth mechanism (IAM Roles Anywhere instead of EC2 instance role).

The AWS SDK credential chain resolves IAM Roles Anywhere credentials correctly at the host level (verified: the configured credential_process returns valid, short-lived credentials on demand), and the main agent session successfully calls Bedrock using those credentials. Only subagent dispatches fail the pre-flight auth check.

Error Message

lane=subagent durationMs=8712 error=FailoverError: No API key found for bedrock...
lane=session:agent:main:subagent:f1002784-fd59-4b7e-bb1d-09264628e903 durationMs=8715 ...

OpenClaw version

2026.4.21 (current latest, released 2026-04-21). Also reproduces on 2026.4.14 (previous version).

Operating system

Rocky Linux 9.7 host, running the OpenClaw Docker container (Debian 12 / Node 24.14.0 inside).

Install method

docker compose

Model

- Primary: bedrock/global.anthropic.claude-opus-4-7
- Fallback 1: bedrock/global.anthropic.claude-opus-4-6-v1
- Fallback 2: bedrock/global.anthropic.claude-sonnet-4-6

All three fail with the identical error on subagent dispatch. All three work fine on the main session.

Provider / routing chain

Amazon Bedrock

Additional provider/model setup details

No response

Fix / Workaround

Subagent-dispatched Bedrock calls should use the same AWS credential chain as the main interactive session: if the main session can authenticate, the subagent's fresh pi-coding-agent instance should be able to as well.


Bug type

Crash (process/app exits or hangs)

Beta release blocker

Summary

Steps to reproduce

1. Configure OpenClaw with the amazon-bedrock provider using IAM Roles Anywhere:
   - AWS_CONFIG_FILE=/home/node/auth/config
   - AWS_PROFILE=openclaw-bot
   - Profile configured with credential_process = /path/to/aws_signing_helper credential-process --certificate ... --private-key ... --trust-anchor-arn ... --profile-arn ... --role-arn ...
2. Send a Telegram message to the main agent → Bedrock responds successfully (confirms the credential chain works).
3. From the main session, trigger any work that causes sessions_spawn with runtime: "subagent" (e.g. ask the agent to do research, or any task the agent decides to delegate).
4. The subagent run fails immediately with FailoverError: No API key found for bedrock on every configured fallback model in sequence.
5. After ~3 model-fallback attempts, the subagent gives up with "Subagent announce give up (retry-limit)".

Expected behavior

Actual behavior

Main session works. Subagent fails every fallback in the chain with status 401 and the error "No API key found for bedrock". Full failover decision event from the gateway log:

{
  "event": "embedded_run_failover_decision",
  "tags": ["error_handling", "failover", "prompt", "fallback_model"],
  "runId": "47b07189-531d-4ada-8c8e-96ddc37913c7",
  "stage": "prompt",
  "decision": "fallback_model",
  "failoverReason": "auth",
  "profileFailureReason": "auth",
  "provider": "amazon-bedrock",
  "model": "global.anthropic.claude-opus-4-7",
  "sourceProvider": "amazon-bedrock",
  "sourceModel": "global.anthropic.claude-opus-4-7",
  "fallbackConfigured": true,
  "aborted": false,
  "status": 401,
  "rawErrorPreview": "No API key found for bedrock.\n\nUse /login or set an API key environment variable. See /app/node_modules/@mariozechner/pi-coding-agent/docs/providers.md",
  "rawErrorHash": "sha256:220bb17d0061",
  "providerRuntimeFailureKind": "unknown"
}

The sequence repeats for each model in the chain (opus-4-7 → opus-4-6-v1 → sonnet-4-6), all with the same error and the same hash, sha256:220bb17d0061.
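To check the claim that every attempt in the chain failed identically, a small helper can summarize parsed failover events from the gateway log. This is a hypothetical sketch: the FailoverDecision type models only a few fields of the event shown above, the input is illustrative rather than captured log data (fallback model ids are assumed from the config), and it assumes rawErrorHash is derived from the raw error text.

```typescript
// Hypothetical helper for reading gateway logs: given parsed embedded_run_failover_decision
// events, list the models that failed over for auth reasons and the distinct error hashes.
// Only the fields used here are modeled; real events carry more.
interface FailoverDecision {
  event: string;
  failoverReason: string;
  model: string;
  status: number;
  rawErrorHash: string;
}

interface AuthFailoverSummary {
  models: string[];
  distinctHashes: string[];
}

function summarizeAuthFailovers(events: FailoverDecision[]): AuthFailoverSummary {
  const auth = events.filter(
    (e) => e.event === "embedded_run_failover_decision" && e.failoverReason === "auth",
  );
  return {
    models: auth.map((e) => e.model),
    // Assuming rawErrorHash hashes the raw error text, one distinct value means
    // every attempt produced the same error text.
    distinctHashes: Array.from(new Set(auth.map((e) => e.rawErrorHash))),
  };
}

// Illustrative input shaped like the report's description: one auth failure per model,
// all with the same hash. Fallback model ids are assumed from the config, not from logs.
const hash = "sha256:220bb17d0061";
const events: FailoverDecision[] = [
  "global.anthropic.claude-opus-4-7",
  "global.anthropic.claude-opus-4-6-v1",
  "global.anthropic.claude-sonnet-4-6",
].map((model) => ({
  event: "embedded_run_failover_decision",
  failoverReason: "auth",
  model,
  status: 401,
  rawErrorHash: hash,
}));

const summary = summarizeAuthFailovers(events);
console.log(summary.models.length, summary.distinctHashes.length); // 3 1
```

One distinct hash across all models means the same error text each time, which fits a check that fails before any model-specific request is made; on its own it does not distinguish between possible causes.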

The lane information is consistent with a subagent dispatch:

lane=subagent durationMs=8712 error=FailoverError: No API key found for bedrock...
lane=session:agent:main:subagent:f1002784-fd59-4b7e-bb1d-09264628e903 durationMs=8715 ...

### Logs, screenshots, and evidence

Impact and severity

No response

Additional information

Auth setup

IAM Roles Anywhere (not to be confused with EC2 instance role).

- Auth mechanism: credential_process in the AWS config file invokes the aws_signing_helper binary
- aws_signing_helper uses an X.509 certificate + private key to obtain temporary AWS credentials from IAM Roles Anywhere
- Host-level: credentials are always short-lived (~1 hour TTL) and refreshed on demand
- Container env: AWS_CONFIG_FILE, AWS_SDK_LOAD_CONFIG=1, AWS_PROFILE=openclaw-bot
- No static AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY anywhere
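A minimal sketch of the environment-marker check the report attributes to OpenClaw's resolveEnvApiKey: the presence of an AWS SDK variable such as AWS_PROFILE is taken as a sign that the SDK credential chain will supply credentials. The marker list and the detectAwsSdkMarkers name are assumptions for illustration; the actual AWS_SDK_ENV_MARKERS contents were not quoted in the report.

```typescript
// Hypothetical marker list. OpenClaw's real AWS_SDK_ENV_MARKERS may differ; these are
// standard AWS SDK environment variables that can select or supply credentials.
const AWS_SDK_ENV_MARKERS: readonly string[] = [
  "AWS_PROFILE",
  "AWS_ACCESS_KEY_ID",
  "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
  "AWS_CONTAINER_CREDENTIALS_FULL_URI",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
];

type Env = Record<string, string | undefined>;

// Returns the markers that are present and non-blank in the given environment.
function detectAwsSdkMarkers(env: Env): string[] {
  return AWS_SDK_ENV_MARKERS.filter((name) => {
    const value = env[name];
    return value !== undefined && value.trim() !== "";
  });
}

// The container environment described above: no static keys, only a profile.
const containerEnv: Env = {
  AWS_CONFIG_FILE: "/home/node/auth/config",
  AWS_SDK_LOAD_CONFIG: "1",
  AWS_PROFILE: "openclaw-bot",
};

console.log(detectAwsSdkMarkers(containerEnv).join(", ")); // AWS_PROFILE
```

The check only inspects environment variables. It cannot tell whether credential_process will actually succeed, which is why the manual run of the signing helper is still useful evidence.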

Evidence the underlying auth works

Running the credential_process manually from inside the container (same user, same environment as OpenClaw):

$ /home/node/auth/aws_signing_helper credential-process \
    --certificate /home/node/auth/bot.pem \
    --private-key /home/node/auth/bot.key \
    --trust-anchor-arn arn:aws:rolesanywhere:us-west-2:...:trust-anchor/... \
    --profile-arn arn:aws:rolesanywhere:us-west-2:...:profile/... \
    --role-arn arn:aws:iam::...:role/chatbot-anywhere-role
{
  "Version": 1,
  "AccessKeyId": "ASIA4E...",
  "SecretAccessKey": "...",
  "SessionToken": "...",
  "Expiration": "2026-04-25T17:40:17Z"
}

Returns valid, short-lived credentials every time.

Main session at the same moment successfully calls Bedrock using these credentials and returns expected responses. The subagent path fails with identical configuration.
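For anyone repeating this host-level check in code, a small sketch that validates credential_process output against the documented AWS format: Version must be 1, AccessKeyId and SecretAccessKey are required, SessionToken and Expiration are optional. parseProcessCredentials is a hypothetical helper, not part of OpenClaw or the AWS SDK, and the placeholder key values stand in for the redacted ones in the output above.

```typescript
// Shape of the JSON a credential_process must print to stdout (AWS external-process format).
interface ProcessCredentials {
  Version: 1;
  AccessKeyId: string;
  SecretAccessKey: string;
  SessionToken?: string;
  Expiration?: string; // ISO 8601 timestamp
}

// Hypothetical helper: parse credential_process stdout and reject malformed or expired output.
// `now` is a parameter so the expiry check is deterministic in tests.
function parseProcessCredentials(stdout: string, now: Date = new Date()): ProcessCredentials {
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== "object" || parsed === null) throw new Error("output is not a JSON object");
  const d = parsed as Record<string, unknown>;

  if (d["Version"] !== 1) throw new Error(`unsupported Version: ${String(d["Version"])}`);

  const accessKeyId = d["AccessKeyId"];
  const secretAccessKey = d["SecretAccessKey"];
  if (typeof accessKeyId !== "string" || accessKeyId === "") throw new Error("missing AccessKeyId");
  if (typeof secretAccessKey !== "string" || secretAccessKey === "") throw new Error("missing SecretAccessKey");

  const sessionTokenRaw = d["SessionToken"];
  let sessionToken: string | undefined;
  if (typeof sessionTokenRaw === "string") sessionToken = sessionTokenRaw;
  else if (sessionTokenRaw !== undefined) throw new Error("SessionToken must be a string");

  const expirationRaw = d["Expiration"];
  let expiration: string | undefined;
  if (typeof expirationRaw === "string") {
    const expiresAt = Date.parse(expirationRaw);
    if (Number.isNaN(expiresAt)) throw new Error(`unparseable Expiration: ${expirationRaw}`);
    if (expiresAt <= now.getTime()) throw new Error(`credentials expired at ${expirationRaw}`);
    expiration = expirationRaw;
  } else if (expirationRaw !== undefined) {
    throw new Error("Expiration must be a string");
  }

  return { Version: 1, AccessKeyId: accessKeyId, SecretAccessKey: secretAccessKey, SessionToken: sessionToken, Expiration: expiration };
}

// Placeholder values only; not real credentials.
const sample = JSON.stringify({
  Version: 1,
  AccessKeyId: "ASIAEXAMPLE",
  SecretAccessKey: "example-secret",
  SessionToken: "example-token",
  Expiration: "2026-04-25T17:40:17Z",
});
const creds = parseProcessCredentials(sample, new Date("2026-04-25T17:00:00Z"));
console.log(creds.Expiration); // 2026-04-25T17:40:17Z
```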

Config (relevant section)

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "bedrock/global.anthropic.claude-opus-4-7",
        "fallbacks": [
          "bedrock/global.anthropic.claude-opus-4-6-v1",
          "bedrock/global.anthropic.claude-sonnet-4-6"
        ]
      },
      "models": {
        "bedrock/global.anthropic.claude-opus-4-7": {
          "alias": "claude-opus",
          "params": { "cacheRetention": "long", "maxTokens": 32000 }
        },
        "bedrock/global.anthropic.claude-opus-4-6-v1": {
          "alias": "claude-opus",
          "params": { "cacheRetention": "long", "maxTokens": 32000 }
        },
        "bedrock/global.anthropic.claude-sonnet-4-6": {
          "alias": "claude-sonnet",
          "params": { "cacheRetention": "long", "maxTokens": 32000 }
        }
      }
    }
  },
  "plugins": {
    "entries": {
      "amazon-bedrock": { "enabled": true }
    }
  }
}

Likely root cause

Based on reading through @mariozechner/pi-coding-agent and @mariozechner/pi-ai source in OpenClaw's node_modules:

1. The error message originates from pi-coding-agent's hasConfiguredAuth(model) pre-flight check in core/model-registry.js and core/agent-session.js.
2. hasConfiguredAuth() returns true only if authStorage.hasAuth(provider) is true OR providerRequestConfigs.get(provider)?.apiKey is set.
3. Neither condition is true for amazon-bedrock when using the AWS SDK credential chain. OpenClaw's own resolveEnvApiKey DOES recognize AWS_PROFILE as a valid auth marker (AWS_SDK_ENV_MARKERS in model-auth-markers-*.js), but this recognition doesn't appear to propagate into pi-coding-agent's internal providerRequestConfigs for subagent spawns.
4. Why the main session works: something in the main agent bootstrap path pre-populates or bypasses the hasConfiguredAuth() check. This pre-configuration does not survive into the subagent's fresh pi-coding-agent instance.
5. Why the actual Bedrock call would succeed if the pre-flight were passed: pi-ai's amazon-bedrock.js correctly reads AWS_PROFILE and uses the AWS SDK credential chain (including credential_process). The bug is purely in the upstream pre-flight check, not in the Bedrock call itself.
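The pre-flight logic described above can be modeled in a few lines. This is a simplified, hypothetical reconstruction that reuses the report's names (hasConfiguredAuth, authStorage.hasAuth, providerRequestConfigs); InMemoryAuthStorage is an invented stand-in, and the real pi-coding-agent code was not reproduced here and may differ.

```typescript
// Hypothetical, simplified model of the pre-flight check; not the real pi-coding-agent code.
interface ProviderRequestConfig {
  apiKey?: string;
}

interface AuthStorage {
  hasAuth(provider: string): boolean;
}

// Invented stand-in for stored credentials (e.g. from /login).
class InMemoryAuthStorage implements AuthStorage {
  private readonly providers = new Set<string>();
  add(provider: string): void { this.providers.add(provider); }
  hasAuth(provider: string): boolean { return this.providers.has(provider); }
}

// True only if stored auth exists or a request config carries an apiKey.
function hasConfiguredAuth(
  provider: string,
  authStorage: AuthStorage,
  providerRequestConfigs: Map<string, ProviderRequestConfig>,
): boolean {
  return authStorage.hasAuth(provider) || Boolean(providerRequestConfigs.get(provider)?.apiKey);
}

// A fresh subagent instance: nothing stored and no request config for amazon-bedrock.
// AWS_PROFILE may be set in the environment, but this predicate never looks at it.
const freshStorage = new InMemoryAuthStorage();
const freshConfigs = new Map<string, ProviderRequestConfig>();

console.log(hasConfiguredAuth("amazon-bedrock", freshStorage, freshConfigs)); // false
```

Nothing in this predicate consults the environment, so AWS_PROFILE and credential_process are invisible to it and the Bedrock call is never reached.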

Impact and severity

- Affected: all OpenClaw users on Bedrock using IAM Roles Anywhere (or any other AWS SDK credential provider that isn't a static access key) when subagents are spawned
- Severity: high for users who rely on subagent-based research, parallelism, or delegation patterns
- Workaround: restart the OpenClaw container. This clears the broken state for a period of time, but the bug returns when subagents are next spawned (timing varies: sometimes immediately, sometimes after hours of uptime)

Note: This makes subagent use essentially unreliable for IAM Roles Anywhere + Bedrock users. Every subagent spawn is a coin flip until the next restart.

Workaround details

docker compose restart

~15 seconds of downtime. Fresh process state resets the auth pre-flight cache. Main session continues to work until next subagent spawn; failure eventually recurs.

Related issues

- #53928: same root cause (embedded agent runner doesn't inherit the AWS SDK credential chain), different surface (POST /hooks/agent vs subagent dispatch), different auth method (EC2 instance role vs IAM Roles Anywhere). The fix for one will likely fix the other.
- #30215: feature request to support Bedrock Bearer Token auth explicitly. Relevant because users have proposed it as a workaround for exactly the kind of AWS SDK chain resolution issue this bug exhibits.
- aws-samples/sample-OpenClaw-on-AWS-with-Bedrock#64: user-reported instance of the same error message on the main-session path after a version upgrade. Suggests the main-session code path also has a related regression on some versions, though distinct from the subagent-specific surface in this report.

Additional context

Log evidence suggests the failure occurs specifically when spawning a fresh pi-coding-agent instance for a new subagent session. The first subagent dispatch after gateway startup may succeed; subsequent dispatches fail. Docker restart resets the failure state consistently.

The error hash (sha256:220bb17d0061) is identical across all three fallback models and across all runs, which strongly suggests a deterministic path mismatch rather than a transient AWS SDK credential resolution issue.

I would be willing to contribute a fix if the maintainers can confirm the approach — my reading of the code suggests the fix is either:

1. Have pi-coding-agent's hasConfiguredAuth() check for AWS_SDK_ENV_MARKERS (including AWS_PROFILE) before declaring "no API key", OR
2. Have OpenClaw pre-register a synthetic provider request config entry for amazon-bedrock at subagent spawn time when AWS SDK markers are present, so the pre-flight check finds something.
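A sketch of the first proposed fix, assuming the predicate has the shape described in the root-cause section: keep the existing checks and add a branch that accepts AWS SDK markers for providers that sign requests with AWS credentials. hasConfiguredAuthWithAwsSdk, AWS_SDK_AUTH_PROVIDERS, and the marker subset are hypothetical names, not pi-coding-agent APIs.

```typescript
// Sketch of fix option 1 (hypothetical names throughout).
type Env = Record<string, string | undefined>;

interface ProviderRequestConfig {
  apiKey?: string;
}

// Assumed subset of markers; OpenClaw's AWS_SDK_ENV_MARKERS may contain different entries.
const AWS_SDK_ENV_MARKERS: readonly string[] = ["AWS_PROFILE", "AWS_ACCESS_KEY_ID", "AWS_WEB_IDENTITY_TOKEN_FILE"];

// Providers whose requests are signed via the AWS SDK credential chain, not an API key.
const AWS_SDK_AUTH_PROVIDERS: ReadonlySet<string> = new Set(["amazon-bedrock"]);

function hasAwsSdkMarker(env: Env): boolean {
  return AWS_SDK_ENV_MARKERS.some((name) => (env[name] ?? "").trim() !== "");
}

function hasConfiguredAuthWithAwsSdk(
  provider: string,
  hasStoredAuth: (provider: string) => boolean,
  providerRequestConfigs: Map<string, ProviderRequestConfig>,
  env: Env,
): boolean {
  // Existing checks, unchanged.
  if (hasStoredAuth(provider)) return true;
  if (providerRequestConfigs.get(provider)?.apiKey) return true;
  // New branch: defer to the AWS SDK chain, which resolves credential_process at call time.
  return AWS_SDK_AUTH_PROVIDERS.has(provider) && hasAwsSdkMarker(env);
}

const noStoredAuth = (_provider: string): boolean => false;
const env: Env = { AWS_PROFILE: "openclaw-bot" };

console.log(hasConfiguredAuthWithAwsSdk("amazon-bedrock", noStoredAuth, new Map(), env)); // true
console.log(hasConfiguredAuthWithAwsSdk("openai", noStoredAuth, new Map(), env)); // false
```

Scoping the branch to amazon-bedrock means an AWS_PROFILE in the environment does not mask a genuinely missing key for another provider. The trade-off is that the pre-flight becomes optimistic: if credential_process fails, the error surfaces at the Bedrock call rather than in the pre-flight.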

Happy to draft a PR if a maintainer can confirm which approach aligns with project direction.

extent analysis

TL;DR

The most likely fix involves modifying the hasConfiguredAuth() function in pi-coding-agent to recognize AWS SDK environment markers, such as AWS_PROFILE, for the amazon-bedrock provider.

Guidance

- Investigate the hasConfiguredAuth() function in pi-coding-agent to understand why it doesn't recognize the AWS SDK credential chain for subagent spawns.
- Consider adding a check for AWS_SDK_ENV_MARKERS (including AWS_PROFILE) in hasConfiguredAuth().
- Alternatively, explore pre-registering a synthetic provider request config entry for amazon-bedrock at subagent spawn time when AWS SDK markers are present.
- Verify the fix by testing subagent dispatches with the modified code and confirming that FailoverError: No API key found for bedrock no longer appears.

Example

No code example is provided due to the complexity of the issue and the need for a thorough understanding of the pi-coding-agent and pi-ai codebases.

Notes

The fix may require changes to the pi-coding-agent codebase, and it's essential to ensure that the solution aligns with the project's direction and doesn't introduce any regressions.
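If changing pi-coding-agent is undesirable, the report's second approach (pre-registering a synthetic request config for amazon-bedrock at spawn time) can be sketched on the OpenClaw side. Everything here is hypothetical: seedAwsSdkConfig, spawnSubagentConfigs, and the sentinel value are invented, and the predicate is reduced to its apiKey branch. The sketch also covers verification by checking that repeated fresh spawns all pass and that a real key is never overwritten.

```typescript
// Sketch of fix option 2 (hypothetical names throughout).
type Env = Record<string, string | undefined>;

interface ProviderRequestConfig {
  apiKey?: string;
}

// Sentinel, not a credential: the Bedrock client would have to ignore it and use the AWS SDK chain.
const AWS_SDK_AUTH_SENTINEL = "<aws-sdk-credential-chain>";

// Seeds a placeholder entry when AWS SDK markers are present and no real key is configured.
function seedAwsSdkConfig(configs: Map<string, ProviderRequestConfig>, env: Env): void {
  const hasMarker = ["AWS_PROFILE", "AWS_ACCESS_KEY_ID"].some((k) => (env[k] ?? "") !== "");
  const existing = configs.get("amazon-bedrock");
  if (hasMarker && !existing?.apiKey) {
    configs.set("amazon-bedrock", { ...existing, apiKey: AWS_SDK_AUTH_SENTINEL });
  }
}

// Simplified pre-flight: only the apiKey branch described in the report.
function hasConfiguredAuth(provider: string, configs: Map<string, ProviderRequestConfig>): boolean {
  return Boolean(configs.get(provider)?.apiKey);
}

// Simulated fresh subagent spawn: configs start empty, as in the failing case.
function spawnSubagentConfigs(env: Env): Map<string, ProviderRequestConfig> {
  const configs = new Map<string, ProviderRequestConfig>();
  seedAwsSdkConfig(configs, env);
  return configs;
}

// Verification: every fresh spawn should pass the pre-flight, not just the first.
const env: Env = { AWS_PROFILE: "openclaw-bot" };
for (let i = 0; i < 3; i++) {
  console.log(hasConfiguredAuth("amazon-bedrock", spawnSubagentConfigs(env))); // true
}
```

This only helps if the Bedrock client ignores the sentinel and still uses the AWS SDK chain. The report says pi-ai's amazon-bedrock.js reads AWS_PROFILE, but whether it ignores a configured apiKey was not established.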

Recommendation

Apply a workaround by restarting the OpenClaw container using docker compose restart until a permanent fix is implemented. This will reset the broken state and allow subagent spawns to work temporarily.
