openclaw - ✅ (Solved) Fix [Bug]: Hundreds of repeated HTTP requests to 169.254.169.254:80 for the first 10 minutes after the startup [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#64891, fetched 2026-04-12 13:26:21
Comments: 1
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×2, commented ×1, referenced ×1

After upgrading OpenClaw in my NemoClaw environment from 2026.3.11 to 2026.4.9, NVIDIA's openshell sandbox started reporting GET and PUT requests to http://169.254.169.254/latest/api/token and http://169.254.169.254/latest/meta-data/iam/security-credentials/, which repeat every 5-6 seconds for exactly 10 minutes after startup, at which point they stop.

Error Message

  • By default, the sandbox rejects HTTP requests to 169.254.169.254:80 with a 403 error.
  • The code making these repeated requests appears to stop when it encounters a timeout error, but keeps retrying for up to 10 minutes when it receives a 403 error.

Root Cause

The repeated attempts to access an unauthorized IP and port are way too frequent and fill the sandbox logs with garbage entries instead of stopping after the first few attempts or at least doing an exponential backoff. There might be other side-effects too. It's not immediately clear why OpenClaw does this anyway and if there are other more serious issues that may be caused by this.

Fix Action

Fixed

PR fix notes

PR #64944: fix(memory): disable IMDS probing during AWS credential auto-detection [AI-assisted]

Description (problem / solution / changelog)

Summary

  • Problem: After upgrading from 2026.3.11 to 2026.4.9, hundreds of HTTP requests to 169.254.169.254 (AWS IMDS) are made every 5–6 seconds for ~10 minutes after startup in sandboxed environments (e.g. NVIDIA NemoClaw).
  • Why it matters: The requests spam sandbox logs, may trigger security alerts, and waste network resources in non-AWS environments.
  • What changed: hasAwsCredentials() now (1) respects AWS_EC2_METADATA_DISABLED as an early exit, and (2) temporarily disables IMDS probing during the defaultProvider call since EC2/ECS environments are already covered by explicit env var checks in CREDENTIAL_ENV_VARS.
  • What did NOT change (scope boundary): The credential detection logic for users with actual AWS credentials (env vars, ~/.aws/credentials, SSO, etc.) is unchanged. Only the IMDS network probe is suppressed.
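
The two guards described in the summary can be sketched roughly as follows. This is a minimal illustration and not the code from PR #64944: the name `hasAwsCredentialsSketch`, the injected `resolveDefault` callback (standing in for the SDK's `defaultProvider()` call), the shortened `CREDENTIAL_ENV_VARS` list, and the exact-string comparison against `"true"` are all assumptions made for the example.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical, shortened list; the full CREDENTIAL_ENV_VARS array is not
// reproduced in the PR notes.
const CREDENTIAL_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_PROFILE",
  "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
  "AWS_EC2_METADATA_SERVICE_ENDPOINT",
];

// resolveDefault is an assumed stand-in for the SDK's default credential
// chain: it resolves when credentials are found and rejects otherwise.
async function hasAwsCredentialsSketch(
  env: Env,
  resolveDefault: () => Promise<unknown>,
): Promise<boolean> {
  // 1. Explicit env vars answer the question without any lookup.
  if (CREDENTIAL_ENV_VARS.some((name) => Boolean(env[name]))) return true;

  // 2. Early exit: IMDS is already disabled by the user, so skip the chain.
  if (env.AWS_EC2_METADATA_DISABLED === "true") return false;

  // 3. Disable IMDS only for the duration of the chain lookup.
  const previous = env.AWS_EC2_METADATA_DISABLED;
  env.AWS_EC2_METADATA_DISABLED = "true";
  try {
    await resolveDefault();
    return true;
  } catch {
    return false;
  } finally {
    // Put the variable back exactly as it was, including "unset".
    if (previous === undefined) delete env.AWS_EC2_METADATA_DISABLED;
    else env.AWS_EC2_METADATA_DISABLED = previous;
  }
}
```

Passing the environment and the provider in as parameters keeps the sketch testable without real AWS endpoints, which is the same reason the PR's unit tests mock the SDK.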

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #64891
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Commit 699b2320a8 (feat(memory): add Bedrock embedding provider) added @aws-sdk/credential-provider-node as a new dependency. Before this change, loadCredentialProviderSdk() failed to import the SDK and returned null, so hasAwsCredentials() returned false immediately with zero network requests. After the change, the SDK loads successfully and defaultProvider() walks the full credential chain, eventually reaching fromInstanceMetadata() which probes IMDS at 169.254.169.254.
  • Missing detection / guardrail: No guard existed to skip IMDS probing when all explicit env var checks had already failed. The CREDENTIAL_ENV_VARS array covers EC2/ECS indicators (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, AWS_EC2_METADATA_SERVICE_ENDPOINT, etc.), so the defaultProvider fallback only needed file-based providers — but IMDS was still probed.
  • Contributing context (if known): In the reporter's NVIDIA NemoClaw environment, the sandbox intercepts IMDS requests and returns HTTP 403 (rather than a connection timeout). The AWS SDK treats 403 on the IMDSv2 token endpoint as "fall back to IMDSv1", generating 2 requests per process (PUT token + GET credentials). Since OpenClaw spawns multiple Node.js processes during startup, each independently calling hasAwsCredentials(), this produced hundreds of request pairs over ~10 minutes.
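
As a rough plausibility check on the reported volume, the sketch below estimates how many probes fit into the 10-minute window at one PUT + GET pair every 5–6 seconds. The window, interval and pair size come from the report; the 5.5-second midpoint and the function name are assumptions used only for illustration.

```typescript
// Rough estimate only. The 5.5 s interval is an assumed midpoint of the
// "every 5-6 seconds" figure in the report.
function estimateImdsRequests(
  windowSeconds: number,
  intervalSeconds: number,
  requestsPerProbe: number,
): { probes: number; requests: number } {
  const probes = Math.floor(windowSeconds / intervalSeconds);
  return { probes, requests: probes * requestsPerProbe };
}

const estimate = estimateImdsRequests(10 * 60, 5.5, 2);
console.log(estimate); // { probes: 109, requests: 218 }
```

Under these assumptions the window holds on the order of a hundred request pairs, or a couple of hundred individual requests, which is in line with the "hundreds of repeated HTTP requests" in the issue title.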

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/memory-host-sdk/host/embeddings-bedrock.test.ts
  • Scenario the test should lock in: (1) hasAwsCredentials() returns false immediately when AWS_EC2_METADATA_DISABLED=true without invoking the SDK; (2) During the defaultProvider call, AWS_EC2_METADATA_DISABLED is set to "true" in the env object, and restored after the call completes.
  • Why this is the smallest reliable guardrail: The unit tests mock @aws-sdk/credential-provider-node and verify the env flag behavior without requiring actual IMDS endpoints or AWS credentials.
  • Existing test that already covers this (if any): Existing tests covered env var detection and SDK fallback, but did not test the IMDS disable behavior.
  • If no new test is added, why not: Two new tests were added.

User-visible / Behavior Changes

  • Users who set AWS_EC2_METADATA_DISABLED=true will now see hasAwsCredentials() return false immediately (previously, this env var was not checked at this level).
  • In non-AWS environments without any AWS env vars or config files, the startup IMDS probe is eliminated. This removes hundreds of failed HTTP requests in the first 10 minutes.
  • Users with legitimate AWS credentials via env vars, profiles, SSO, or container credentials are not affected.

Diagram (if applicable)

Before:
hasAwsCredentials() -> env var check (miss) -> defaultProvider() -> ... -> fromInstanceMetadata() -> PUT 169.254.169.254 -> 403 -> GET 169.254.169.254 -> 403

After:
hasAwsCredentials() -> env var check (miss) -> AWS_EC2_METADATA_DISABLED check -> set IMDS=disabled -> defaultProvider() -> ... -> remoteProvider() -> "IMDS disabled" -> skip -> fail -> return false

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? Yes — IMDS network calls are now suppressed during auto-detection
    • Risk: None. This removes unwanted network calls rather than adding new ones.
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Ubuntu 24.04.4 LTS (reporter's environment)
  • Runtime/container: NVIDIA NemoClaw v0.0.13 + openshell sandbox
  • Model/provider: nvidia/Nemotron-Mini-4B-Instruct via local vLLM
  • Relevant config (redacted): Default OpenClaw config with no AWS-specific settings

Steps

  1. Install OpenClaw 2026.4.9 in a sandboxed environment that blocks IMDS with 403
  2. Start OpenClaw with default configuration (no AWS env vars set)
  3. Observe sandbox logs for HTTP requests to 169.254.169.254

Expected

  • No HTTP requests to 169.254.169.254

Actual

  • Before fix: Hundreds of PUT/GET request pairs every 5–6 seconds for ~10 minutes
  • After fix: Zero IMDS requests (verified via unit tests that confirm IMDS is disabled during auto-detection)

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

All 29 tests pass, including 2 new tests:

  • returns false when AWS_EC2_METADATA_DISABLED is set — verifies early exit without SDK invocation
  • disables IMDS during defaultProvider call and restores env — verifies IMDS is disabled during the call and env is restored after

Human Verification (required)

  • Verified scenarios: Ran vitest run src/memory-host-sdk/host/embeddings-bedrock.test.ts — 29/29 tests pass. Verified that existing credential detection (access keys, profiles, ECS task role, EKS IRSA, SDK default chain) all still work correctly.
  • Edge cases checked: (1) AWS_EC2_METADATA_DISABLED already set to "false" — not treated as disabled. (2) AWS_EC2_METADATA_DISABLED already set to "true" by user — early exit, no SDK call. (3) env cleanup after defaultProvider throws — finally block restores env.
  • What you did not verify: Full E2E reproduction in NVIDIA NemoClaw sandbox environment (requires NVIDIA GPU + Docker + specific NemoClaw version).
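
The edge cases listed above (a pre-existing "false" value, and restoration after a throw) can be exercised with a small stand-alone helper. This is a hedged sketch rather than the PR's test code: `withEnvOverride`, `isImdsDisabled` and `demo` are hypothetical names, and treating only the exact string "true" as disabled is an assumption.

```typescript
type EnvMap = Record<string, string | undefined>;

// Hypothetical helper: run fn with env[key] temporarily set to value, then
// restore the previous state whether fn resolves or rejects.
async function withEnvOverride<T>(
  env: EnvMap,
  key: string,
  value: string,
  fn: () => Promise<T>,
): Promise<T> {
  const hadKey = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  env[key] = value;
  try {
    return await fn();
  } finally {
    // The finally block runs on success and on throw alike.
    if (hadKey) env[key] = previous;
    else delete env[key];
  }
}

// Assumption: only the exact string "true" counts as disabled.
const isImdsDisabled = (env: EnvMap): boolean =>
  env.AWS_EC2_METADATA_DISABLED === "true";

async function demo(): Promise<void> {
  const env: EnvMap = { AWS_EC2_METADATA_DISABLED: "false" };
  console.log(isImdsDisabled(env)); // false: "false" is not treated as disabled

  await withEnvOverride(env, "AWS_EC2_METADATA_DISABLED", "true", async () => {
    console.log(isImdsDisabled(env)); // true while the override is active
    throw new Error("simulated credential failure");
  }).catch(() => undefined);

  console.log(env.AWS_EC2_METADATA_DISABLED); // prints the original "false"
}

demo();
```

Recording whether the key existed before the override lets the helper distinguish "unset" from "set to an empty or false-like value", so the caller's environment ends up unchanged in both cases.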

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No (new env var AWS_EC2_METADATA_DISABLED is optional and follows AWS SDK standard)
  • Migration needed? No

Risks and Mitigations

  • Risk: Users relying on IMDS-only credentials (EC2 instance role without any AWS_* env vars) for Bedrock embeddings would no longer be auto-detected by hasAwsCredentials().
    • Mitigation: EC2/ECS environments typically have AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_EC2_METADATA_SERVICE_ENDPOINT set, which are checked in CREDENTIAL_ENV_VARS before reaching the defaultProvider call. Users can also set explicit AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or AWS_PROFILE.

Changed files

  • src/cli/gateway-cli/run.option-collisions.test.ts (modified, +1/-1)
  • src/cli/gateway-cli/run.ts (modified, +10/-3)
  • src/daemon/systemd-unit.test.ts (modified, +3/-0)
  • src/daemon/systemd-unit.ts (modified, +3/-0)
  • src/memory-host-sdk/host/embeddings-bedrock.test.ts (modified, +12/-0)
  • src/memory-host-sdk/host/embeddings-bedrock.ts (modified, +32/-0)
  • src/memory-host-sdk/host/embeddings.test.ts (modified, +2/-0)

Code Example

mkdir vllm-test ; cd vllm-test ; uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate
uv pip install vllm --torch-backend=auto

---

export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')
export CUDA_VERSION=130
export CPU_ARCH=$(uname -m)
uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux_2_35_${CPU_ARCH}.whl --extra-index-url https://download.pytorch.org/whl/cu${CUDA_VERSION}

---

vllm serve "nvidia/Nemotron-Mini-4B-Instruct"

---

curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.13 bash

---

cd ~/.nemoclaw/source
docker build -f Dockerfile.base -t ghcr.io/nvidia/nemoclaw/sandbox-base:latest . --no-cache

---

NEMOCLAW_EXPERIMENTAL="1" nemoclaw onboard

---

openshell term

---

[1775865662.312] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(240) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865662.317] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(240) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]
[1775865668.120] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(312) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865668.125] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(312) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]
[1775865673.950] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(350) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865673.955] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(350) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]

---

$ curl --verbose --max-time 60 http://169.254.169.254/latest/meta-data/iam/security-credentials/
*   Trying 169.254.169.254:80...
* Connection timed out after 60003 milliseconds
* Closing connection
curl: (28) Connection timed out after 60003 milliseconds

---

$ curl --verbose --max-time 60 http://169.254.169.254/latest/meta-data/iam/security-credentials/
* Uses proxy env variable no_proxy == 'localhost,127.0.0.1,::1,10.200.0.1'
* Uses proxy env variable http_proxy == 'http://10.200.0.1:3128'
*   Trying 10.200.0.1:3128...
* Connected to 10.200.0.1 (10.200.0.1) port 3128 (#0)
> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.88.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 403 Forbidden
* no chunk, no close, no size. Assume close to signal end
<
* Closing connection 0


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading OpenClaw in my NemoClaw environment from 2026.3.11 to 2026.4.9, NVIDIA's openshell sandbox started reporting GET and PUT requests to http://169.254.169.254/latest/api/token and http://169.254.169.254/latest/meta-data/iam/security-credentials/, which repeat every 5-6 seconds for exactly 10 minutes after startup, at which point they stop.

Steps to reproduce

  • Install vLLM and NVIDIA's NemoClaw v0.0.13 on a Linux system with an NVIDIA GPU from a user that can run Docker commands, using for example:
mkdir vllm-test ; cd vllm-test ; uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
  • If the NVIDIA hardware requires CUDA 13.0, continue with:
export VLLM_VERSION=$(curl -s https://api.github.com/repos/vllm-project/vllm/releases/latest | jq -r .tag_name | sed 's/^v//')
export CUDA_VERSION=130
export CPU_ARCH=$(uname -m)
uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu${CUDA_VERSION}-cp38-abi3-manylinux_2_35_${CPU_ARCH}.whl --extra-index-url https://download.pytorch.org/whl/cu${CUDA_VERSION}

  • Start vLLM using a command such as:
vllm serve "nvidia/Nemotron-Mini-4B-Instruct"
  • Wait for vllm to download the model and load it, then open another terminal and continue by installing NemoClaw v0.0.13, for example:
curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.13 bash
  • Reject the license by pressing [enter]
  • Edit ~/.nemoclaw/source/Dockerfile.base at line 151 changing the OpenClaw version from 2026.3.11 to 2026.4.9, save the file, then exit the editor.
  • Build the NemoClaw base image using a command such as:
cd ~/.nemoclaw/source
docker build -f Dockerfile.base -t ghcr.io/nvidia/nemoclaw/sandbox-base:latest . --no-cache
  • Close and re-open the terminal, then continue the NemoClaw installation by running a command such as:
NEMOCLAW_EXPERIMENTAL="1" nemoclaw onboard
  • Accept the license by typing yes [enter]
  • After the onboarding passes step 2, open another terminal and run:
openshell term
  • After NVIDIA's openshell term starts, press [enter] to dismiss the greeting dialog
  • Return to the NemoClaw terminal, and press 9 [Enter] to select Local vLLM
  • Press y [Enter] to enable Brave Web Search
  • Enter a valid Brave Search API key then press [Enter]
  • Press 3 to enable Slack messaging then press [Enter]
  • Enter a valid Slack Bot Token then press [Enter]
  • Press [Enter] to accept the default sandbox name
  • After the sandbox is created, switch to the openshell term terminal and press [tab] twice then [enter] to select the new sandbox.
  • Press [l] (lowercase L) to view the logs
  • At this point the repeated requests to 169.254.169.254:80 can be seen in the logs
  • You can switch back to the NemoClaw terminal and press [Enter] to continue the installation
  • The failed attempts to access 169.254.169.254:80 every 5-6 seconds will persist for up to 10 minutes, then they will stop
  • Whitelisting 169.254.169.254:80 in openshell term will stop the hundreds of repeated attempts earlier
  • The issue cannot be reproduced with OpenClaw 2026.3.11 (by skipping the step where Dockerfile.base is edited to change it).
  • The sandbox rejects by default HTTP requests to 169.254.169.254:80 with a 403 error.
  • Whitelisting that IP and port in the sandbox will normally result in a timeout instead, unless that IP is accessible outside the sandbox
  • The code making these repeated requests appears to stop when it encounters a timeout error, but keeps retrying for up to 10 minutes when it receives a 403 error.

Expected behavior

Ideally, there should be few or no requests to 169.254.169.254:80, not hundreds of requests every few seconds in the first 10 minutes after startup.

Actual behavior

Openshell's TUI showing the repeated requests:

<img width="1495" height="515" alt="Image" src="https://github.com/user-attachments/assets/d315be19-dc45-4252-b83d-eb9beff3060d" />

Partial openshell text log:

[1775865662.312] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(240) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865662.317] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(240) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]
[1775865668.120] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(312) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865668.125] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(312) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]
[1775865673.950] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(350) -> PUT http://169.254.169.254/latest/api/token [policy:-]
[1775865673.955] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(350) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]
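
The six sample lines above already show the pattern: each PUT + GET pair comes from a different Node.js PID, and the pairs are a little under 6 seconds apart. A minimal sketch for extracting that from the log follows; the regular expression is an assumption inferred only from these sample lines, not a documented openshell log format.

```typescript
// Sample lines copied from the openshell log excerpt.
const logLines = [
  "[1775865662.312] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(240) -> PUT http://169.254.169.254/latest/api/token [policy:-]",
  "[1775865662.317] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(240) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]",
  "[1775865668.120] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(312) -> PUT http://169.254.169.254/latest/api/token [policy:-]",
  "[1775865668.125] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(312) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]",
  "[1775865673.950] [sandbox] [OCSF ] [ocsf] HTTP:PUT [MED] DENIED /usr/local/bin/node(350) -> PUT http://169.254.169.254/latest/api/token [policy:-]",
  "[1775865673.955] [sandbox] [OCSF ] [ocsf] HTTP:GET [MED] DENIED /usr/local/bin/node(350) -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ [policy:-]",
];

interface DeniedRequest {
  timestamp: number;
  pid: number;
  method: string;
  url: string;
}

// Assumed pattern: "[epoch.millis] ... DENIED <binary>(<pid>) -> <METHOD> <url>".
const LINE_PATTERN = /^\[(\d+\.\d+)\].*DENIED \S+\((\d+)\) -> (GET|PUT) (\S+)/;

function parseDenied(line: string): DeniedRequest | null {
  const match = LINE_PATTERN.exec(line);
  if (!match) return null;
  return {
    timestamp: Number(match[1]),
    pid: Number(match[2]),
    method: match[3],
    url: match[4],
  };
}

const parsed = logLines
  .map(parseDenied)
  .filter((entry): entry is DeniedRequest => entry !== null);

// One PUT per probe, so the gaps between PUTs give the probe interval.
const tokenRequests = parsed.filter((entry) => entry.method === "PUT");
const gapsSeconds = tokenRequests
  .slice(1)
  .map((entry, i) => entry.timestamp - tokenRequests[i].timestamp);
const distinctPids = new Set(parsed.map((entry) => entry.pid));

console.log({
  distinctPids: distinctPids.size,
  gapsSeconds: gapsSeconds.map((gap) => gap.toFixed(3)),
});
```

For this excerpt the sketch reports three distinct PIDs and gaps of roughly 5.8 seconds, which fits the PR's explanation that separate Node.js processes each run the credential check once.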

A graphical screenshot from the same log:

<img width="1901" height="1429" alt="Image" src="https://github.com/user-attachments/assets/319b13b2-7210-4695-afe5-8b7797f96034" />

The NVIDIA sandbox rejects such requests with a 403 reply unless I explicitly authorize them. If I don't authorize them, it keeps making repeated requests to those 2 URLs every 5-6 seconds, for up to 10 minutes or until I authorize the IP at the sandbox level.

The 169.254.169.254 IP is not accessible even outside the sandbox, and attempting to do a curl to it from outside the sandbox results in a timeout even when I set the timeout to 60 seconds. So whatever is making those requests handles timeouts and stops retrying, but fails to handle 403 reject errors from the sandbox and keeps retrying for up to 10 minutes.

Running curl from outside the sandbox (or inside the sandbox with the IP whitelisted from openshell):

$ curl --verbose --max-time 60 http://169.254.169.254/latest/meta-data/iam/security-credentials/
*   Trying 169.254.169.254:80...
* Connection timed out after 60003 milliseconds
* Closing connection
curl: (28) Connection timed out after 60003 milliseconds

Running curl from inside the sandbox with the IP not whitelisted:

$ curl --verbose --max-time 60 http://169.254.169.254/latest/meta-data/iam/security-credentials/
* Uses proxy env variable no_proxy == 'localhost,127.0.0.1,::1,10.200.0.1'
* Uses proxy env variable http_proxy == 'http://10.200.0.1:3128'
*   Trying 10.200.0.1:3128...
* Connected to 10.200.0.1 (10.200.0.1) port 3128 (#0)
> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/ HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.88.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 403 Forbidden
* no chunk, no close, no size. Assume close to signal end
<
* Closing connection 0

OpenClaw version

2026.4.9

Operating system

Ubuntu 24.04.4 LTS

Install method

npm install -g "[email protected]"

Model

nvidia/Nemotron-Mini-4B-Instruct

Provider / routing chain

openclaw -> NVIDIA openshell sandbox -> local vllm

Additional provider/model setup details

OpenClaw runs inside NVIDIA's openshell which rejects with 403 errors any HTTP requests to unauthorized IPs and hosts.

Logs, screenshots, and evidence

This textbox doesn't actually allow screenshots, so not sure how I could include them here, so I included them under "actual behavior" instead.

Impact and severity

The repeated attempts to access an unauthorized IP and port are way too frequent and fill the sandbox logs with garbage entries instead of stopping after the first few attempts or at least doing an exponential backoff. There might be other side-effects too. It's not immediately clear why OpenClaw does this anyway and if there are other more serious issues that may be caused by this.
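
The exponential backoff suggested here could look roughly like the sketch below. It is illustrative only: the linked PR removes the probe rather than slowing it down, and the function name, base delay, cap and attempt count are all assumptions.

```typescript
// Hypothetical helper: compute the delays a caller would wait between retries.
function backoffDelaysMs(
  maxAttempts: number,
  baseMs: number,
  capMs: number,
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Double the delay on each attempt, but never exceed the cap.
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

const delays = backoffDelaysMs(5, 5_000, 60_000);
console.log(delays); // [ 5000, 10000, 20000, 40000, 60000 ]
```

With these example values, five retries are spread over 135 seconds instead of the 25 seconds a fixed 5-second interval would take, and a hard attempt limit ends the retries outright.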

Additional information

Last known good was 2026.3.11 and first known bad is 2026.4.9.

extent analysis

TL;DR

The issue can be mitigated by whitelisting the IP address 169.254.169.254 in the NVIDIA openshell sandbox, which will cause the repeated requests to stop after a timeout instead of continuing for up to 10 minutes with 403 errors.

Guidance

  • Identify the source of the repeated requests to 169.254.169.254 in the OpenClaw code to understand why these requests are being made and if they are necessary.
  • Consider implementing exponential backoff or a similar strategy to reduce the frequency of repeated requests in case of 403 errors.
  • Whitelist the IP address 169.254.169.254 in the NVIDIA openshell sandbox as a temporary workaround to stop the repeated requests.
  • Investigate why the 169.254.169.254 IP is not accessible outside the sandbox and if this is the expected behavior.

Example

No code snippet is provided as the issue is more related to configuration and networking.

Notes

The regression appeared between OpenClaw 2026.3.11 and 2026.4.9. PR #64944, described above, traces the repeated requests to the AWS SDK default credential chain probing IMDS at 169.254.169.254 and disables that probe during credential auto-detection. Whitelisting the IP address is only a workaround for versions without that fix.

Recommendation

Apply the workaround of whitelisting the IP address 169.254.169.254 in the NVIDIA openshell sandbox to mitigate the issue, and continue investigating the root cause to find a permanent solution.
