hermes: build-arm64 job deterministically fails on cold cache (Azure SAS token expires mid-build)

Problem

The build-arm64 job in .github/workflows/docker-publish.yml has been deterministically failing on PRs when the GHA cache is cold, because the Azure SAS token used to fetch/push cache blobs has a 10-minute lifetime and the cache-cold arm64 build takes ~22 minutes.

Symptom

ERROR: failed to build: failed to solve: failed to compute cache key: failed to copy:
  GET https://productionresultssa5.blob.core.windows.net/actions-cache/...
RESPONSE 403: 403 Server failed to authenticate the request.
ERROR CODE: AuthenticationFailed
<AuthenticationErrorDetail>Signature not valid in the specified time frame:
  Start [05:53:56 GMT] - Expiry [06:04:01 GMT] - Current [06:05:11 GMT]</AuthenticationErrorDetail>

Hit twice consecutively on PR #33675 (https://github.com/NousResearch/hermes-agent/actions/runs/26556260410/job/78228643594 and the rerun https://github.com/NousResearch/hermes-agent/actions/runs/26556260410/job/78231013284). The amd64 build passes; only arm64 runs past the token expiry, because it's emulated on the ARM runner with a colder cache scope.

Why it loops

Each retry repeats the same sequence: cache miss → 22-min build → SAS expiry → cache push also fails. So the next run is still cold, and the cache never gets warmed.

Options to fix (pick one or combine)

  1. Drop cache-to/cache-from for the arm64 PR build (lines 212-213 of docker-publish.yml). PR builds become slower but reliable; main-branch builds still get cached because they go through the publish path.

  2. Use a different cache backend (registry cache, S3-backed cache, or local type=local mounted via actions/cache). These don't have the SAS token expiry. Registry cache is the standard fix for this exact problem.

  3. Mark build-arm64 as continue-on-error: true for PRs so a flaky arm64 build doesn't block merge. Smoke testing still happens on push-to-main.

  4. Split the build into smaller layers so each cache pull/push fits inside the 10-min SAS window. Brittle and dependent on Dockerfile structure.
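
As a rough illustration of options 1 and 3, the job could be gated on the event type. This is only a sketch: the runner label, step layout, action versions, and the `scope=arm64` value are assumptions, since the actual contents of docker-publish.yml are not shown in this issue.

```yaml
jobs:
  build-arm64:
    # Option 3: on PRs, a failed arm64 build does not fail the workflow run.
    continue-on-error: ${{ github.event_name == 'pull_request' }}
    runs-on: ubuntu-24.04-arm  # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/arm64
          push: false
          # Option 1: pass no cache settings on PRs; other events keep the GHA cache.
          # The scope name is hypothetical.
          cache-from: ${{ github.event_name != 'pull_request' && 'type=gha,scope=arm64' || '' }}
          cache-to: ${{ github.event_name != 'pull_request' && 'type=gha,scope=arm64,mode=max' || '' }}
```

With the expressions above, both cache inputs evaluate to an empty string on pull_request events, so the PR build neither reads from nor writes to the GHA cache.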

Registry cache (option 2) is the cleanest long-term fix.
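
A minimal sketch of what the registry-cache variant might look like, assuming GHCR as the registry. The runner label, the `ghcr.io/OWNER/IMAGE:buildcache-arm64` ref, and the surrounding steps are placeholders, not the project's actual configuration.

```yaml
jobs:
  build-arm64:
    runs-on: ubuntu-24.04-arm  # assumed runner label
    permissions:
      contents: read
      packages: write  # needed to push cache layers to GHCR
    steps:
      - uses: actions/checkout@v4
      # Registry cache export needs a Buildx builder, not the default docker driver.
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/arm64
          push: false
          # Hypothetical cache ref; replace OWNER/IMAGE with a writable repository.
          cache-from: type=registry,ref=ghcr.io/OWNER/IMAGE:buildcache-arm64
          cache-to: type=registry,ref=ghcr.io/OWNER/IMAGE:buildcache-arm64,mode=max
```

One caveat with this approach: on PRs opened from forks, GITHUB_TOKEN is typically read-only, so the cache push would likely need to be limited to pushes on main while PRs only read from the cache.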

Workaround until then

build-arm64 is not a required check on main, so PRs can be force-merged with gh pr merge --admin when this fires (verified with PR #33675).
