nextjs - 💡(How to fix) Server Actions fail during deployment rollouts on non-Vercel platforms (Version Skew / server-reference-manifest.json issue) [1 comment, 2 participants]

GitHub stats
vercel/next.js#84667 · Fetched 2026-04-08 02:18:49
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×3, closed ×1, commented ×1, issue_type_added ×1

Link to the code that reproduces this issue

https://github.com/anzx/bluestone

Note: Since this is a private repository and the issue occurs in production during deployments, you may need to create a minimal public reproduction. For now, you can reference your repo and explain in "Additional context" that reproduction requires a rolling deployment scenario.

To Reproduce

  1. Deploy a Next.js application with Server Actions to a non-Vercel platform (e.g., GCP Cloud Run, AWS ECS, or any platform with rolling deployments)
  2. Configure rolling deployment strategy where multiple app versions run concurrently during rollout
  3. Have users actively interact with Server Actions (e.g., authentication flows, form submissions)
  4. Trigger a new deployment while users are using the application
  5. During the deployment rollout window, users will encounter errors:
    • "This page isn't working" errors
    • Server Action requests fail with action ID not found in server-reference-manifest.json
    • ConnectID authentication fails with 500/502 errors

Current vs. Expected behavior

Current Behavior: During deployment rollouts, Server Actions fail when client and server versions don't match. Users who loaded the page on version N encounter errors when version N+1 is deployed, because:

  • The client has action IDs from version N's server-reference-manifest.json
  • The new server (version N+1) only has the new manifest
  • Action ID lookups fail, resulting in user-facing errors
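
The failure mode described above can be pictured with a toy model. This is only a sketch: the two objects below are simplified stand-ins for the manifests of two builds, the IDs are made up, and `resolveAction` is a hypothetical helper rather than anything Next.js exposes.

```javascript
// Simplified stand-ins for two builds' manifests. The real
// server-reference-manifest.json has more structure; only the
// "action ID -> handler" idea is modeled here. All IDs are invented.
const manifestV1 = { a1f3: "login", b7c9: "submitForm" };
const manifestV2 = { d4e2: "login", b7c9: "submitForm" }; // login's ID changed

// Hypothetical lookup: resolve an action ID against one manifest.
function resolveAction(manifest, actionId) {
  if (!Object.hasOwn(manifest, actionId)) {
    return { ok: false, error: `Action ${actionId} not found` };
  }
  return { ok: true, handler: manifest[actionId] };
}

// A page loaded on version N still sends version N's ID for login.
const staleId = "a1f3";
const onOldServer = resolveAction(manifestV1, staleId); // resolves
const onNewServer = resolveAction(manifestV2, staleId); // fails: version skew
```

In this model, an action whose ID did not change between builds keeps working on both servers, which may explain why only some actions fail during a given rollout.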

Screenshot evidence:

  • Users see "This page isn't working" errors during deployments
  • Particularly affects long-running Server Actions (ConnectID authentication)

Expected Behavior: Next.js should handle cross-version Server Action requests gracefully during stateless deployments:

  1. Ideally: Requests initiated on version N should complete successfully even after version N+1 is deployed
  2. Alternatively: Provide graceful error handling with automatic retry logic
  3. Or: Expose hooks/configuration to implement version routing at the application level

Since both Next.js and our application are stateless, Server Actions should be resilient to version changes during rolling deployments.
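
As an illustration of the "graceful error handling" option, a server could compare build IDs before dispatching an action and answer with a structured, recoverable error when they differ. This is a sketch under stated assumptions, not an existing Next.js hook: the `x-client-build-id` header, the `classifyActionRequest` helper, and the 409 response shape are all invented for the example.

```javascript
// Hypothetical: the client attaches the build ID it was loaded with
// (here via an invented "x-client-build-id" header) and the server
// compares it with its own build ID before dispatching the action.
function classifyActionRequest(headers, serverBuildId) {
  const clientBuildId = headers["x-client-build-id"];
  if (clientBuildId === undefined) {
    return { status: 200, skew: "unknown" }; // no information, proceed as usual
  }
  if (clientBuildId === serverBuildId) {
    return { status: 200, skew: "none" };
  }
  // Versions differ: answer with a structured, recoverable error
  // instead of letting the action lookup fail with a 500.
  return {
    status: 409,
    skew: "stale-client",
    body: { reason: "stale-client", serverBuildId, hint: "reload" },
  };
}
```

A client that receives the `stale-client` response could then reload the page or re-fetch its assets instead of showing a generic error.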

Impact:

  • Every deployment causes user-facing errors
  • High deployment velocity (CI/CD best practices) exacerbates the issue
  • Forces a trade-off between deployment frequency and user experience

Provide environment information

Operating System:
  Platform: linux
  Arch: x64
  Version: (Cloud Run container)
Binaries:
  Node: 20.17.0
  npm: 10.8.2
  pnpm: 9.10.0
Relevant Packages:
  next: 14.2.32
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.6.2
Next.js Config:
  output: standalone
Note: Run next info in your bluestone repo to get the exact versions

Which area(s) are affected? (Select all that apply)

Server Actions, Pages Router

Which stage(s) are affected? (Select all that apply)

next build (local), next start (local), Other (Deployed)

Additional context

Deployment Environment

  • Platform: Google Cloud Platform (Cloud Run)
  • Deployment Strategy: Rolling updates with container orchestration
  • Config: Using output: "standalone" for Docker deployments
  • Reproducibility: Issue ONLY occurs during deployment rollouts in production. Does not occur:
    • In local development
    • In single-instance deployments (no rolling updates)
    • On Vercel platform (appears to be solved there)

Root Cause Hypothesis

Based on investigation by our team:

  1. Manifest Timing Issue (from @nicole.yu):

    • server-reference-manifest.json is generated at build time
    • There's a rollout gap where the new app revision goes live before the manifest file finishes uploading/propagating
    • Client requests with old action IDs hit the new server without the corresponding manifest entries
  2. Async Initialization Problem:

    • Initial code review suggests something isn't initialized in time (possibly async)
    • This timing issue specifically relates to new deployments
  3. No Cross-Version Request Handling:

    • Next.js doesn't appear to have built-in support for Server Action requests that span deployment versions

Exacerbating Factors

  • Frequent deployments: Modern CI/CD practices with multiple daily deployments
  • Long-running requests: Authentication flows (ConnectID) have longer request durations
  • Multiple concurrent Server Actions: Applications with many server actions see higher failure rates

Vercel Cloud Observation

This issue appears to be automatically handled on Vercel Cloud, suggesting there may be:

  • Infrastructure-level version routing
  • Manifest synchronization mechanisms
  • Sticky session handling

...that aren't available in self-hosted/GCP/AWS deployments.
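
For readers unfamiliar with version routing, the idea can be sketched as a lookup from the version a client was loaded on to a backend that still runs it. Nothing here describes how Vercel actually works; the deployment IDs, host names, and the `pickBackend` helper are hypothetical.

```javascript
// Hypothetical routing table: deployment ID -> backend pool that still runs it.
function pickBackend(requestDeploymentId, pools, currentDeploymentId) {
  // Prefer the pool that matches the version the client was loaded from.
  if (requestDeploymentId && pools.has(requestDeploymentId)) {
    return { pool: pools.get(requestDeploymentId), pinned: true };
  }
  // Unknown or already-drained version: fall back to the current pool.
  return { pool: pools.get(currentDeploymentId), pinned: false };
}

// Invented example data: two revisions live during a rollout.
const pools = new Map([
  ["build-41", "revision-41.internal"],
  ["build-42", "revision-42.internal"],
]);

const forOldClient = pickBackend("build-41", pools, "build-42"); // pinned to 41
const forNewClient = pickBackend("build-42", pools, "build-42"); // pinned to 42
```

The trade-off is that old revisions must stay reachable until their clients have gone away, which is the infrastructure cost the issue refers to.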

Questions for Next.js Team

  1. Is there an official recommended strategy for handling Server Action version skew in non-Vercel deployments?
  2. Should server-reference-manifest.json be versioned or cached differently to prevent cross-version failures?
  3. Are there existing Next.js configuration options for graceful degradation during deployments?
  4. Would the Next.js team accept a PR that adds:
    • Retry logic for failed action lookups?
    • Better error handling/messaging?
    • Manifest versioning support?

Potential Solutions Under Consideration

  1. Client-side retry logic with exponential backoff
  2. Version routing at platform level (complex, requires infrastructure changes)
  3. Multi-version manifest caching during rollout periods
  4. Graceful degradation with user-friendly error recovery
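
Option 3 can be sketched conceptually as a small history of manifests, so that an action ID issued by a recent build is still recognized. This is an illustration only: `ManifestHistory` is a hypothetical class, and recognizing an old ID is not enough by itself, since the server would also need the old action code or a way to forward the request to an instance that has it.

```javascript
// Conceptual sketch: keep the manifests of the last `keep` builds so that
// action IDs issued by a recent version can still be recognized.
class ManifestHistory {
  constructor(keep = 2) {
    this.keep = keep;
    this.entries = []; // newest first: { buildId, actionIds: Set }
  }

  register(buildId, actionIds) {
    this.entries.unshift({ buildId, actionIds: new Set(actionIds) });
    // Drop builds that fall outside the retention window.
    this.entries.length = Math.min(this.entries.length, this.keep);
  }

  // Returns the newest retained build that knows this action ID, or null.
  findBuild(actionId) {
    const hit = this.entries.find((e) => e.actionIds.has(actionId));
    return hit ? hit.buildId : null;
  }
}
```

With `keep = 2`, an ID from the immediately preceding build resolves during a rollout, while IDs from older builds are reported as unknown and can be handled as a stale client.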

Related Documentation


Note: We're experiencing this in production with real user impact. Happy to collaborate on a PR or provide more debugging information if needed. This affects anyone deploying Next.js with Server Actions to non-Vercel platforms using modern rolling deployment strategies.

extent analysis

TL;DR

Implement client-side retry logic with exponential backoff to handle Server Action version skew during rolling deployments.

Guidance

  1. Investigate manifest retention: Consider keeping the previous build's server-reference-manifest.json available during a rollout, keyed by build version, so that action IDs issued by version N can still be recognized.
  2. Implement retry logic: Add client-side retry logic with exponential backoff for failed action calls. Retries can only succeed while an instance that still recognizes the old action ID is reachable, so pair them with a fallback such as reloading the page, and avoid blindly retrying actions that are not idempotent.
  3. Explore infrastructure-level solutions: Look into version routing at the platform level or sticky sessions, although either may require significant infrastructure changes.
  4. Collaborate with the Next.js team: Discuss potential solutions such as retry logic, better error handling, or manifest versioning support, and consider contributing a PR to address the issue.
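
One complementary approach, sketched here under assumptions, is to detect skew on the client before calling an action and recover deliberately instead of failing. The helper names are hypothetical, and `fetchServerBuildId` is injected so the sketch does not assume any particular endpoint (an app might back it with a route it defines itself).

```javascript
// `fetchServerBuildId` is injected so the sketch does not assume any endpoint;
// in an app it might call a hypothetical route such as /api/build-id.
function createSkewDetector(clientBuildId, fetchServerBuildId) {
  return async function isStale() {
    try {
      const serverBuildId = await fetchServerBuildId();
      return serverBuildId !== clientBuildId;
    } catch {
      return false; // cannot tell; do not force a reload on a network blip
    }
  };
}

// Run the action only if the page is current; otherwise hand control to
// `onStale` (for example, save form state and reload the page).
async function runIfFresh(isStale, action, onStale) {
  if (await isStale()) return onStale();
  return action();
}
```

During a rolling deployment the check itself may be answered by either version, so this narrows the failure window rather than closing it.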

Example

// Example of client-side retry logic with exponential backoff.
// `action` is any function that invokes a Server Action and returns a promise.
const retryAction = async (action, maxRetries = 3, initialDelay = 500) => {
  let delay = initialDelay;
  let lastError;

  // One initial attempt plus up to `maxRetries` retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Attempt the Server Action and pass its result back to the caller
      return await action();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break; // no point waiting after the final attempt
      // Wait for the current delay before retrying
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= 2; // Exponential backoff
    }
  }

  // If all attempts fail, surface the last error as the cause
  throw new Error(`Failed to perform Server Action after ${maxRetries} retries`, { cause: lastError });
};

Notes

The provided solution is a workaround to mitigate the issue, and a more permanent fix may require changes to the Next.js framework or the deployment infrastructure. The effectiveness of the retry logic and manifest caching will depend on the specific use case and deployment strategy.
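
To judge whether retries can outlast a rollout, it helps to add up the waiting time a backoff schedule provides. The helper below is a hypothetical illustration; the 500 ms starting delay and the doubling factor are example values, not recommendations.

```javascript
// Delays (in ms) for `retries` waits, doubling from `initialDelay`,
// optionally capped at `maxDelay`.
function backoffSchedule(initialDelay, retries, maxDelay = Infinity) {
  const delays = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(initialDelay * 2 ** i, maxDelay));
  }
  return delays;
}

const delays = backoffSchedule(500, 3); // [500, 1000, 2000]
const totalWait = delays.reduce((sum, d) => sum + d, 0); // 3500
```

Three waits starting at 500 ms cover only 3.5 seconds, so if a rollout takes longer than that, or the old version is gone by the time the retry arrives, retries alone will not recover the request.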

Recommendation

Apply the workaround by implementing client-side retry logic with exponential backoff, as it can help reduce user-facing errors during deployments. This solution can be implemented immediately, while exploring more permanent fixes with the Next.js team or infrastructure changes.
