gemini-cli: Notify when Release: Nightly is stuck waiting for approval

google-gemini/gemini-cli#25844 · Fetched 2026-04-23 07:44:26 · Comments: 0 · Participants: 1 · Timeline events: 2 (labeled ×2) · Reactions: 0

Summary

Notify maintainers (Chat space, issue, or email) when the Release: Nightly workflow has a run that has been stuck in waiting (i.e., pending prod-environment manual approval by gemini-cli-askmode-approvers) for more than ~24 hours. Today, when nobody approves, runs silently pile up until they time out — masking real release-pipeline failures.

Why is this needed?

Demonstrated failure mode (from the post-mortem of Release: Promote failure on 2026-04-23):

| Date | Run | Status |
|---|---|---|
| Apr 16 | First nightly with #25342 | ❌ failed (E413); auto-filed #25507, then closed without action |
| Apr 17 | nightly | ⏸️ waiting (no approval) |
| Apr 18 | nightly | ⏸️ waiting |
| Apr 19 | nightly | ⏸️ waiting |
| Apr 20 | nightly | ⏸️ waiting |
| Apr 21 | nightly | ⏸️ waiting |
| Apr 22 | nightly | ⏸️ waiting |
| Apr 23 | nightly | ⏸️ waiting |
| Apr 23 | manual Release: Promote | ❌ surfaced the same E413 to a human, finally |

Seven consecutive nightlies sat in the approval queue without anyone noticing. The first one's auto-filed failure issue (#25507) was closed without action because there was no follow-up signal that the failure pattern was recurring.

A nightly-stuck notifier would have:

  1. Pinged @gemini-cli-askmode-approvers after ~24 h to either approve or investigate the stuck queue.
  2. Surfaced the recurrence (seven consecutive days of stuck runs) as a single high-signal alert rather than 7 silent timeouts.
  3. Plausibly led to discovery of the E413 root cause days earlier.

Proposed plan

A new scheduled workflow .github/workflows/release-nightly-stuck-watchdog.yml:

name: 'Release: Nightly stuck watchdog'

on:
  schedule:
    - cron: '0 12 * * *'  # noon UTC, daily
  workflow_dispatch:

permissions:
  actions: 'read'
  issues: 'write'

jobs:
  check-stuck:
    if: "github.repository == 'google-gemini/gemini-cli'"
    runs-on: 'ubuntu-latest'
    steps:
      - uses: 'actions/github-script@…'
        env:
          GITHUB_TOKEN: '${{ secrets.GITHUB_TOKEN }}'
        with:
          script: |
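            // Runs created before this instant count as stuck (24 h threshold).
            const cutoff = Date.now() - 24*60*60*1000;
            // List every run of the nightly workflow that is still awaiting environment approval.
            const runs = await github.paginate(github.rest.actions.listWorkflowRuns, {
              owner: context.repo.owner,
              repo: context.repo.repo,
              workflow_id: 'release-nightly.yml',
              status: 'waiting',
              per_page: 50,
            });
            // created_at is used as a proxy for how long the run has been waiting.
            const stuck = runs.filter(r => new Date(r.created_at).getTime() < cutoff);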
            if (stuck.length === 0) {
              core.info('No stuck nightlies.');
              return;
            }
            const body = [
              '### ⚠️ Release: Nightly is stuck waiting for approval',
              '',
              `${stuck.length} nightly run(s) have been waiting on \`prod\` environment approval for >24 h.`,
              '',
              'This often hides a real, recurring publish failure (see post-mortem of #25507).',
              '',
              '| Created | Run |',
              '|---|---|',
              ...stuck.slice(0, 10).map(r => `| ${r.created_at} | [#${r.id}](${r.html_url}) |`),
              '',
              'Action required: either approve the most recent run, or investigate why approvers are not approving.',
              '',
              '<!-- nightly-stuck-watchdog -->',
            ].join('\n');

            // Upsert a single tracking issue so we don't spam.
            const { data: existing } = await github.rest.search.issuesAndPullRequests({
              q: `repo:${context.repo.owner}/${context.repo.repo} is:issue is:open in:body "<!-- nightly-stuck-watchdog -->"`,
            });
            if (existing.items.length > 0) {
              await github.rest.issues.update({
                owner: context.repo.owner, repo: context.repo.repo,
                issue_number: existing.items[0].number,
                body,
              });
            } else {
              await github.rest.issues.create({
                owner: context.repo.owner, repo: context.repo.repo,
                title: 'Release: Nightly is stuck waiting for approval',
                body,
                labels: ['release-failure', 'priority/p1', '🔒 maintainer only'],
                assignees: [],  // optionally: maintainers
              });
            }
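
The "stuck" filter in the script above could also be pulled out into a pure function so the threshold logic can be unit-tested without calling the GitHub API. This is only a sketch: `findStuckRuns`, the `ageHours` field, and the sample run records are hypothetical and not part of the proposed workflow. It assumes each run object carries `id`, `html_url`, and `created_at`, as the script already relies on.

```javascript
// Hypothetical helper (not in the proposed workflow): a pure version of the
// watchdog's "stuck" filter. `nowMs` is injected so tests are deterministic.
function findStuckRuns(runs, nowMs, thresholdHours = 24) {
  const cutoff = nowMs - thresholdHours * 60 * 60 * 1000;
  return runs
    .filter((r) => new Date(r.created_at).getTime() < cutoff)
    .map((r) => ({
      id: r.id,
      html_url: r.html_url,
      created_at: r.created_at,
      // Whole hours since the run was created (proxy for time spent waiting).
      ageHours: Math.floor((nowMs - new Date(r.created_at).getTime()) / 3600000),
    }))
    .sort((a, b) => b.ageHours - a.ageHours); // oldest run first
}

// Made-up run records for illustration only.
const sampleNow = Date.parse('2026-04-23T12:00:00Z');
const sampleRuns = [
  { id: 1, html_url: 'https://example.invalid/runs/1', created_at: '2026-04-22T00:00:00Z' },
  { id: 2, html_url: 'https://example.invalid/runs/2', created_at: '2026-04-23T00:00:00Z' },
  { id: 3, html_url: 'https://example.invalid/runs/3', created_at: '2026-04-20T00:00:00Z' },
];
console.log(findStuckRuns(sampleRuns, sampleNow).map((r) => `${r.id}:${r.ageHours}h`));
```

Sorting oldest first means the run that has waited longest appears at the top of the report table, which is also the age an escalation rule would key on.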

Optional escalations

  • After 48 h: also post to a Google Chat space for the release team.
  • After 72 h: bump priority/p1 → priority/p0.
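
One way to express these thresholds is a small escalation table keyed on the age of the oldest stuck run. The sketch below is an assumption-laden illustration: `ESCALATION_TIERS`, `escalationActions`, and the action names are hypothetical, and treating a run that is exactly at a threshold as having crossed it is a choice made here, not something the proposal specifies.

```javascript
// Hypothetical escalation ladder. Thresholds come from the proposal
// (24 h tracking issue, 48 h chat post, 72 h priority bump); the action
// names are placeholders for whatever steps the workflow would run.
const ESCALATION_TIERS = [
  { afterHours: 24, action: 'open-or-update-tracking-issue' },
  { afterHours: 48, action: 'post-to-chat-space' },
  { afterHours: 72, action: 'bump-to-priority/p0' },
];

// Returns every action whose threshold the oldest stuck run has reached.
function escalationActions(oldestAgeHours) {
  return ESCALATION_TIERS
    .filter((tier) => oldestAgeHours >= tier.afterHours)
    .map((tier) => tier.action);
}

console.log(escalationActions(50));
```

Keeping the tiers in data rather than in nested conditionals makes it cheap to add or retune a threshold later.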

Pair this with deduplication on auto-filed release-failure issues

The original #25507 was a one-shot file-and-close. If the next 7 nightly runs had updated the same issue with a recurrence count, the pattern would have been visible from a single notification. Suggested change to the existing Create Issue on Failure step in release-nightly.yml: search for an existing open release-failure issue with the same root-cause signature (e.g., grep last 100 lines of log for npm error code E\d+) and append a comment + bump priority instead of opening a fresh issue.
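
The deduplication idea can be sketched as two pure helpers: one that derives a root-cause signature from the tail of a log, and one that decides whether to comment on an existing issue or create a new one. Everything named here (`rootCauseSignature`, `planFailureIssue`, the `release-failure-signature` marker) is hypothetical. The regex follows the `npm error code E\d+` example above and additionally accepts the older `npm ERR!` prefix as an assumption; the real log format may differ.

```javascript
// Hypothetical helper: scan only the last `tailLines` lines of a job log for
// an npm error code and turn it into a stable signature string.
function rootCauseSignature(logText, tailLines = 100) {
  const tail = logText.split('\n').slice(-tailLines);
  for (const line of tail) {
    const m = line.match(/npm (?:error|ERR!) code (E\d+)/);
    if (m) return `npm:${m[1]}`;
  }
  return null; // no recognizable signature in the tail
}

// Hypothetical decision helper: `openIssues` is a list of { number, body }
// objects for open release-failure issues. A hidden marker in the issue body
// ties an issue to a signature so recurrences land on the same issue.
function planFailureIssue(openIssues, signature) {
  if (!signature) return { action: 'create', marker: null };
  const marker = `<!-- release-failure-signature: ${signature} -->`;
  const match = openIssues.find((issue) => (issue.body || '').includes(marker));
  return match
    ? { action: 'comment', issue_number: match.number, marker }
    : { action: 'create', marker };
}

const sampleLog = ['publishing...', 'npm error code E413', 'npm error 413 Payload Too Large'].join('\n');
const sampleSignature = rootCauseSignature(sampleLog);
console.log(sampleSignature, planFailureIssue([], sampleSignature).action);
```

Appending a comment rather than editing the issue body has the side benefit that each recurrence shows up as a new timeline event on the existing issue.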

Estimated effort

~2–3 hours for the watchdog workflow. Recurrence-deduplication on the auto-filed failure issue is another ~2 hours.

Related Issues

  • Surfaced by the post-mortem of #25507
  • Companion of #25841 (this one prevents the next recurrence from going unnoticed)

Additional context

  • The prod environment has manual approval gating by gemini-cli-askmode-approvers — this is intentional for release safety, so removing the gate is not the right answer. The right answer is making "approval is overdue" visible.
  • The actions/listWorkflowRuns API supports filtering by status: 'waiting', so the implementation is straightforward.
  • The "single tracking issue" upsert pattern (rather than file-N-issues) is the same one used by agent-session-drift-check.yml and the gemini-automated-issue-dedup.yml workflows.
