claude-code: [MODEL] Tests written by Claude wrote to production shared infrastructure (NAS) without isolation


Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude modified files I didn't ask it to modify

What You Asked Claude to Do

Add comprehensive test coverage for a backup module (backup_restore.py) I had asked Claude to build earlier in the session. The module backs up a SQLite DB + a LanceDB vector store + project config files (CLAUDE.md, plists, etc.), with an optional mirror to a SMB-mounted NAS for off-site redundancy.

Claude was asked to write pytest tests for: backup creation, listing, restore (with confirm/checksum), pruning, multi-tier retention, integrity verification, and a few V2-specific behaviors.

What Claude Actually Did

Claude wrote a pytest fixture that isolated two of the three environment variables consumed by the production module:

def _fresh_db():
    tmp = tempfile.mkdtemp()
    os.environ["SECOND_BRAIN_DB_PATH"] = str(Path(tmp) / "test.db")
    os.environ["SECOND_BRAIN_LANCEDB_PATH"] = str(Path(tmp) / "lancedb")
    # ← MISSING: os.environ["SECONDBRAIN_BACKUP_NAS_DIR"] = ...
    ...

The third env var (SECONDBRAIN_BACKUP_NAS_DIR) defaults to my actual production NAS path (/Volumes/Data/Backups/SecondBrain). When tests called create_backup() with default args (mirror_to_nas=True), the production function detected the NAS as mounted and mirrored the test-tempdir backup to the real NAS.

Over ~40 minutes (directory timestamps run from 0639 to 0720), 7 fake test backup directories were written to my production NAS before I caught it via ls /Volumes/Data/Backups/....

These pollution artifacts looked identical to real backups in the application's listing API. The GET /api/backups endpoint listed 9 entries, 7 of which were tempdir tests indistinguishable from production snapshots.

Expected Behavior

Pytest tests should be fully isolated from production infrastructure. When Claude writes test fixtures, it should:

  1. Enumerate ALL os.environ.get() calls (and hardcoded paths under shared mounts like /Volumes/, /mnt/, /Network/, iCloud Drive) in the production module being tested.
  2. Override each in the fixture, not just the obvious DB-related ones.
  3. After the first run, verify isolation by comparing an ls of each shared mount point taken before and after.

The model should not implicitly assume that having mocked 2 env vars generalizes the isolation to "all paths". Isolation is per-path.
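
The enumeration step can be partly automated. The following is a minimal sketch, not a definitive tool: it walks a module's syntax tree and collects every environment variable read through `os.environ.get`, `os.getenv`, or `os.environ[...]`, along with any literal default. The helper names (`env_vars_read`, `risky_defaults`) and the `SAMPLE` source are hypothetical; the shared-mount prefixes are the ones listed above. It will miss reads made through aliases or computed names, so it supplements a manual review rather than replacing it.

```python
import ast

# Prefixes treated as shared storage (taken from the list in this report).
SHARED_PREFIXES = ("/Volumes/", "/mnt/", "/Network/")

def env_vars_read(source: str) -> dict:
    """Map each env var name read in `source` to its literal default, or None."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        # os.environ.get("NAME", default) and os.getenv("NAME", default)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if ast.unparse(node.func) in ("os.environ.get", "os.getenv") and node.args:
                name = node.args[0]
                if isinstance(name, ast.Constant) and isinstance(name.value, str):
                    default = node.args[1] if len(node.args) > 1 else None
                    found[name.value] = (
                        default.value if isinstance(default, ast.Constant) else None
                    )
        # os.environ["NAME"] used as a read (not an assignment target)
        elif (
            isinstance(node, ast.Subscript)
            and isinstance(node.ctx, ast.Load)
            and ast.unparse(node.value) == "os.environ"
        ):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                found.setdefault(key.value, None)
    return found

def risky_defaults(found: dict) -> list:
    """Names whose literal default points under a shared mount."""
    return sorted(
        name for name, default in found.items()
        if isinstance(default, str) and default.startswith(SHARED_PREFIXES)
    )

# Hypothetical module source, modeled on the repro in this report.
SAMPLE = '''
import os
from pathlib import Path

def _data_dir():
    return Path(os.environ.get("MYAPP_DATA_DIR", "/tmp/data"))

def _backup_nas():
    return os.environ.get("MYAPP_BACKUP_NAS", "/Volumes/SharedDrive/myapp-backups")
'''

if __name__ == "__main__":
    found = env_vars_read(SAMPLE)
    print("env vars to override in fixtures:", sorted(found))
    print("defaults under shared mounts:", risky_defaults(found))
```

Every name in the first list needs an override in the fixture; the second list marks the ones whose default would reach shared storage if the override is forgotten.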

Files Affected

In the user's private repo:
  - `tests/test_backup_restore.py` (commit `39cace5`) — the underisolated
    fixture
  - `src/second_brain/backup_restore.py` — the production module under test,
    with default `mirror_to_nas=True` that contributed

  Side effects on shared filesystem (Synology NAS via SMB at `/Volumes/Data`):
  - `/Volumes/Data/Backups/SecondBrain/v2-2026-05-20_0639/` through `_0720/`: 7 test backup directories created on production storage
  - Total pollution: ~7 MB of test artifacts (small because the test DB had
    1 atom and an empty LanceDB)

  No production backup data or user data was lost.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

# 1. Production module
# src/myapp/storage.py
import os
from pathlib import Path

def _data_dir() -> Path:
    return Path(os.environ.get("MYAPP_DATA_DIR", "/tmp/data"))

def _backup_nas() -> Path | None:
    nas = os.environ.get("MYAPP_BACKUP_NAS",
                          "/Volumes/SharedDrive/myapp-backups")
    p = Path(nas)
    if p.parent.exists():  # smoke test: is the NAS mounted?
        p.mkdir(parents=True, exist_ok=True)
        return p
    return None

def backup(mirror_to_nas: bool = True) -> dict:
    src = _data_dir() / "data.json"
    if not src.exists():
        src.parent.mkdir(parents=True, exist_ok=True)
        src.write_text('{"test": "data"}')
    if mirror_to_nas:
        nas = _backup_nas()
        if nas:
            import shutil
            shutil.copy(src, nas / "snapshot.json")
            return {"ok": True, "nas": str(nas)}
    return {"ok": True, "nas": None}

# 2. Test that Claude might write (under-isolated)
# tests/test_storage.py
import os, tempfile
from pathlib import Path

def test_backup_creates_snapshot():
    tmp = tempfile.mkdtemp()
    os.environ["MYAPP_DATA_DIR"] = str(Path(tmp) / "data")
    # ← MISSING: os.environ["MYAPP_BACKUP_NAS"] = str(Path(tmp) / "fake-nas")
    from myapp.storage import backup
    result = backup()  # default mirror_to_nas=True
    assert result["ok"]

# 3. Run on a machine with the shared mount actually present
$ mount | grep SharedDrive
//user@server/SharedDrive on /Volumes/SharedDrive (smbfs, ...)

$ pytest tests/
1 passed

$ ls /Volumes/SharedDrive/myapp-backups/
snapshot.json   ← test pollution on shared storage

The bug is a test that passes "successfully" while leaking side effects to production.
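
The repro leaks because two defaults line up: `mirror_to_nas=True` and a hardcoded NAS fallback path. As a hedged sketch of the opt-in alternative (not the actual `backup_restore.py`, which is in a private repo), the variant below reuses the `MYAPP_*` names from the repro, drops the hardcoded fallback so that an unset `MYAPP_BACKUP_NAS` means "no mirror", and flips the default to `mirror_to_nas=False`.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

def _data_dir() -> Path:
    return Path(os.environ.get("MYAPP_DATA_DIR", "/tmp/data"))

def _backup_nas() -> Path | None:
    # No hardcoded fallback: if the variable is unset, there is no mirror target.
    nas = os.environ.get("MYAPP_BACKUP_NAS")
    if not nas:
        return None
    p = Path(nas)
    if not p.parent.exists():  # destination root is not mounted
        return None
    p.mkdir(parents=True, exist_ok=True)
    return p

def backup(mirror_to_nas: bool = False) -> dict:
    """Write the local snapshot; mirror only when the caller opts in."""
    src = _data_dir() / "data.json"
    if not src.exists():
        src.parent.mkdir(parents=True, exist_ok=True)
        src.write_text('{"test": "data"}')
    nas = _backup_nas() if mirror_to_nas else None
    if nas is None:
        return {"ok": True, "nas": None}
    shutil.copy(src, nas / "snapshot.json")
    return {"ok": True, "nas": str(nas)}
```

With this shape, an under-isolated test that calls `backup()` with default arguments cannot reach the NAS, and even `backup(mirror_to_nas=True)` is a no-op unless the environment explicitly names a destination. The trade-off is that production callers must now pass the flag and set the variable deliberately.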

Claude Model

claude-opus-4-7 (Opus 4.7 with 1M context, fast mode active during the session).

Claude Model

Opus

Relevant Conversation

The session was a long iterative build of a backup feature spanning ~4 hours:
  1. Initial V1 backup implementation (commit `437feb1`) with tests
  2. Mid-session pivot to V2 enrichment: configs included, NAS mirror,
     multi-tier retention (commit `39cace5`)
  3. The model added new tests for the V2 features WITH proper `monkeypatch`
     of `SECONDBRAIN_BACKUP_NAS_DIR`, but did not retrofit the V1 tests it
     had written earlier. The V1 tests still ran during `pytest tests/` and
     leaked.

  The model believed the work was complete because:
  - All 14 tests passed locally
  - Production code worked when manually called via curl
  - Smoke test via the daemon endpoint succeeded

  It did NOT verify by inspecting the NAS for unexpected artifacts. The
  incident was caught by the user during a manual `ls /Volumes/Data/Backups`
  to clean up old backup formats.

Impact

High - Significant unwanted changes

Claude Code Version

v2.1.143

Platform

Anthropic API

Additional Context

Related issues searched before submitting (4 queries on the repo):

| Issue | Status | Relevance |
| --- | --- | --- |
| #9555 | closed, not planned | Closest concept: "Create dedicated test Splitwise group to avoid polluting production data". Different domain (web API), same anti-pattern |
| #3422 | closed, not planned | Permission system + pytest variants. Scope is UX, not isolation |
| #51405 | closed, completed | CWD pollution at session start. Different scope (CLI startup, not test execution) |
| #55206 | open | Cowork mounted drive issues on Windows. Runtime sandbox, not model-authored tests |
| #49129, #60233 | open | rm -rf data loss. Explicit destructive Bash, not test code |

The specific pattern this report documents — Claude-authored test fixtures that fail to isolate ALL environment-driven paths before invoking production code, leading to side effects on shared/network storage — does not appear to be tracked.

Suggested fixes:

For end-user projects:

  1. Autouse pytest fixture that overrides every env var the production module reads (the model could be prompted to enumerate them first)
  2. Production functions: default to safer (mirror_to_nas: bool = False) with caller-side opt-in
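
A possible shape for the autouse fixture, as a sketch only: it assumes the three variable names quoted in this report are the complete set of path-like variables the module reads, which has to be confirmed against the module itself. The helper `isolated_paths` and the fixture name are hypothetical. `tmp_path` and `monkeypatch` are pytest's built-in fixtures, so every override is undone automatically after each test.

```python
# conftest.py (sketch)
from pathlib import Path

import pytest

# Assumption: these are ALL path-like env vars the production module reads.
# Keep this list in sync with the module; a missing name is exactly the bug
# described in this report.
PATH_ENV_VARS = (
    "SECOND_BRAIN_DB_PATH",
    "SECOND_BRAIN_LANCEDB_PATH",
    "SECONDBRAIN_BACKUP_NAS_DIR",
)

def isolated_paths(root: Path, names=PATH_ENV_VARS) -> dict:
    """Give every listed variable its own location under `root`."""
    return {name: str(root / name.lower()) for name in names}

@pytest.fixture(autouse=True)
def isolate_path_env(tmp_path, monkeypatch):
    """Applied to every test: point all path env vars into the test's tmp dir."""
    for name, value in isolated_paths(tmp_path).items():
        monkeypatch.setenv(name, value)
```

Because the fixture is autouse and lives in `conftest.py`, it also covers tests written earlier in a session, such as the V1 tests that were never retrofitted. This only helps if the production code reads the environment at call time; values captured at import time would need to be patched separately.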

For Claude Code itself:

  1. Pre-flight check: before writing FS tests, grep os.environ in the target module and explicitly list paths to isolate. Make this part of the model's chain of thought.
  2. Post-run check: after the first pytest execution, ls/diff shared mounts (/Volumes/, /mnt/, /Network/) and flag any new files.
  3. Tool guardrail: when a Bash invocation containing pytest writes under a network mount during its execution, surface a runtime warning. The current per-tool approval model doesn't catch this because pytest tests/ looks innocuous.
  4. Documentation: an "isolating tests from production infrastructure" section in Claude Code docs would have been a useful anchor.
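
The post-run check in item 2 can be approximated with the standard library alone. This is a rough sketch with hypothetical helper names (`snapshot`, `new_entries`): take a listing of a directory tree before the test run, take another afterwards, and report anything that appeared in between. It detects new files and directories only, not modified ones.

```python
import os
from pathlib import Path

def snapshot(root: Path) -> set:
    """Relative paths of every file and directory currently under `root`."""
    if not root.exists():
        return set()
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.add(os.path.relpath(os.path.join(dirpath, name), root))
    return entries

def new_entries(before: set, after: set) -> list:
    """Paths present after the run that were not there before."""
    return sorted(after - before)

# Intended use (paths are the ones from this report):
#   root = Path("/Volumes/Data/Backups")
#   before = snapshot(root)
#   ... run `pytest tests/` ...
#   leaked = new_entries(before, snapshot(root))
#   if leaked: print("test run created files on shared storage:", leaked)
```

Walking a whole NAS share over SMB can be slow, so in practice the root should be the narrowest directory the production module could write to.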

Why I'm flagging this:

The model wrote a coherent commit message ("V2 — configs + NAS mirror...") and convinced itself the work was complete. The tests passed. The production code worked. But the model created side effects on networked shared storage that were invisible from the local test report. It's the kind of bug where a less attentive user would never notice until a restore goes wrong months later.
