claude-code: [MODEL] Tests written by Claude wrote to production shared infrastructure (NAS) without isolation

claude-code · 2026-05-20 10:49:13

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude modified files I didn't ask it to modify

What You Asked Claude to Do

Add comprehensive test coverage for a backup module (backup_restore.py) I had asked Claude to build earlier in the session. The module backs up a SQLite DB + a LanceDB vector store + project config files (CLAUDE.md, plists, etc.), with an optional mirror to an SMB-mounted NAS for off-site redundancy.

Claude was asked to write pytest tests for: backup creation, listing, restore (with confirm/checksum), pruning, multi-tier retention, integrity verification, and a few V2-specific behaviors.

What Claude Actually Did

Claude wrote a pytest fixture that isolated two of the three environment variables consumed by the production module:

def _fresh_db():
    tmp = tempfile.mkdtemp()
    os.environ["SECOND_BRAIN_DB_PATH"] = str(Path(tmp) / "test.db")
    os.environ["SECOND_BRAIN_LANCEDB_PATH"] = str(Path(tmp) / "lancedb")
    # ← MISSING: os.environ["SECONDBRAIN_BACKUP_NAS_DIR"] = ...
    ...

The third env var (SECONDBRAIN_BACKUP_NAS_DIR) defaults to my actual production NAS path (/Volumes/Data/Backups/SecondBrain). When tests called create_backup() with default args (mirror_to_nas=True), the production function detected the NAS as mounted and mirrored the test-tempdir backup to the real NAS.

Over ~20 minutes, 7 fake test backup directories were written to my production NAS before I caught it via ls /Volumes/Data/Backups/....

These pollution artifacts looked identical to real backups in the application's listing API. The GET /api/backups endpoint listed 9 entries, 7 of which were test backups of temporary directories, indistinguishable from production snapshots.

Expected Behavior

Pytest tests should be fully isolated from production infrastructure. When Claude writes test fixtures, it should:

  1. Enumerate ALL os.environ.get() calls (and hardcoded paths under shared mounts like /Volumes/, /mnt/, /Network/, iCloud Drive) in the production module being tested.
  2. Override each in the fixture, not just the obvious DB-related ones.
  3. Verify isolation by comparing ls output of the shared mount points from before and after the first run.

The model should not implicitly assume that having mocked 2 env vars generalizes the isolation to "all paths". Isolation is per-path.
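The enumeration step can be approximated mechanically. The sketch below is one possible way to do it, not something taken from the session: it parses a module's source with Python's `ast` module (3.9 or later) and collects string literals passed to `os.environ.get` or `os.getenv`, or read through `os.environ[...]`. The name `env_vars_read` is hypothetical, and the approach will miss names that are built dynamically or accessed through an alias such as `from os import environ`.

```python
import ast


def _is_str(node: ast.AST) -> bool:
    """True for a string literal node."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def env_vars_read(source: str) -> set[str]:
    """Collect literal env var names that the given source code reads."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        # os.environ.get("NAME", default) and os.getenv("NAME", default)
        if isinstance(node, ast.Call) and node.args:
            target = ast.unparse(node.func)
            key = node.args[0]
            if target in {"os.environ.get", "os.getenv"} and _is_str(key):
                names.add(key.value)
        # os.environ["NAME"] used as a read, not as an assignment target
        elif isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Load):
            if ast.unparse(node.value) == "os.environ" and _is_str(node.slice):
                names.add(node.slice.value)
    return names
```

Running this over `src/second_brain/backup_restore.py` (for example with `env_vars_read(Path(...).read_text())`) would give a checklist of variables the fixture has to override; hardcoded paths still need a separate manual review.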

Files Affected

In the user's private repo:
  - `tests/test_backup_restore.py` (commit `39cace5`): the under-isolated
    fixture
  - `src/second_brain/backup_restore.py`: the production module under test,
    whose default `mirror_to_nas=True` contributed to the leak

  Side effects on shared filesystem (Synology NAS via SMB at `/Volumes/Data`):
  - `/Volumes/Data/Backups/SecondBrain/v2-2026-05-20_0639/` through `_0720/` —
    7 test backup directories created on production storage
  - Total pollution: ~7 MB of test artifacts (small because the test DB had
    1 atom and an empty LanceDB)

  Production backup data of user data was lost.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

# 1. Production module
# src/myapp/storage.py
import os
import shutil
from pathlib import Path

def _data_dir() -> Path:
    return Path(os.environ.get("MYAPP_DATA_DIR", "/tmp/data"))

def _backup_nas() -> Path | None:
    # Falls back to the real shared drive when the env var is unset
    nas = os.environ.get("MYAPP_BACKUP_NAS",
                          "/Volumes/SharedDrive/myapp-backups")
    p = Path(nas)
    if p.parent.exists():  # smoke test: is the NAS mounted?
        p.mkdir(parents=True, exist_ok=True)
        return p
    return None

def backup(mirror_to_nas: bool = True) -> dict:
    src = _data_dir() / "data.json"
    if not src.exists():
        src.parent.mkdir(parents=True, exist_ok=True)
        src.write_text('{"test": "data"}')
    if mirror_to_nas:
        nas = _backup_nas()
        if nas:
            shutil.copy(src, nas / "snapshot.json")
            return {"ok": True, "nas": str(nas)}
    return {"ok": True, "nas": None}

# 2. Test that Claude might write (under-isolated)
# tests/test_storage.py
import os, tempfile
from pathlib import Path

def test_backup_creates_snapshot():
    tmp = tempfile.mkdtemp()
    os.environ["MYAPP_DATA_DIR"] = str(Path(tmp) / "data")
    # ← MISSING: os.environ["MYAPP_BACKUP_NAS"] = str(Path(tmp) / "fake-nas")
    from myapp.storage import backup
    result = backup()  # default mirror_to_nas=True
    assert result["ok"]

# 3. Run on a machine with the shared mount actually present
$ mount | grep SharedDrive
//user@server/SharedDrive on /Volumes/SharedDrive (smbfs, ...)

$ pytest tests/
1 passed
 
$ ls /Volumes/SharedDrive/myapp-backups/
snapshot.json   ← test pollution on shared storage

The bug: the test passes "successfully" while leaking side effects to production.
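For contrast, here is a hedged sketch of what an isolated version of the step 2 test could look like. It uses pytest's built-in `monkeypatch` and `tmp_path` fixtures; the `backup` function is a condensed stand-in for the step 1 module, inlined only so the sketch runs on its own, and the extra assertions are illustrative rather than taken from the report.

```python
import os
import shutil
from pathlib import Path


# Condensed stand-in for myapp.storage.backup from step 1 (assumption:
# inlined here so the example does not depend on the myapp package).
def backup(mirror_to_nas: bool = True) -> dict:
    src = Path(os.environ.get("MYAPP_DATA_DIR", "/tmp/data")) / "data.json"
    src.parent.mkdir(parents=True, exist_ok=True)
    src.write_text('{"test": "data"}')
    nas = Path(os.environ.get("MYAPP_BACKUP_NAS",
                              "/Volumes/SharedDrive/myapp-backups"))
    if mirror_to_nas and nas.parent.exists():
        nas.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, nas / "snapshot.json")
        return {"ok": True, "nas": str(nas)}
    return {"ok": True, "nas": None}


def test_backup_creates_snapshot(monkeypatch, tmp_path):
    fake_nas = tmp_path / "fake-nas"
    # Override EVERY path-bearing variable the module reads, not only the data dir
    monkeypatch.setenv("MYAPP_DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("MYAPP_BACKUP_NAS", str(fake_nas))

    result = backup()  # default mirror_to_nas=True now lands in tmp_path

    assert result["ok"]
    assert result["nas"] == str(fake_nas)
    assert (fake_nas / "snapshot.json").exists()
```

Because `monkeypatch` undoes its changes at teardown, the overrides do not bleed into later tests, unlike direct writes to `os.environ`. Asserting on `result["nas"]` also turns a silent leak into a test failure if the override is ever dropped. The same two `setenv` calls could live in an autouse fixture in `conftest.py` so that older tests are covered without being edited one by one.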

Claude Model

Opus: claude-opus-4-7 (Opus 4.7 with 1M context, fast mode active during the session).

Relevant Conversation

The session was a long iterative build of a backup feature spanning ~4 hours:
  1. Initial V1 backup implementation (commit `437feb1`) with tests
  2. Mid-session pivot to V2 enrichment: configs included, NAS mirror,
     multi-tier retention (commit `39cace5`)
  3. The model added new tests for the V2 features WITH proper `monkeypatch`
     of `SECONDBRAIN_BACKUP_NAS_DIR`, but did not retrofit the V1 tests it
     had written earlier. The V1 tests still ran during `pytest tests/` and
     leaked.

  The model believed the work was complete because:
  - All 14 tests passed locally
  - Production code worked when manually called via curl
  - Smoke test via the daemon endpoint succeeded

  It did NOT verify by inspecting the NAS for unexpected artifacts. The
  incident was caught by the user during a manual `ls /Volumes/Data/Backups`
  to clean up old backup formats.

Impact

High - Significant unwanted changes

Claude Code Version

v2.1.143

Platform

Anthropic API

Additional Context

Related issues searched before submitting (4 queries on the repo):

| Issue | Status | Relevance |
| --- | --- | --- |
| #9555 | closed, not planned | Closest concept: "Create dedicated test Splitwise group to avoid polluting production data". Different domain (web API), same anti-pattern |
| #3422 | closed, not planned | Permission system + pytest variants; scope is UX, not isolation |
| #51405 | closed, completed | CWD pollution at session start; different scope (CLI startup, not test execution) |
| #55206 | open | Cowork mounted drive issues on Windows; runtime sandbox, not model-authored tests |
| #49129, #60233 | open | `rm -rf` data loss; explicit destructive Bash, not test code |

The specific pattern this report documents — Claude-authored test fixtures that fail to isolate ALL environment-driven paths before invoking production code, leading to side effects on shared/network storage — does not appear to be tracked.

Suggested fixes:

For end-user projects:

  - Autouse pytest fixture that overrides every env var the production module reads (the model could be prompted to enumerate them first)
  - Production functions: use the safer default (mirror_to_nas: bool = False) with caller-side opt-in
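The second suggestion can be sketched as follows. This is an illustration under assumptions, not the module from the report: `backup` takes the source file as a parameter to stay self-contained, and dropping the hardcoded fallback path is an extra design choice of this sketch that goes beyond what the report proposes.

```python
import os
import shutil
from pathlib import Path


def backup(src: Path, mirror_to_nas: bool = False) -> dict:
    """Mirror src to the NAS only when the caller opts in AND a target is configured."""
    if not mirror_to_nas:
        return {"ok": True, "nas": None}

    # Assumption of this sketch: no hardcoded production fallback. If the
    # variable is unset, nothing is written anywhere outside the data dir.
    target = os.environ.get("MYAPP_BACKUP_NAS")
    if not target:
        return {"ok": True, "nas": None}

    nas = Path(target)
    nas.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, nas / "snapshot.json")
    return {"ok": True, "nas": str(nas)}
```

With this shape, a test that forgets to override the variable and calls `backup(src)` writes nothing to shared storage; only the production entry point, which passes `mirror_to_nas=True` in an environment where `MYAPP_BACKUP_NAS` is set, reaches the NAS.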

For Claude Code itself:

  - Pre-flight check: before writing FS tests, grep os.environ in the target module and explicitly list paths to isolate. Make this part of the model's chain of thought.
  - Post-run check: after the first pytest execution, ls/diff shared mounts (/Volumes/, /mnt/, /Network/) and flag any new files.
  - Tool guardrail: a Bash invocation containing pytest that writes under a network mount during its execution could surface a runtime warning. The current per-tool approval model doesn't catch this because pytest tests/ looks innocuous.
  - Documentation: an "isolating tests from production infrastructure" section in the Claude Code docs would have been a useful anchor.
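The post-run check is simple enough to script. The following is a minimal sketch, with hypothetical helper names `snapshot` and `new_entries`, that records what exists under a directory before a test run and reports anything that appeared afterwards. It walks the tree recursively, so on a large NAS it is probably wiser to point it at the specific backup directory than at the mount root.

```python
from pathlib import Path


def snapshot(root: Path) -> set[str]:
    """Return the relative paths of everything currently under root."""
    if not root.exists():
        return set()
    return {str(p.relative_to(root)) for p in root.rglob("*")}


def new_entries(before: set[str], after: set[str]) -> list[str]:
    """Paths that appeared between two snapshots, sorted for stable output."""
    return sorted(after - before)


# Intended use (paths are from the report; run the test suite in between):
#   watched = Path("/Volumes/Data/Backups")
#   before = snapshot(watched)
#   ... run pytest tests/ ...
#   leaked = new_entries(before, snapshot(watched))
#   if leaked: print("unexpected artifacts:", leaked)
```

In the incident described here, such a check would have flagged the first test backup directory after the first run instead of leaving it to a manual `ls` later.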

Why I'm flagging this:

The model wrote a coherent commit message ("V2 — configs + NAS mirror...") and convinced itself the work was complete. The tests passed. The production code worked. But the model created side effects on networked shared storage that were invisible from the local test report. It's the kind of bug where a less attentive user would never notice until a restore goes wrong months later.
