claude-code - 💡(How to fix) Fix [BUG] Claude Code unable to reproduce a canonical DOCX design across varied source docs — hours of iteration with no working result

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Goal: take a canonical reference DOCX and apply its visual design (forest header, burgundy eyebrows, FM brand fonts, tables) to other source DOCXs using fm_apply.py / fm_brand.py.
Outcome: the script handles brand header + title + tables + 1-row banner sections, but misses many source patterns (sub-instruction labels, sub-headings, Övning paragraph splits, image placement). Each new source reveals new gaps.
Time spent: 2+ hours on one doc with no acceptable output. ChatGPT produced an acceptable result in one try (PREPOSITIONER_styled.docx).
Process problem: Claude Code asked clarifying questions repeatedly instead of inferring from the verified canonical reference (THIS IS YOUR FUCKING CODE .docx).

What Should Happen?

When the user gives Claude a canonical FM-brand reference DOCX (THIS IS YOUR FUCKING CODE .docx) and a source DOCX, the output should visually match the canonical's design (forest title block + burgundy underline + small-caps eyebrows + forest section headings + muted grey sub-instructions + burgundy-row tables + closing footer) within one or two iterations, not 100+.
Claude should infer the mapping from the reference's XML (sizes, weights, colors) and apply it to the source automatically — not ask the user to confirm every helper-per-pattern.
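As a rough illustration of what "infer the mapping from the reference's XML" could look like, here is a minimal sketch that reads size, weight, and color directly from WordprocessingML run properties using only the standard library. The function name `run_formats`, the sample XML, and the hex color in it are assumptions made for illustration; none of them come from fm_brand.py or the canonical reference. The sketch only reads direct run formatting and does not resolve values inherited from paragraph or character styles.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_formats(document_xml: str) -> list[dict]:
    """Return one dict per run: text, size in points, bold flag, hex color."""
    root = ET.fromstring(document_xml)
    formats = []
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        rpr = run.find("w:rPr", NS)
        size = bold = color = None  # None means "not set directly on the run"
        if rpr is not None:
            sz = rpr.find("w:sz", NS)
            if sz is not None:
                # w:sz stores the font size in half-points
                size = int(sz.get(f"{{{W}}}val")) / 2
            b = rpr.find("w:b", NS)
            if b is not None:
                # <w:b/> without a value means bold is on
                bold = b.get(f"{{{W}}}val", "true") not in ("0", "false")
            col = rpr.find("w:color", NS)
            if col is not None:
                color = col.get(f"{{{W}}}val")
        formats.append({"text": text, "size_pt": size, "bold": bold, "color": color})
    return formats


# Hypothetical fragment standing in for word/document.xml of the reference.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:rPr><w:b/><w:color w:val="7A1F2B"/><w:sz w:val="18"/></w:rPr>'
    '<w:t>EYEBROW</w:t></w:r>'
    '<w:r><w:t>plain body text</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

for entry in run_formats(SAMPLE):
    print(entry)
```

Run against the real reference, the same loop would yield the concrete sizes, weights, and colors that the clarifying questions were asking the user to supply.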
The script (or Claude's per-task code) should detect common source patterns (paragraph-form Övning N: headings, sub-instruction labels, numbered question lists, image+caption pairs) without the user having to point each one out.
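The kind of pattern detection described above could be sketched as a small rule table that also reports whatever it could not classify, so gaps in a new source surface in one pass instead of one prompt at a time. The tag names and the sub-instruction keywords (`Instruktion`, `Exempel`, `Tips`) are assumptions for illustration; the actual fm_apply.py classifier and fm_brand helper names are not shown in this report. Image+caption pairs are left out because they cannot be detected from paragraph text alone.

```python
import re

# Hypothetical tags and keywords; adjust to the real helpers and source vocabulary.
PATTERNS = [
    ("exercise_heading", re.compile(r"^Övning\s+\d+\s*:", re.IGNORECASE)),
    ("numbered_question", re.compile(r"^\d+[.)]\s+\S")),
    ("sub_instruction", re.compile(r"^(Instruktion|Exempel|Tips)\s*:", re.IGNORECASE)),
]


def classify(text: str) -> str:
    """Return the first matching tag, 'empty' for blank text, else 'unclassified'."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    for tag, pattern in PATTERNS:
        if pattern.search(stripped):
            return tag
    return "unclassified"


def classify_all(paragraphs):
    """Tag every paragraph and collect the ones no rule matched."""
    tagged = [(classify(p), p) for p in paragraphs]
    gaps = [p for tag, p in tagged if tag == "unclassified"]
    return tagged, gaps


paragraphs = [
    "Övning 3: Fyll i rätt preposition",
    "Exempel: på bordet",
    "1. Jag bor ___ Stockholm.",
    "Fritext utan mönster",
]
tagged, gaps = classify_all(paragraphs)
for tag, text in tagged:
    print(f"{tag:18} {text}")
print("needs a rule:", gaps)
```

Listing the unmatched paragraphs up front is the design choice that matters here: it turns "each new source reveals new gaps" into a single report per source.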
Final visual output should be comparable to what a one-shot LLM (e.g. ChatGPT generating styled HTML/DOCX directly from a description) produces, in roughly equivalent effort.

Error Messages/Logs

- Each minor source-pattern mismatch required a separate user prompt + Claude code change, often reverted minutes later, then re-applied. Over 100 micro-iterations on one document.
- Claude repeatedly asked clarifying questions ("which fm_brand helper?", "bold or not bold?", "indent or table?") despite the canonical reference XML containing the exact answers.
- Claude introduced custom python-docx rendering blocks (subheading, rule_label) outside the fm_brand helpers the user had spent days building, then removed and re-added them under conflicting instructions.
- Visual checks were limited to first-page QuickLook thumbnails. Errors on later pages weren't caught until the user opened the doc in Word.
- Same complaint loop repeated: user says "X is wrong" → Claude proposes Y → user says "no" → Claude removes Y → user says "now Y is missing" → etc.
- No persistent learning: even with memory entries warning against making own decisions, Claude kept doing it.
- Net result after 2+ hours: no working output. ChatGPT produced an acceptable rendering on first try via a different approach (direct HTML/CSS generation rather than docx classification).
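One way to avoid relying on first-page thumbnails is a structural check over the whole document: collect the (size, color, bold) combination of every run in the output and flag any combination that never occurs in the reference. The sketch below does this with the standard library only. The function names, the `make_docx` demo helper, and the hex colors are assumptions for illustration. The demo packages contain only `word/document.xml`, which is enough for this reader but not a file Word would open, and the check ignores style inheritance and `w:b` elements that switch bold off.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _val(rpr, tag):
    """Return the w:val attribute of a run-property child, or None if absent."""
    node = None if rpr is None else rpr.find(f"{{{W}}}{tag}")
    return None if node is None else node.get(f"{{{W}}}val")


def signatures(docx_bytes: bytes) -> list[tuple]:
    """Return (paragraph_index, size_half_points, color, bold) for every run."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        for run in para.iter(f"{{{W}}}r"):
            rpr = run.find(f"{{{W}}}rPr")
            bold = rpr is not None and rpr.find(f"{{{W}}}b") is not None
            found.append((index, _val(rpr, "sz"), _val(rpr, "color"), bold))
    return found


def off_brand_runs(reference: bytes, output: bytes) -> list[tuple]:
    """List output runs whose formatting never appears in the reference."""
    allowed = {sig[1:] for sig in signatures(reference)}
    return [sig for sig in signatures(output) if sig[1:] not in allowed]


def make_docx(runs) -> bytes:
    """Demo only: one paragraph per (size, color) pair, packaged in memory."""
    body = "".join(
        f'<w:p><w:r><w:rPr><w:color w:val="{color}"/><w:sz w:val="{size}"/></w:rPr>'
        f"<w:t>x</w:t></w:r></w:p>"
        for size, color in runs
    )
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buffer.getvalue()


reference = make_docx([("28", "1F3D2B"), ("22", "7A1F2B")])
output = make_docx([("28", "1F3D2B"), ("22", "7A1F2B"), ("22", "000000")])
print(off_brand_runs(reference, output))
```

A check like this does not replace opening the file in Word, but it covers every paragraph rather than only the first page.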

Steps to Reproduce

1. Open Claude Code in the FLUENT MINDS project directory.
2. Ask Claude to "apply FM brand to BESTAMD_OBESTAMD_ADJ_BLANDAT_BUSINESS.docx" (or PREPOSITIONER- IMAGES.docx) using generators/fm_apply.py and the canonical reference THIS IS YOUR FUCKING CODE .docx.
3. Observe the output …-FM.docx.
4. Compare visually to the canonical reference.
5. Provide feedback on what doesn't match (Övning headings wrong color, sub-instructions wrong style, image cropping, etc.).
6. Repeat step 5 → Claude makes a change → reverts → re-adds → asks clarifying question → repeat.
7. After 2+ hours, no acceptable output.
8. Run the same source through ChatGPT with the same canonical reference → acceptable output on first try (PREPOSITIONER_styled.docx).

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

claude-opus-4-7 (Opus 4.7, 1M context)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

Persistent memory entries existed at session start (e.g. never-make-own-decisions, always-fm-brand, fm-brand-means-rebuild, use-existing-template-first) but were not followed consistently — Claude made unilateral changes (removing/re-adding code blocks) without user approval, then acknowledged the rule violation only after being called out.
Project CLAUDE.md contains explicit rules ("Never improvise", "Never make own decisions", "Never guess or interpret") — same outcome: rules were saved to memory mid-session and still bypassed.
fm_brand.py (848 lines) was built over prior sessions to encode the FM brand programmatically; fm_apply.py (374 lines) wraps it. The wrapper is the failure point — its classifier doesn't reliably tag source paragraphs (bold-short, arrow-pattern, numbered-with-options, sub-instruction keywords) to the right fm_brand helper.
Alternative result (PREPOSITIONER_styled.docx) produced by ChatGPT is in 29.05/ for direct comparison.
Session duration: 2+ hours on a single doc with no acceptable output, following multiple separate hours-long sessions over previous days on the same fm_brand / fm_apply pipeline.
User reports the iteration pattern feels like "mental abuse" — sustained frustration over repeated, contradictory micro-edits is a documented usability problem worth surfacing.
