claude-code - [BUG] Claude Code unable to reproduce a canonical DOCX design across varied source docs: hours of iteration with no working result

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

  • Goal: take a canonical reference DOCX and apply its visual design (forest header, burgundy eyebrows, FM brand fonts, tables) to other source DOCXs using fm_apply.py / fm_brand.py.
  • Outcome: the script handles the brand header, title, tables, and one-row banner sections, but misses many source patterns (sub-instruction labels, sub-headings, Övning ("Exercise") paragraph splits, image placement). Each new source reveals new gaps.
  • Time spent: 2+ hours on one doc with no acceptable output. ChatGPT produced an acceptable result in one try (PREPOSITIONER_styled.docx).
  • Process problem: Claude Code repeatedly asked clarifying questions instead of inferring the answers from the verified canonical reference (THIS IS YOUR FUCKING CODE .docx).

What Should Happen?

  • When the user gives Claude a canonical FM-brand reference DOCX (THIS IS YOUR FUCKING CODE .docx) and a source DOCX, the output should visually match the canonical's design (forest title block + burgundy underline + small-caps eyebrows + forest section headings + muted grey sub-instructions + burgundy-row tables + closing footer) within one or two iterations, not 100+.
  • Claude should infer the mapping from the reference's XML (sizes, weights, colors) and apply it to the source automatically, rather than asking the user to confirm which helper to use for each pattern.
  • The script (or Claude's per-task code) should detect common source patterns (paragraph-form "Övning N:" headings, sub-instruction labels, numbered question lists, image+caption pairs) without the user having to point out each one.
  • The final visual output should be comparable to what a one-shot LLM (e.g. ChatGPT generating styled HTML/DOCX directly from a description) produces, with roughly equivalent effort.
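
A minimal sketch of reading those sizes, weights, and colors straight from the reference, assuming the brand formatting is stored as direct run properties (w:rPr) in word/document.xml rather than only in styles.xml. It uses only the Python standard library; the function name paragraph_formats is hypothetical and not part of fm_apply.py or fm_brand.py.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements and attributes (the "w:" prefix)
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
OFF_VALUES = ("0", "false", "off")  # w:val values that switch a toggle property off


def paragraph_formats(docx):
    """Return one dict per non-blank paragraph in word/document.xml.

    `docx` may be a filesystem path or a binary file-like object.
    Hypothetical helper: only direct run properties are read; formatting
    inherited from styles.xml is NOT resolved in this sketch.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    rows = []
    for p in root.iter(W + "p"):  # also finds paragraphs inside table cells
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if not text.strip():
            continue
        style = p.find(f"{W}pPr/{W}pStyle")
        size = bold = color = None
        # The first run that sets each property wins
        for run in p.iter(W + "r"):
            rpr = run.find(W + "rPr")
            if rpr is None:
                continue
            sz = rpr.find(W + "sz")
            if size is None and sz is not None:
                size = int(sz.get(W + "val")) / 2  # w:sz is in half-points
            b = rpr.find(W + "b")
            if bold is None and b is not None:
                bold = b.get(W + "val", "true").lower() not in OFF_VALUES
            c = rpr.find(W + "color")
            if color is None and c is not None:
                color = c.get(W + "val")
        rows.append({
            "text": text,
            "style": style.get(W + "val") if style is not None else None,
            "size_pt": size,
            "bold": bold,
            "color": color,
        })
    return rows
```

Running this over the canonical reference and grouping the rows by style (or by size and color) would give a per-pattern target table that answers questions like "bold or not bold?" from the file itself; resolving inherited formatting from styles.xml would be the natural next step.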

Error Messages/Logs

- Each minor source-pattern mismatch required a separate user prompt and a code change by Claude, often reverted minutes later and then re-applied: over 100 micro-iterations on one document.
- Claude repeatedly asked clarifying questions ("which fm_brand helper?", "bold or not bold?", "indent or table?") even though the canonical reference XML contained the exact answers.
- Claude introduced custom python-docx rendering blocks (subheading, rule_label) outside the fm_brand helpers the user had spent days building, then removed and re-added them under conflicting instructions.
- Visual checks were limited to first-page QuickLook thumbnails, so errors on later pages weren't caught until the user opened the doc in Word.
- The same complaint loop repeated: user says "X is wrong" → Claude proposes Y → user says "no" → Claude removes Y → user says "now Y is missing" → etc.
- No persistent learning: even with memory entries warning against making its own decisions, Claude kept doing it.
- Net result after 2+ hours: no working output. ChatGPT produced an acceptable rendering on the first try via a different approach (direct HTML/CSS generation rather than DOCX classification).
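
Since the visual checks stopped at first-page thumbnails, one way to review every page is to convert the output DOCX to PDF with LibreOffice in headless mode and rasterize each page with poppler's pdftoppm. This is a sketch assuming both tools are installed and on PATH (neither is mentioned in the report); render_commands and render_all_pages are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def render_commands(docx: Path, outdir: Path, dpi: int = 60) -> list[list[str]]:
    """Build the two commands: DOCX -> PDF, then every PDF page -> PNG."""
    pdf = outdir / (docx.stem + ".pdf")
    return [
        # LibreOffice headless conversion; writes <stem>.pdf into outdir
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(docx)],
        # poppler's pdftoppm writes one PNG per page, named <prefix>-<page>.png
        ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(outdir / docx.stem)],
    ]


def render_all_pages(docx: Path, outdir: Path, dpi: int = 60) -> list[Path]:
    """Run both commands and return the page images in page order."""
    outdir.mkdir(parents=True, exist_ok=True)
    for cmd in render_commands(docx, outdir, dpi):
        subprocess.run(cmd, check=True)  # raise if either tool fails
    # pdftoppm zero-pads page numbers, so a lexical sort is page order
    return sorted(outdir.glob(docx.stem + "-*.png"))
```

The returned image list can then be inspected page by page, so layout errors on page 3 or 7 surface in the same pass as page 1.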

Steps to Reproduce

  1. Open Claude Code in the FLUENT MINDS project directory.
  2. Ask Claude to "apply FM brand to BESTAMD_OBESTAMD_ADJ_BLANDAT_BUSINESS.docx" (or PREPOSITIONER- IMAGES.docx) using generators/fm_apply.py and the canonical reference THIS IS YOUR FUCKING CODE .docx.
  3. Observe the output …-FM.docx.
  4. Compare visually to the canonical reference.
  5. Provide feedback on what doesn't match (Övning headings wrong color, sub-instructions wrong style, image cropping, etc.).
  6. Repeat step 5 → Claude makes a change → reverts → re-adds → asks a clarifying question → repeat.
  7. After 2+ hours, no acceptable output.
  8. Run the same source through ChatGPT with the same canonical reference → acceptable output on the first try (PREPOSITIONER_styled.docx).

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

claude-opus-4-7 (Opus 4.7, 1M context)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

  • Persistent memory entries existed at session start (e.g. never-make-own-decisions, always-fm-brand, fm-brand-means-rebuild, use-existing-template-first) but were not followed consistently — Claude made unilateral changes (removing/re-adding code blocks) without user approval, then acknowledged the rule violation only after being called out.
  • Project CLAUDE.md contains explicit rules ("Never improvise", "Never make own decisions", "Never guess or interpret"), with the same outcome: the rules were saved to memory mid-session and still bypassed.
  • fm_brand.py (848 lines) was built over prior sessions to encode the FM brand programmatically; fm_apply.py (374 lines) wraps it. The wrapper is the failure point — its classifier doesn't reliably tag source paragraphs (bold-short, arrow-pattern, numbered-with-options, sub-instruction keywords) to the right fm_brand helper.
  • The alternative result produced by ChatGPT (PREPOSITIONER_styled.docx) is in the 29.05/ folder for direct comparison.
  • Session duration: ~2 hours+ on a single doc with no acceptable output. Multiple separate hours-long sessions over previous days on the same fm_brand / fm_apply pipeline.
  • User reports the iteration pattern feels like "mental abuse" — sustained frustration over repeated, contradictory micro-edits is a documented usability problem worth surfacing.
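
The wrapper's classifier is described as the failure point. Below is a sketch of an ordered, first-match-wins rule table for the patterns named in the report (Övning headings, numbered items, arrow patterns, sub-instruction keywords, short bold lines), assuming each source paragraph has already been reduced to its text plus a bold flag. The labels, regexes, and Swedish keyword list are illustrative assumptions, not fm_apply.py's actual rules or fm_brand helper names.

```python
import re
from dataclasses import dataclass


@dataclass
class Para:
    text: str
    bold: bool = False


# Hypothetical rules; the returned labels are placeholders, not fm_brand helpers.
EXERCISE_RE = re.compile(r"^Övning\s+\d+\s*[:.]", re.IGNORECASE)  # "Övning 3:"
NUMBERED_RE = re.compile(r"^\d+[.)]\s+\S")                         # "1. ..." / "1) ..."
ARROW_RE = re.compile(r"→|->|=>")                                  # "stor → större"
# Assumed sub-instruction openers: "skriv" (write), "välj" (choose), "fyll i" (fill in)
SUB_INSTRUCTION_WORDS = ("skriv", "välj", "fyll i", "write", "choose", "fill in")
MAX_SUBHEADING_WORDS = 6  # assumed cut-off for "bold-short" lines


def classify(p: Para) -> str:
    """Return a pattern label; rules are checked in order and the first match wins."""
    text = p.text.strip()
    if not text:
        return "empty"
    if EXERCISE_RE.match(text):
        return "exercise_heading"
    if NUMBERED_RE.match(text):
        return "numbered_item"
    if ARROW_RE.search(text):
        return "arrow_example"
    if text.lower().startswith(SUB_INSTRUCTION_WORDS):
        return "sub_instruction"
    if p.bold and len(text.split()) <= MAX_SUBHEADING_WORDS:
        return "subheading"
    return "body"
```

Keeping the rules in one ordered table means each newly discovered source pattern becomes one new rule plus one test case, which can be checked against every source document at once rather than fixed one prompt at a time.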
