transformers: CLIPTextModel and CLIPTextModelWithProjection have inconsistent structure since v5.6 (.text_model removed from one but not the other)


Error Message

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

Code Example

from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTextConfig

cfg = CLIPTextConfig()

print("CLIPTextModel has .text_model:              ", hasattr(CLIPTextModel(cfg), "text_model"))
print("CLIPTextModelWithProjection has .text_model:", hasattr(CLIPTextModelWithProjection(cfg), "text_model"))


System Info

  • transformers version: reproduced on 5.6.0 and 5.9.0 (unaffected on 5.5.4)
  • Platform: Windows-11-10.0.26200-SP0 (also reproducible cross-platform; the bug is in pure model construction)
  • Python version: 3.12.9
  • PyTorch version: 2.7.1+cu128
  • Accelerate version: 1.8.1

This is a pure-Python structural issue in model __init__; it does not depend on weights, device, or platform.

Who can help?

Leaving for triage — this concerns the CLIP text model implementation in models/clip/modeling_clip.py.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In v5.6.0, CLIPTextModel was flattened: its submodules (embeddings, encoder, final_layer_norm) are now assigned directly in __init__, and the previous self.text_model (CLIPTextTransformer) wrapper was removed. However, the closely related CLIPTextModelWithProjection was not changed: its __init__ still does self.text_model = CLIPTextModel._from_config(config), so it retains a .text_model submodule.

The result is that two sibling classes now have inconsistent structure, and CLIPTextModel lost its long-standing .text_model attribute with no deprecation path.

from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTextConfig

cfg = CLIPTextConfig()

print("CLIPTextModel has .text_model:              ", hasattr(CLIPTextModel(cfg), "text_model"))
print("CLIPTextModelWithProjection has .text_model:", hasattr(CLIPTextModelWithProjection(cfg), "text_model"))

Output:

transformers version   CLIPTextModel   CLIPTextModelWithProjection
≤ 5.5.x                True            True
≥ 5.6.0                False           True

Confirmed in source:

  • v5.5.x: CLIPTextModel.__init__ sets self.text_model = CLIPTextTransformer(config)
  • v5.6.0+: CLIPTextModel.__init__ sets self.embeddings = ...; self.encoder = ...; self.final_layer_norm = ... (no text_model)
  • v5.6.0+: CLIPTextModelWithProjection.__init__ still sets self.text_model = CLIPTextModel._from_config(config)

Any code that reads model.text_model.<...> on a CLIPTextModel (a very common, long-stable pattern) now raises:

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

while the identical access on CLIPTextModelWithProjection keeps working — so the regression appears for only some model variants and is easy to misattribute to downstream code.
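
Until this is resolved upstream, downstream code that has to run on both layouts could go through a small accessor instead of reading .text_model directly. The following is a minimal sketch, not part of transformers: get_text_backbone is a hypothetical helper name, and the SimpleNamespace objects are stand-ins that mimic only the attribute layout described in this issue, so the snippet runs without transformers installed.

```python
from types import SimpleNamespace


def get_text_backbone(model):
    """Return the object that owns embeddings / encoder / final_layer_norm.

    Hypothetical helper (not a transformers API). If the model still has a
    `.text_model` wrapper, return it; otherwise assume the model is already
    flattened and return the model itself.
    """
    return getattr(model, "text_model", model)


# Stand-ins for the two layouts reported in this issue (assumed, simplified).
wrapped = SimpleNamespace(text_model=SimpleNamespace(embeddings="emb"))  # <= 5.5.x layout
flat = SimpleNamespace(embeddings="emb")  # >= 5.6.0 CLIPTextModel layout

# The same access path now resolves on both layouts.
print(get_text_backbone(wrapped).embeddings)
print(get_text_backbone(flat).embeddings)
```

With a real model the call would look like get_text_backbone(model).embeddings. This only papers over attribute access; it does not address any other consequence of the flattening.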

Expected behavior

One of the following, rather than an undocumented, asymmetric removal:

  1. Consistency — flatten both classes the same way (or neither), so siblings share structure; or
  2. Backward-compatible accessor — keep a .text_model property on CLIPTextModel that returns the flattened module (so existing model.text_model.embeddings... access continues to resolve); or
  3. Documentation — if the flattening is intentional and permanent, call it out as a breaking change in the v5 migration guide, including the .text_model removal and the resulting asymmetry with CLIPTextModelWithProjection.

The core concern is that CLIPTextModel and CLIPTextModelWithProjection are now structurally inconsistent, and a widely-used attribute was removed silently.
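
To make option 2 concrete, here is a rough sketch of what a backward-compatible accessor could look like. It uses a toy class, FlatTextModel, as a stand-in for a flattened model; it is not the real CLIPTextModel, and the warning text and the choice of DeprecationWarning are assumptions for illustration only.

```python
import warnings


class FlatTextModel:
    """Toy stand-in for a flattened text model (not the real CLIPTextModel)."""

    def __init__(self):
        # Placeholders for the submodules that are now assigned directly.
        self.embeddings = "embeddings"
        self.encoder = "encoder"
        self.final_layer_norm = "final_layer_norm"

    @property
    def text_model(self):
        # Legacy accessor: warn, then return the flattened model itself so
        # that `model.text_model.embeddings` keeps resolving.
        warnings.warn(
            "`.text_model` is deprecated; access submodules on the model directly.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self


model = FlatTextModel()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = model.text_model.embeddings  # old-style access path

print(legacy)
print(caught[0].category.__name__)
```

Because the property returns the model itself rather than a new submodule, old access paths keep working while the warning gives callers a deprecation window.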
