transformers: CLIPTextModel and CLIPTextModelWithProjection have inconsistent structure since v5.6 (.text_model removed from one but not the other)

Code Example

from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTextConfig

cfg = CLIPTextConfig()

print("CLIPTextModel has .text_model:              ", hasattr(CLIPTextModel(cfg), "text_model"))
print("CLIPTextModelWithProjection has .text_model:", hasattr(CLIPTextModelWithProjection(cfg), "text_model"))

---

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

System Info

transformers version: reproduced on 5.6.0 and 5.9.0 (unaffected on 5.5.4)
Platform: Windows-11-10.0.26200-SP0 (also reproducible cross-platform; the bug is in pure model construction)
Python version: 3.12.9
PyTorch version: 2.7.1+cu128
Accelerate version: 1.8.1

This is a pure-Python structural issue in model __init__; it does not depend on weights, device, or platform.

Who can help?

Leaving for triage — this concerns the CLIP text model implementation in models/clip/modeling_clip.py.

Reproduction

In v5.6.0, CLIPTextModel was flattened: its submodules (embeddings, encoder, final_layer_norm) are now assigned directly in __init__, and the previous self.text_model (CLIPTextTransformer) wrapper was removed. The closely related CLIPTextModelWithProjection was not changed: its __init__ still does self.text_model = CLIPTextModel._from_config(config), so it retains a .text_model submodule.

The result is that two sibling classes now have inconsistent structure, and CLIPTextModel lost its long-standing .text_model attribute with no deprecation path.

from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTextConfig

cfg = CLIPTextConfig()

print("CLIPTextModel has .text_model:              ", hasattr(CLIPTextModel(cfg), "text_model"))
print("CLIPTextModelWithProjection has .text_model:", hasattr(CLIPTextModelWithProjection(cfg), "text_model"))

Output:

| transformers | `CLIPTextModel` | `CLIPTextModelWithProjection` |
| --- | --- | --- |
| ≤ 5.5.x | `True` | `True` |
| ≥ 5.6.0 | `False` | `True` |

Confirmed in source:

v5.5.x — CLIPTextModel.__init__: self.text_model = CLIPTextTransformer(config)
v5.6.0+ — CLIPTextModel.__init__: self.embeddings = ...; self.encoder = ...; self.final_layer_norm = ... (no text_model)
v5.6.0+ — CLIPTextModelWithProjection.__init__: still self.text_model = CLIPTextModel._from_config(config)

Any code that reads model.text_model.<...> on a CLIPTextModel (a very common, long-stable pattern) now raises:

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

while the identical access on CLIPTextModelWithProjection keeps working — so the regression appears for only some model variants and is easy to misattribute to downstream code.
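
Until the structure is reconciled upstream, downstream code can avoid depending on either layout. The sketch below is one possible workaround, not a transformers API: `resolve_text_backbone` is a hypothetical helper that returns `model.text_model` when that attribute exists and the model itself otherwise. It uses plain stand-in objects so it runs without transformers installed, and it assumes the flattened class exposes `embeddings` directly, as described above.

```python
from types import SimpleNamespace


def resolve_text_backbone(model):
    """Return the object that owns `embeddings`, `encoder`, and `final_layer_norm`.

    Hypothetical helper, not part of transformers. Falls back to the model
    itself when there is no `.text_model` wrapper (the flattened layout).
    """
    return getattr(model, "text_model", model)


# Stand-ins for the two layouts described in this issue (not real models).
nested = SimpleNamespace(text_model=SimpleNamespace(embeddings="nested-embeddings"))
flattened = SimpleNamespace(embeddings="flat-embeddings")

print(resolve_text_backbone(nested).embeddings)
print(resolve_text_backbone(flattened).embeddings)
```

With real models, the same call would replace direct `model.text_model.<...>` access, so one code path covers both classes on both sides of the v5.6.0 change.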

Expected behavior

One of the following, rather than an undocumented, asymmetric removal:

1. Consistency: flatten both classes the same way (or neither), so siblings share structure; or
2. Backward-compatible accessor: keep a .text_model property on CLIPTextModel that returns the flattened module (so existing model.text_model.embeddings... access continues to resolve); or
3. Documentation: if the flattening is intentional and permanent, call it out as a breaking change in the v5 migration guide, including the .text_model removal and the resulting asymmetry with CLIPTextModelWithProjection.

The core concern is that CLIPTextModel and CLIPTextModelWithProjection are now structurally inconsistent, and a widely-used attribute was removed silently.
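
To illustrate the backward-compatible accessor option, here is a minimal sketch of the pattern. `FlattenedTextModel` is a plain-Python stand-in, not the real CLIPTextModel, and the warning text and `FutureWarning` category are assumptions chosen for illustration. The idea is that a read-only `text_model` property returns the flattened object itself, so legacy attribute chains keep resolving while users are nudged toward the new layout.

```python
import warnings


class FlattenedTextModel:
    """Stand-in for a flattened model; not the real CLIPTextModel."""

    def __init__(self):
        # Submodules live directly on the model, as in the flattened layout.
        self.embeddings = "embeddings"
        self.encoder = "encoder"
        self.final_layer_norm = "final_layer_norm"

    @property
    def text_model(self):
        # Legacy accessor: warn, then hand back the flattened object itself
        # so `model.text_model.embeddings` still resolves.
        warnings.warn(
            "`.text_model` is deprecated; access submodules on the model directly.",
            FutureWarning,
            stacklevel=2,
        )
        return self


model = FlattenedTextModel()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = model.text_model.embeddings  # old-style access path

print(legacy == model.embeddings)
print(len(caught))
```

Because the property is defined on the class rather than assigned as an attribute, the same pattern on a `torch.nn.Module` should not register an extra submodule, so parameter names would stay flat. That remains an assumption to verify against the actual implementation.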
