litellm - ✅(Solved) Fix [Compat Matrix] Slice 2: Add remaining 4 provider columns for basic_messaging_non_streaming [1 pull request, 2 comments, 1 participant]

mateo-berri · 2026-04-25T03:35:48Z



GitHub stats

BerriAI/litellm#26478 • Fetched 2026-04-26 05:06:50


Author

mateo-berri

Participants

mateo-berri

Timeline (top)

cross-referenced ×7, commented ×2, labeled ×2, closed ×1

Error Message

Update the hand-authored compatibility-matrix.json in the docs repo so the rendered matrix shows a 1×5 grid with the full column set. This validates that the <CompatibilityMatrix /> component renders all four status states correctly when given real data, and validates that the Matrix JSON Builder correctly aggregates per-model results into a single per-cell status (cell is pass only if all three models pass; otherwise fail with the failing model identified in the error message).

When any model fails, the cell reports {"status": "fail", "error": "..."} and the error string identifies which model broke.

Fix Action

Fixed

Fixed by PR: [WIP] feat(tests): Claude Code Compatibility Matrix v0 (PRD #26476) (https://github.com/BerriAI/litellm/pull/26491)

PR fix notes

PR #26491: [WIP] feat(tests): Claude Code Compatibility Matrix v0 (PRD #26476)

Repository: BerriAI/litellm
Author: mateo-berri
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26491

Description (problem / solution / changelog)

Relevant issues

Implements the v0 of the Claude Code Compatibility Matrix.

Parent PRD: #26476
Slice 1 (tracer bullet): #26477
Slice 2 (4 provider columns for basic_messaging_non_streaming): #26478
Slice 3 (PR gate in CircleCI): #26479
Slice 4 (daily cron VM publishes matrix to docs): #26480
Slice 5 (full v0 row set: 6 features × 5 providers): #26481

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

[x] I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
- Note: tests for this feature live under tests/claude_code/_driver_unit_tests/, tests/claude_code/_builder_unit_tests/, tests/claude_code/_publisher_unit_tests/, and tests/claude_code/_pr_gate_unit_tests/. These are deep-module unit tests for the new helpers (Claude Code CLI Driver, Matrix JSON Builder, Publisher, PR-Gate Version Resolver) per the PRD's "Testing Decisions" section. They follow the same mocked-subprocess / golden-file patterns established in tests/test_litellm/.
[x] My PR passes all unit tests on make test-unit
[x] My PR's scope is as isolated as possible, it only solves 1 specific problem
[ ] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run Link:
CI run for the last commit Link:
Merge / cherry-pick CI run Links:

Screenshots / Proof of Fix

This PR ships the end-to-end v0 of the Claude Code Compatibility Matrix as defined in PRD #26476. Verification of the pipeline:

1. Test scaffolding (slices 1, 2, 5). New layout under tests/claude_code/<feature>/test_<provider>.py. The compat_result pytest fixture captures tagged-union outcomes (pass / fail / not_applicable / not_tested); a conftest.py hook merges per-test results into a structured compat-results.json artifact. Six features × five providers = 30 cells, each exercised against three Claude tiers (Haiku 4.5 / Sonnet 4.6 / Opus 4.7), all-must-pass aggregation per cell.
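As a rough illustration of the tagged-union outcomes described in step 1, the sketch below validates one per-test outcome and merges a batch into a nested structure of the kind a compat-results.json artifact might hold. The class name `CompatOutcome`, its fields, the helper `merge_outcomes`, and the model labels are all hypothetical; the PR's actual fixture and artifact layout are not shown in this page.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four status values named in the PR description.
VALID_STATUSES = {"pass", "fail", "not_applicable", "not_tested"}


@dataclass(frozen=True)
class CompatOutcome:
    """One per-test outcome. Field names are illustrative, not the PR's schema."""
    feature: str
    provider: str
    model: str
    status: str
    detail: Optional[str] = None  # error text for "fail", reason for "not_applicable"

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status in ("fail", "not_applicable") and not self.detail:
            raise ValueError(f"{self.status} requires a detail string")


def merge_outcomes(outcomes):
    """Group outcomes as feature -> provider -> model -> payload (assumed layout)."""
    merged = {}
    for o in outcomes:
        payload = {"status": o.status}
        if o.status == "fail":
            payload["error"] = o.detail
        elif o.status == "not_applicable":
            payload["reason"] = o.detail
        merged.setdefault(o.feature, {}).setdefault(o.provider, {})[o.model] = payload
    return merged


if __name__ == "__main__":
    demo = merge_outcomes([
        CompatOutcome("basic_messaging_non_streaming", "azure", "haiku-4.5", "pass"),
        CompatOutcome("basic_messaging_non_streaming", "azure", "sonnet-4.6", "fail", "timeout"),
    ])
    print(json.dumps(demo, indent=2))
```

Rejecting a `fail` or `not_applicable` outcome that has no detail string at construction time keeps later stages from having to guess why a cell is red or gray.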

2. Claude Code CLI Driver + Matrix JSON Builder. Two deep helper modules (tests/claude_code/cli_driver.py, tests/claude_code/matrix_builder.py) wrapping subprocess + parsing and pure-function JSON construction respectively. Unit-tested against mocked subprocess (driver) and golden fixtures (builder) — see _driver_unit_tests/ and _builder_unit_tests/.

3. PR Gate (slice 3). New CircleCI job claude_code_compat_pr_gate boots the proxy from the PR's code, installs the claude CLI at the version returned by the new PR-gate version resolver (newest published >= 3 days ago, queried at run-time from the npm registry), and runs the full tests/claude_code/ suite. Red status blocks merge.
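The "newest published >= 3 days ago" rule in step 3 can be sketched as a pure function over the `time` map that the npm registry returns for a package (version string mapped to an ISO timestamp, plus `created` and `modified` bookkeeping keys). The function name `pick_gate_version` is hypothetical, and treating "newest" as "latest publish time" rather than "highest semver" is an assumption; the PR's resolver may differ and may also filter pre-releases.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse an npm-style timestamp such as '2026-04-20T12:00:00.000Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_gate_version(publish_times, now, min_age_days=3):
    """Return the most recently published version that is at least
    `min_age_days` old, or None if no version qualifies.

    `publish_times` mirrors the registry's `time` map; the 'created' and
    'modified' keys are bookkeeping entries, not versions, so they are skipped.
    """
    cutoff = now - timedelta(days=min_age_days)
    candidates = [
        (parse_ts(ts), version)
        for version, ts in publish_times.items()
        if version not in ("created", "modified") and parse_ts(ts) <= cutoff
    ]
    if not candidates:
        return None
    # max() compares publish time first, so the latest aged release wins.
    return max(candidates)[1]


if __name__ == "__main__":
    # Hypothetical version numbers and dates, for illustration only.
    times = {
        "created": "2025-01-01T00:00:00.000Z",
        "modified": "2026-04-24T00:00:00.000Z",
        "1.0.0": "2026-04-10T00:00:00.000Z",
        "1.0.1": "2026-04-21T00:00:00.000Z",
        "1.0.2": "2026-04-24T00:00:00.000Z",
    }
    print(pick_gate_version(times, datetime(2026, 4, 25, tzinfo=timezone.utc)))
```

Passing `now` in as a parameter, rather than reading the clock inside the function, is what makes this kind of resolver easy to unit-test.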

4. Daily Cron Publisher (slice 4). New GitHub Actions workflow .github/workflows/claude_code_compat_matrix.yml runs on three triggers (daily cron, release.published filtered to v*-stable, workflow_dispatch). Resolves the latest stable LiteLLM release via the GitHub Releases API, pulls the corresponding ghcr.io image, installs the latest Claude Code CLI, runs the test suite, builds the matrix JSON, and direct-pushes it to BerriAI/litellm-docs. Cross-repo authentication uses a GitHub App scoped to contents: write on the docs repo only; the select_files_to_commit allowlist enforces "only compatibility-matrix.json is ever pushed" since GitHub Apps cannot scope tokens to a single file path.
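Because the token in step 4 can write anywhere in the docs repo, the allowlist is the only thing limiting what gets pushed. The sketch below shows one way such a check could look. It is a stand-in named `filter_to_allowlist`, not the PR's `select_files_to_commit`, whose signature and behavior are not shown here; matching on the exact repo-relative path is also an assumption.

```python
from pathlib import PurePosixPath

# Assumed repo-relative location of the published file (hypothetical).
ALLOWED_PATHS = frozenset({"compatibility-matrix.json"})


def filter_to_allowlist(changed_paths, allowed=ALLOWED_PATHS):
    """Split changed paths into (selected, rejected).

    Paths are normalised first so './compatibility-matrix.json' and
    'compatibility-matrix.json' compare equal. Anything not on the
    allowlist is rejected, including same-named files in other directories.
    """
    selected, rejected = [], []
    for raw in changed_paths:
        normalised = str(PurePosixPath(raw))
        if normalised in allowed:
            selected.append(normalised)
        else:
            rejected.append(raw)
    return selected, rejected


if __name__ == "__main__":
    keep, drop = filter_to_allowlist(
        ["compatibility-matrix.json", "README.md", "docs/compatibility-matrix.json"]
    )
    print("commit:", keep)
    print("ignored:", drop)
```

Returning the rejected paths alongside the selected ones lets the caller decide whether an unexpected file should merely be skipped or should abort the publish run.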

5. Sample matrix output. tests/claude_code/sample_compatibility-matrix.json shows the expected v1 schema shape that the docs site's <CompatibilityMatrix /> React component will consume.

Secret scan. Verified no committed secrets:

All real credentials are loaded via os.environ.get(...) or ${{ secrets.* }}.
Test fixtures use obvious placeholders (sk-test, sk-abc, "k", ghs_xxx).
sk-1234 and sk-cron-matrix are dev master keys used only inside ephemeral test/cron containers (consistent with existing CI conventions in .circleci/config.yml).
pathrise-convert-1606954137718 is the standard GCP test project ID already used throughout the LiteLLM test suite (a project ID is not a credential).
.gitignore excludes the CI-output files (compat-results.json, compatibility-matrix.json).
Workflow uses SHA-pinned actions, permissions: contents: read, and persist-credentials: false on checkout.

Type

🆕 New Feature 🚄 Infrastructure ✅ Test

Changes

tests/claude_code/manifest.yaml — single source of truth for the matrix's row order and provider column order.
tests/claude_code/<feature>/test_<provider>.py — 30 per-(feature, provider) test files, one feature directory each for basic_messaging_non_streaming, basic_messaging_streaming, tool_use, prompt_caching_5m, vision, extended_thinking.
tests/claude_code/conftest.py — compat_result fixture and pytest_runtest_logreport hook that emits the structured compat-results.json artifact.
tests/claude_code/cli_driver.py — Claude Code CLI Driver (deep module wrapping subprocess + stream-JSON parsing).
tests/claude_code/matrix_builder.py — pure-function builder that turns the per-test results artifact into the published compatibility-matrix.json per the v1 schema.
tests/claude_code/resolver.py — Latest Stable LiteLLM Resolver (queries the GitHub Releases API for newest v*-stable).
tests/claude_code/pr_gate_version_resolver.py — Claude Code PR-Gate Version Resolver (queries npm for newest version published >= 3 days ago).
tests/claude_code/publisher.py — daily-cron publisher orchestrator: resolves versions, runs the test suite, builds JSON, direct-pushes to the docs repo. Includes the select_files_to_commit allowlist enforcement.
tests/claude_code/test_config.yaml — proxy routing config for the PR gate, mapping aliases to upstream models per provider.
tests/claude_code/_*_unit_tests/ — unit tests for the four deep modules.
.github/workflows/claude_code_compat_matrix.yml — daily cron workflow.
.circleci/config.yml — new claude_code_compat_pr_gate job wired into the existing main-branches workflow.
.gitignore — exclude CI-output files (compat-results.json, compatibility-matrix.json).
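For the Latest Stable LiteLLM Resolver listed above, the selection step can be sketched as a pure function over the list of release objects that the GitHub Releases API returns (each with `tag_name`, `draft`, and `published_at` fields). The function name `latest_stable_tag` is hypothetical, the tag names in the demo are invented, and ordering by `published_at` rather than by version number is an assumption.

```python
from fnmatch import fnmatchcase


def latest_stable_tag(releases, pattern="v*-stable"):
    """Return the tag of the most recently published non-draft release
    whose tag matches `pattern`, or None if nothing matches.

    `published_at` values are ISO 8601 UTC strings in the API response,
    so plain string comparison orders them chronologically.
    """
    stable = [
        r for r in releases
        if fnmatchcase(r["tag_name"], pattern) and not r.get("draft", False)
    ]
    if not stable:
        return None
    return max(stable, key=lambda r: r["published_at"])["tag_name"]


if __name__ == "__main__":
    # Invented tags and dates, for illustration only.
    demo = [
        {"tag_name": "v1.2.0-stable", "draft": False, "published_at": "2026-04-10T00:00:00Z"},
        {"tag_name": "v1.3.0-nightly", "draft": False, "published_at": "2026-04-20T00:00:00Z"},
        {"tag_name": "v1.2.5-stable", "draft": False, "published_at": "2026-04-18T00:00:00Z"},
    ]
    print(latest_stable_tag(demo))
```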

Out of scope for this PR (per PRD's "Deferred to v1+"): the docs-side React <CompatibilityMatrix /> component, MDX page at docs/tutorials/claude-code-compatibility, Slack regression alerts, operational guardrails (deadman alerts, staleness banner), additional features beyond the v0 row set, PR-comment diff commenter, click-to-modal cell deep-dive, and a written ADR artifact.

Changed files

.circleci/config.yml (modified, +96/-0)
.github/workflows/claude_code_compat_matrix.yml (added, +127/-0)
.gitignore (modified, +6/-1)
tests/claude_code/__init__.py (added, +0/-0)
tests/claude_code/_builder_unit_tests/__init__.py (added, +0/-0)
tests/claude_code/_builder_unit_tests/fixtures/expected_matrix.json (added, +38/-0)
tests/claude_code/_builder_unit_tests/fixtures/manifest.yaml (added, +9/-0)
tests/claude_code/_builder_unit_tests/fixtures/results.json (added, +41/-0)
tests/claude_code/_builder_unit_tests/test_matrix_builder.py (added, +282/-0)
tests/claude_code/_builder_unit_tests/test_v0_layout.py (added, +116/-0)
tests/claude_code/_driver_unit_tests/__init__.py (added, +0/-0)
tests/claude_code/_driver_unit_tests/test_cli_driver.py (added, +339/-0)
tests/claude_code/_driver_unit_tests/test_compat_result.py (added, +70/-0)
tests/claude_code/_pr_gate_unit_tests/__init__.py (added, +0/-0)
tests/claude_code/_pr_gate_unit_tests/test_circleci_pr_gate_wiring.py (added, +131/-0)
tests/claude_code/_pr_gate_unit_tests/test_pr_gate_version_resolver.py (added, +139/-0)
tests/claude_code/_publisher_unit_tests/__init__.py (added, +0/-0)
tests/claude_code/_publisher_unit_tests/test_publisher.py (added, +99/-0)
tests/claude_code/_publisher_unit_tests/test_resolver.py (added, +112/-0)
tests/claude_code/basic_messaging_non_streaming/__init__.py (added, +0/-0)
tests/claude_code/basic_messaging_non_streaming/test_anthropic.py (added, +103/-0)
tests/claude_code/basic_messaging_non_streaming/test_azure.py (added, +107/-0)
tests/claude_code/basic_messaging_non_streaming/test_bedrock_converse.py (added, +96/-0)
tests/claude_code/basic_messaging_non_streaming/test_bedrock_invoke.py (added, +96/-0)
tests/claude_code/basic_messaging_non_streaming/test_vertex_ai.py (added, +96/-0)
tests/claude_code/basic_messaging_streaming/__init__.py (added, +0/-0)
tests/claude_code/basic_messaging_streaming/test_anthropic.py (added, +106/-0)
tests/claude_code/basic_messaging_streaming/test_azure.py (added, +103/-0)
tests/claude_code/basic_messaging_streaming/test_bedrock_converse.py (added, +99/-0)
tests/claude_code/basic_messaging_streaming/test_bedrock_invoke.py (added, +99/-0)
tests/claude_code/basic_messaging_streaming/test_vertex_ai.py (added, +99/-0)
tests/claude_code/cli_driver.py (added, +261/-0)
tests/claude_code/conftest.py (added, +164/-0)
tests/claude_code/extended_thinking/__init__.py (added, +0/-0)
tests/claude_code/extended_thinking/test_anthropic.py (added, +116/-0)
tests/claude_code/extended_thinking/test_azure.py (added, +118/-0)
tests/claude_code/extended_thinking/test_bedrock_converse.py (added, +110/-0)
tests/claude_code/extended_thinking/test_bedrock_invoke.py (added, +110/-0)
tests/claude_code/extended_thinking/test_vertex_ai.py (added, +110/-0)
tests/claude_code/manifest.yaml (added, +36/-0)
tests/claude_code/matrix_builder.py (added, +179/-0)
tests/claude_code/pr_gate_version_resolver.py (added, +148/-0)
tests/claude_code/prompt_caching_5m/__init__.py (added, +0/-0)
tests/claude_code/prompt_caching_5m/test_anthropic.py (added, +113/-0)
tests/claude_code/prompt_caching_5m/test_azure.py (added, +111/-0)
tests/claude_code/prompt_caching_5m/test_bedrock_converse.py (added, +104/-0)
tests/claude_code/prompt_caching_5m/test_bedrock_invoke.py (added, +104/-0)
tests/claude_code/prompt_caching_5m/test_vertex_ai.py (added, +104/-0)
tests/claude_code/publisher.py (added, +388/-0)
tests/claude_code/resolver.py (added, +91/-0)
tests/claude_code/sample_compatibility-matrix.json (added, +141/-0)
tests/claude_code/test_config.yaml (added, +103/-0)
tests/claude_code/tool_use/__init__.py (added, +0/-0)
tests/claude_code/tool_use/test_anthropic.py (added, +114/-0)
tests/claude_code/tool_use/test_azure.py (added, +113/-0)
tests/claude_code/tool_use/test_bedrock_converse.py (added, +109/-0)
tests/claude_code/tool_use/test_bedrock_invoke.py (added, +109/-0)
tests/claude_code/tool_use/test_vertex_ai.py (added, +109/-0)
tests/claude_code/vision/__init__.py (added, +0/-0)
tests/claude_code/vision/test_anthropic.py (added, +99/-0)
tests/claude_code/vision/test_azure.py (added, +100/-0)
tests/claude_code/vision/test_bedrock_converse.py (added, +95/-0)
tests/claude_code/vision/test_bedrock_invoke.py (added, +95/-0)
tests/claude_code/vision/test_vertex_ai.py (added, +95/-0)

RAW_BUFFER

Parent PRD

#26476

What to build

Extend the tracer-bullet cell from slice #26477 across all four remaining provider columns for the same feature (basic_messaging_non_streaming). This proves the multi-provider, multi-model, all-must-pass aggregation logic described in the PRD's "Per-cell model coverage" section.

Concretely, add test_bedrock_invoke.py, test_bedrock_converse.py, test_vertex_ai.py, and test_azure.py for the existing feature directory. Each test exercises the feature against Claude Haiku 4.5, Claude Sonnet 4.6, and Claude Opus 4.7. Where a (provider, model) combination is genuinely unavailable, the test reports not_applicable with a human-readable reason.
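A per-provider test of the kind described above might loop over the three tiers and record one outcome per model, along these lines. This is a minimal sketch: `run_provider`, the `call_model` callback, the model identifier strings, and the `unavailable` mapping are all hypothetical names, and the real tests drive the Claude Code CLI rather than a plain callback.

```python
# Illustrative identifiers for the three tiers named in the issue.
MODELS = ("claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7")


def run_provider(provider, call_model, unavailable=None):
    """Exercise every model for one provider and return per-model results.

    `call_model(provider, model)` should raise on failure.
    `unavailable` maps a model to a human-readable reason when the
    (provider, model) combination genuinely does not exist.
    """
    unavailable = unavailable or {}
    results = {}
    for model in MODELS:
        if model in unavailable:
            results[model] = {"status": "not_applicable", "reason": unavailable[model]}
            continue
        try:
            call_model(provider, model)
        except Exception as exc:
            # Prefix the model name so the failing tier stays identifiable.
            results[model] = {"status": "fail", "error": f"{model}: {exc}"}
        else:
            results[model] = {"status": "pass"}
    return results


if __name__ == "__main__":
    def fake_call(provider, model):
        if model == "claude-sonnet-4-6":
            raise RuntimeError("boom")

    print(run_provider("azure", fake_call, {"claude-opus-4-7": "not offered on this provider"}))
```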

Acceptance criteria

Per-provider test files exist for Bedrock Invoke, Bedrock Converse, Vertex AI, and Azure for basic_messaging_non_streaming.
Each test file exercises the feature against Haiku 4.5, Sonnet 4.6, and Opus 4.7 (or reports not_applicable for combinations that don't exist).
When all three models pass, the cell reports {"status": "pass"}.
When any model fails, the cell reports {"status": "fail", "error": "..."} and the error string identifies which model broke.
When the (provider, model) combination is unsupported, the test reports {"status": "not_applicable", "reason": "..."} with a reason fit to display in a tooltip.
The Matrix JSON Builder correctly aggregates per-model results into per-cell status without losing the failing-model identification.
The hand-authored compatibility-matrix.json in the docs repo is updated to a 1×5 grid reflecting real test outputs.
The rendered docs page shows colored cells for all four status states correctly (green, red, gray, yellow), with appropriate tooltips on hover.
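The all-must-pass rule in these criteria can be sketched as a small pure function. The issue specifies only two cases: all models pass, or at least one fails. How `not_applicable` and `not_tested` results combine with passes is not stated, so the handling below (set aside `not_applicable` models, report `not_tested` for anything left unresolved) is an assumption, and `aggregate_cell` is a hypothetical name rather than the Matrix JSON Builder's API.

```python
def aggregate_cell(per_model):
    """Collapse {model: result} into a single cell status.

    Specified by the issue: any failure makes the cell 'fail' and the error
    names the failing model; all passes make the cell 'pass'.
    Assumed here: 'not_applicable' models are set aside, and a cell with
    no applicable models is itself 'not_applicable'.
    """
    applicable = {
        m: r for m, r in per_model.items() if r["status"] != "not_applicable"
    }
    if not applicable:
        reasons = "; ".join(
            f"{m}: {r.get('reason', '')}" for m, r in per_model.items()
        )
        return {"status": "not_applicable", "reason": reasons}

    failures = [(m, r) for m, r in applicable.items() if r["status"] == "fail"]
    if failures:
        error = "; ".join(f"{m}: {r.get('error', '')}" for m, r in failures)
        return {"status": "fail", "error": error}

    if all(r["status"] == "pass" for r in applicable.values()):
        return {"status": "pass"}
    return {"status": "not_tested"}


if __name__ == "__main__":
    print(aggregate_cell({
        "haiku": {"status": "pass"},
        "sonnet": {"status": "fail", "error": "HTTP 500"},
        "opus": {"status": "pass"},
    }))
```

Checking for failures before anything else means a single red model can never be masked by passes or skipped models elsewhere in the cell.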

Blocked by

Blocked by #26477

User stories addressed

Reference by number from the parent PRD:

User story 3
User story 17
User story 20
User story 22
User story 23

extent analysis

TL;DR

To resolve the issue, extend the tracer-bullet cell across all four remaining provider columns for the basic_messaging_non_streaming feature and update the compatibility-matrix.json file to reflect the new test outputs.

Guidance

Create test files for Bedrock Invoke, Bedrock Converse, Vertex AI, and Azure for the basic_messaging_non_streaming feature, exercising each against Haiku 4.5, Sonnet 4.6, and Opus 4.7 models.
Update the compatibility-matrix.json file to a 1×5 grid reflecting real test outputs, ensuring the Matrix JSON Builder correctly aggregates per-model results into per-cell status.
Verify that the rendered docs page shows colored cells for all four status states correctly, with appropriate tooltips on hover.
Ensure that the cell reports the correct status (pass, fail, or not_applicable) with relevant error messages or reasons.

Notes

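Aggregation is all-must-pass: a cell is pass only when Haiku 4.5, Sonnet 4.6, and Opus 4.7 all pass, and a single failing model turns the cell to fail with that model named in the error string. Note that the linked PR #26491 is still listed as open, unmerged, and marked [WIP] in the data fetched above, even though this issue is labeled as fixed.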

Recommendation

This is planned feature work rather than a workaround for a bug. Extend the tracer-bullet cell to the four remaining providers and update compatibility-matrix.json from real test outputs, so that the matrix renders correctly and per-model results are aggregated into a per-cell status.
