litellm - ✅(Solved) Fix [Bug]: vertex_ai/gemini-3.1-flash-lite-preview returns "finish_reason": "stop" instead of "tool_calls" when using streaming [2 pull requests, 6 comments, 2 participants]

litellm • 2026-03-05 12:41:22

GitHub stats

BerriAI/litellm#22900 • Fetched 2026-04-08 00:39:25

Author

mvrodrig

Participants

Chesars

mvrodrig

Timeline (top)

mentioned ×8 • subscribed ×8 • commented ×6 • cross-referenced ×4

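Fix Action

Linked PRs (both reference #22900; both are listed as open and unmerged, and their descriptions cover MCP OAuth2 work rather than the streaming finish_reason behavior)

Linked PR: feat(mcp): OAuth2 authorization-code flow for OpenAPI BYOK MCP servers (https://github.com/BerriAI/litellm/pull/22943)
Linked PR: feat(mcp): popular REST API gallery for OpenAPI MCPs + per-user OAuth2 connect in ChatUI (https://github.com/BerriAI/litellm/pull/23012)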

PR fix notes

PR #22943: feat(mcp): OAuth2 authorization-code flow for OpenAPI BYOK MCP servers

Repository: BerriAI/litellm
Author: ishaan-jaff
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/22943

Description (problem / solution / changelog)

Relevant issues

Closes #22900 (OpenAPI MCP OAuth2 flow)

What this does

Adds a full OAuth2 authorization-code flow for OpenAPI BYOK MCP servers, so users can authorize through a provider's consent screen (GitHub, Spotify, Linear, etc.) instead of pasting a static API key.

Admin config (proxy_config.yaml):

mcp_servers:
  github:
    spec_path: https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json
    auth_type: oauth2
    is_byok: true
    authorization_url: https://github.com/login/oauth/authorize
    token_url: https://github.com/login/oauth/access_token
    client_id: <github-oauth-app-client-id>
    client_secret: <github-oauth-app-client-secret>

New backend endpoints (openapi_oauth2_endpoints.py):

GET /v1/mcp/server/{server_id}/oauth2/connect — generates a state token (HMAC-SHA256, 10-min TTL), returns authorization_url the UI opens as a popup
GET /v1/mcp/oauth2/callback — receives code+state, exchanges for access token, stores in LiteLLM_MCPUserCredentials, shows success HTML that auto-closes popups
GET /v1/mcp/server/{server_id}/oauth2/status — returns {"connected": true/false}
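
The state token mentioned for the connect endpoint (HMAC-SHA256, 10-minute TTL) could look roughly like the sketch below. This is not the PR's code: the `payload.signature` layout, the payload fields, and the function names `make_state` / `verify_state` are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

STATE_TTL_SECONDS = 600  # 10-minute TTL, as stated in the PR description


def make_state(secret: bytes, server_id: str, user_id: str,
               now: Optional[float] = None) -> str:
    """Return a signed state token (hypothetical layout: payload.signature)."""
    issued_at = int(now if now is not None else time.time())
    payload = json.dumps(
        {"server_id": server_id, "user_id": user_id, "iat": issued_at},
        separators=(",", ":"), sort_keys=True,
    ).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_state(secret: bytes, state: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        payload_b64, sig_b64 = state.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    data = json.loads(payload)
    current = now if now is not None else time.time()
    if current - data["iat"] > STATE_TTL_SECONDS:
        return None
    return data
```

The callback would call `verify_state` on the returned `state` before exchanging the code, which ties the callback to the user and server that started the flow.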

UI (OAuth2ConnectButton.tsx): when a BYOK server has auth_type=oauth2, the Credentials column shows a Connect button. Clicking opens a popup to the provider consent screen, polls status every 2s, flips to "Connected" on success.
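
The polling behavior described for the Connect button (check status every 2 seconds until connected) can be sketched in Python as follows. The real component is TypeScript; the function name, the timeout, and the injectable `sleep` / `clock` parameters are assumptions added here so the loop is easy to exercise.

```python
import time
from typing import Callable, Dict


def wait_until_connected(
    fetch_status: Callable[[], Dict[str, bool]],
    interval: float = 2.0,    # poll every 2s, as in the PR description
    timeout: float = 120.0,   # assumed upper bound, not from the PR
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_status() until it reports {"connected": True} or time runs out."""
    deadline = clock() + timeout
    while True:
        if fetch_status().get("connected"):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here `fetch_status` stands in for a call to GET /v1/mcp/server/{server_id}/oauth2/status that returns the parsed JSON body.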

Token injection: stored tokens are automatically injected as Authorization: Bearer <token> via the _request_auth_header ContextVar on every tool call.
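
A minimal sketch of ContextVar-based header injection, assuming a per-request variable like the `_request_auth_header` named above. The helper functions `build_headers` and `call_tool_with_token` are hypothetical and are not taken from the LiteLLM source.

```python
import contextvars
from typing import Dict, Optional

# Hypothetical stand-in for the `_request_auth_header` ContextVar named in the PR.
_request_auth_header = contextvars.ContextVar("_request_auth_header", default=None)


def build_headers(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge the per-request Authorization header, if set, into outgoing headers."""
    headers = dict(base or {})
    auth = _request_auth_header.get()
    if auth is not None:
        headers["Authorization"] = auth
    return headers


def call_tool_with_token(token: str) -> Dict[str, str]:
    """Set the header for one tool call, then restore the previous value."""
    reset_token = _request_auth_header.set(f"Bearer {token}")
    try:
        # A real implementation would issue the HTTP request here;
        # this sketch only returns the headers it would send.
        return build_headers({"Accept": "application/json"})
    finally:
        _request_auth_header.reset(reset_token)
```

Because a ContextVar is scoped to the current task or thread context, concurrent requests from different users do not see each other's tokens.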

Also fixes three bugs that prevented BYOK token injection from working:

_get_tools_from_server tried client_credentials token exchange before checking spec_path, failing for authorization-code-only providers
user_api_key_auth was None in REST tool calls, preventing DB lookup of stored token by user_id
OpenAPI tools registered under prefixed name but looked up by bare name — added fallback in execute_mcp_tool
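
The third fix (tools registered under a prefixed name but looked up by bare name) amounts to a lookup with a fallback. The sketch below is an assumption-laden illustration: the `-` separator, the registry shape, and the name `find_tool` are hypothetical and do not mirror `execute_mcp_tool`.

```python
from typing import Callable, Dict, Optional

ToolFn = Callable[..., object]


def find_tool(registry: Dict[str, ToolFn], name: str,
              separator: str = "-") -> Optional[ToolFn]:
    """Try the exact name first, then a unique '<prefix><separator><name>' match."""
    if name in registry:
        return registry[name]
    matches = [fn for key, fn in registry.items()
               if key.endswith(separator + name)]
    # Only fall back when the bare name is unambiguous across servers.
    return matches[0] if len(matches) == 1 else None
```

Returning None on an ambiguous bare name is a design choice of this sketch; it avoids silently calling a same-named tool on a different server.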

Also fixes load_servers_from_config to propagate is_byok, byok_description, and byok_api_key_help_url from YAML config into the MCPServer object.

Pre-Submission checklist

My changes don't break any existing tests
I have added tests (unit tests in tests/test_litellm/proxy/mcp_server/)

Type

Bug fix
New feature

Changed files

litellm/__init__.py (modified, +5/-0)
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py (modified, +16/-10)
litellm/proxy/_experimental/mcp_server/openapi_oauth2_endpoints.py (added, +675/-0)
litellm/proxy/_experimental/mcp_server/rest_endpoints.py (modified, +1/-1)
litellm/proxy/_experimental/mcp_server/server.py (modified, +141/-38)
litellm/proxy/proxy_server.py (modified, +7/-3)
litellm/types/mcp_server/mcp_server_manager.py (modified, +3/-0)
tests/test_litellm/proxy/_experimental/mcp_server/test_openapi_oauth2_endpoints.py (added, +973/-0)
ui/litellm-dashboard/src/components/mcp_tools/OAuth2ConnectButton.tsx (added, +154/-0)
ui/litellm-dashboard/src/components/mcp_tools/mcp_server_columns.tsx (modified, +24/-1)
ui/litellm-dashboard/src/components/mcp_tools/mcp_servers.tsx (modified, +3/-1)
ui/litellm-dashboard/src/components/networking.tsx (modified, +58/-0)

PR #23012: feat(mcp): popular REST API gallery for OpenAPI MCPs + per-user OAuth2 connect in ChatUI

Repository: BerriAI/litellm
Author: ishaan-jaff
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/23012

Description (problem / solution / changelog)

Relevant issues

Closes #22900 (related - OAuth2 for OpenAPI MCPs)

Changes

Admin — OpenAPI MCP gallery

When selecting OpenAPI Spec transport in the Add MCP Server form, a gallery of 12 popular REST APIs now appears with logos: GitHub, Figma, Jira, Confluence, Slack, Stripe, Notion, Linear, HubSpot, Salesforce, Zendesk, Snowflake.

Clicking a card pre-fills: OpenAPI spec URL, auth type, authorization URL, token URL, and default scopes. A "+ Custom OpenAPI URL" link is available for unlisted APIs.

APIs are maintained as a static list in create_mcp_server.tsx (no extra API call) and also added to mcp_registry.json under category "REST APIs".

User — ChatUI OAuth2 connect flow

In the Apps panel, OAuth2 MCP servers now show a "Sign In with <Provider>" button instead of a toggle. After the OAuth flow completes, the panel shows a green "Connected" badge and a Disconnect button.

Uses existing discoverable_endpoints.py authorize/callback flow. A ?mcp_oauth_complete=<server_id> param on return triggers a credential status refresh.

Backend fixes

Add GET /v1/mcp/user/credential/{server_id} — lets the UI check if a user has a stored credential for an MCP server
Extend has_user_credential annotation on the server list to include auth_type=oauth2 servers, not just BYOK
Fix _get_tools_from_server to check spec_path before creating MCP client — fixes GitHub/Figma tool loading (MCP client creation tried m2m auth which fails for auth-code-only providers like GitHub)
Propagate is_byok, byok_description, byok_api_key_help_url from YAML config into MCPServer when loading via load_servers_from_config

Pre-Submission checklist

My changes don't break any existing tests
I have added tests

Type

New feature
Bug fix

Changes

litellm/proxy/mcp_registry.json — 12 new REST API entries
litellm/proxy/management_endpoints/mcp_management_endpoints.py — new GET /v1/mcp/user/credential/{server_id}, extend oauth2 credential annotation
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py — fix OpenAPI tool loading, propagate byok fields from config
ui/.../create_mcp_server.tsx — OpenAPI gallery with logos + pre-fill
ui/.../MCPAppsPanel.tsx — OAuth2 sign-in / connected / disconnect flow
ui/.../networking.tsx — checkMCPUserCredential, deleteMCPUserCredential
ui/.../types.tsx — extend DiscoverableMCPServer with OpenAPI fields

Changed files

litellm/proxy/_experimental/mcp_server/mcp_server_manager.py (modified, +13/-10)
litellm/proxy/management_endpoints/mcp_management_endpoints.py (modified, +31/-5)
litellm/proxy/mcp_registry.json (modified, +156/-0)
ui/litellm-dashboard/src/components/chat/MCPAppsPanel.tsx (modified, +187/-58)
ui/litellm-dashboard/src/components/mcp_tools/create_mcp_server.tsx (modified, +369/-102)
ui/litellm-dashboard/src/components/mcp_tools/types.tsx (modified, +6/-0)
ui/litellm-dashboard/src/components/networking.tsx (modified, +30/-0)
ui/litellm-dashboard/src/hooks/useTestMCPConnection.tsx (modified, +4/-2)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hi @krrishdholakia, @ishaan-jaff, @Chesars!

This is the same issue reported in #21041, #12249 and others: when using streaming with function tools, the final chunk ends with "finish_reason": "stop" instead of "tool_calls". This breaks agentic workflows that rely on detecting tool call completions. This time the affected model is:

vertex_ai/gemini-3.1-flash-lite-preview

Steps to Reproduce

Test with the following curl:

curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What'\''s the weather like in Lima, Peru today? celsius"
      }
    ],
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Retrieve current weather for a specific location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g., Lima, Peru"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream_options": {
      "include_usage": true
    }
  }'

Expected behavior

The final streaming chunk should return "finish_reason": "tool_calls" when the model decides to invoke a tool.

Actual behavior

The final chunk returns "finish_reason": "stop", even though the model is clearly attempting to use tool calls. This prevents agentic frameworks from detecting tool call completions and correctly invoking the functions.
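
Until the proxy reports the right value, a client can work around the behavior by tracking whether any chunk carried tool-call deltas. The sketch below assumes OpenAI-style chunk dictionaries and assumes that earlier chunks in the stream (not shown in the log) contain `delta.tool_calls`; the function name is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def effective_finish_reason(chunks: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Return choice 0's finish reason, upgrading "stop" to "tool_calls"
    when the stream contained tool-call deltas."""
    saw_tool_call = False
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            if delta.get("tool_calls"):
                saw_tool_call = True
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    if finish_reason == "stop" and saw_tool_call:
        return "tool_calls"
    return finish_reason
```

This only masks the symptom on the client side; the usage-only chunk at the end has no finish_reason and is ignored by the loop.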

Thanks!

Relevant log output

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}

data: [DONE]

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

stable v1.81.12

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

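To make the final chunk report "finish_reason": "tool_calls" instead of "stop", the streaming logic has to record whether the model emitted a tool call and use that when it builds the last chunk. The file and function names in the steps below are illustrative and have not been checked against the LiteLLM source.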

Step-by-Step Solution

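Update the streaming logic (shown here as a handle_tool_call function in streaming.py): set finish_reason to "tool_calls" as soon as a tool call is seen in the stream.

def handle_tool_call(self, tool_name, tool_output):
    # ...
    # Remember that this stream contained a tool call so the final
    # chunk reports "tool_calls" rather than "stop".
    self.finish_reason = "tool_calls"
    # ...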

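Update chunk generation (shown here as a generate_chunk function in completion.py): write the recorded finish_reason onto the choice, which is where the chunks in the log output carry it.

def generate_chunk(self, chunk_data):
    # ...
    # finish_reason lives on each entry of "choices", not at the top level.
    chunk_data["choices"][0]["finish_reason"] = self.finish_reason
    # ...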

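Update the proxy handler (shown here as a handle_streaming_request function in proxy.py): treat stream_options as optional and set finish_reason whether or not include_usage is requested.

def handle_streaming_request(self, request):
    # ...
    # stream_options may be absent; default to {} to avoid a KeyError.
    stream_options = request.get("stream_options") or {}
    if stream_options.get("include_usage"):
        pass  # emit the extra usage-only chunk here
    # The finish reason does not depend on include_usage.
    response["choices"][0]["finish_reason"] = self.finish_reason
    # ...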

Verification

To verify the fix, run the provided curl command and check the final chunk output. The finish_reason should now be "tool_calls" instead of "stop".
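
The check can be automated by parsing the server-sent event lines from the curl output. This is a small sketch that assumes the `data: ...` line format shown in the log output; the function name is hypothetical.

```python
import json
from typing import Iterable, List


def finish_reasons_from_sse(lines: Iterable[str]) -> List[str]:
    """Collect every non-null finish_reason from 'data: {...}' SSE lines."""
    reasons: List[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            reason = choice.get("finish_reason")
            if reason is not None:
                reasons.append(reason)
    return reasons
```

Feeding it the lines of the response should yield ["tool_calls"] once the fix is in place; the log in this issue yields ["stop"].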

Extra Tips

Make sure to update the LiteLLM version to the latest stable release.
Test the fix with different models and tool calls to ensure the issue is fully resolved.
Consider adding additional logging to track tool call completions and invocation errors.
