litellm - ✅(Solved) Fix [Bug]: vertex_ai/gemini-3.1-flash-lite-preview returns "finish_reason": "stop" instead of "tool_calls" when using streaming [2 pull requests, 6 comments, 2 participants]

GitHub stats
BerriAI/litellm#22900 (fetched 2026-04-08 00:39:25)
Comments: 6, Participants: 2, Timeline events: 32, Reactions: 0
Timeline (top): mentioned ×8, subscribed ×8, commented ×6, cross-referenced ×4

Fix Action

Fixed

PR fix notes

PR #22943: feat(mcp): OAuth2 authorization-code flow for OpenAPI BYOK MCP servers

Description (problem / solution / changelog)

Relevant issues

Closes #22900 (OpenAPI MCP OAuth2 flow)

What this does

Adds a full OAuth2 authorization-code flow for OpenAPI BYOK MCP servers, so users can authorize through a provider's consent screen (GitHub, Spotify, Linear, etc.) instead of pasting a static API key.

Admin config (proxy_config.yaml):

mcp_servers:
  github:
    spec_path: https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json
    auth_type: oauth2
    is_byok: true
    authorization_url: https://github.com/login/oauth/authorize
    token_url: https://github.com/login/oauth/access_token
    client_id: <github-oauth-app-client-id>
    client_secret: <github-oauth-app-client-secret>

New backend endpoints (openapi_oauth2_endpoints.py):

  • GET /v1/mcp/server/{server_id}/oauth2/connect — generates a state token (HMAC-SHA256, 10-min TTL), returns authorization_url the UI opens as a popup
  • GET /v1/mcp/oauth2/callback — receives code+state, exchanges for access token, stores in LiteLLM_MCPUserCredentials, shows success HTML that auto-closes popups
  • GET /v1/mcp/server/{server_id}/oauth2/status — returns {"connected": true/false}
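
The state token described for the connect endpoint (HMAC-SHA256, 10-minute TTL) could look roughly like the sketch below. This is not the PR's code: the payload fields, the `payload.signature` layout, the `SECRET_KEY` value, and the function names `make_state` / `verify_state` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

STATE_TTL_SECONDS = 600  # 10-minute TTL, per the PR notes
SECRET_KEY = b"example-signing-key"  # hypothetical; load a real secret from config


def _sign(payload: bytes) -> bytes:
    """HMAC-SHA256 over the raw payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def make_state(server_id: str, user_id: str, now: Optional[float] = None) -> str:
    """Build 'payload.signature', each part URL-safe base64 (assumed layout)."""
    issued_at = int(time.time() if now is None else now)
    payload = json.dumps(
        {"server_id": server_id, "user_id": user_id, "iat": issued_at}
    ).encode()
    parts = (payload, _sign(payload))
    return ".".join(base64.urlsafe_b64encode(p).decode() for p in parts)


def verify_state(state: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the token is fresh."""
    try:
        payload_b64, sig_b64 = state.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    data = json.loads(payload)
    current = time.time() if now is None else now
    if current - data["iat"] > STATE_TTL_SECONDS:
        return None
    return data
```

The callback would call `verify_state` on the returned `state` before exchanging the `code`, and reject the request when it returns `None`.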

UI (OAuth2ConnectButton.tsx): when a BYOK server has auth_type=oauth2, the Credentials column shows a Connect button. Clicking opens a popup to the provider consent screen, polls status every 2s, flips to "Connected" on success.
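
The polling behaviour can be illustrated independently of the UI framework. The sketch below is in Python rather than the component's TypeScript and is not taken from the PR: `check_status` stands in for a call to the status endpoint, and the 120-second timeout is an assumption (the notes only state the 2-second interval).

```python
import time
from typing import Callable


def wait_until_connected(
    check_status: Callable[[], bool],
    interval_s: float = 2.0,    # polling interval from the PR notes
    timeout_s: float = 120.0,   # assumed upper bound, not from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_status until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check_status():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.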

Token injection: stored tokens are automatically injected as Authorization: Bearer <token> via the _request_auth_header ContextVar on every tool call.

Also fixes three bugs that prevented BYOK token injection from working:

  1. _get_tools_from_server tried client_credentials token exchange before checking spec_path, failing for authorization-code-only providers
  2. user_api_key_auth was None in REST tool calls, preventing DB lookup of stored token by user_id
  3. OpenAPI tools registered under prefixed name but looked up by bare name — added fallback in execute_mcp_tool
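
The third fix (falling back from a bare tool name to the prefixed one) can be sketched as follows. The registry shape, the `find_tool` helper, and the `-` separator are assumptions for illustration; the PR notes do not show how `execute_mcp_tool` implements the fallback.

```python
from typing import Callable, Dict, Optional

Tool = Callable[..., object]


def find_tool(
    registry: Dict[str, Tool],
    name: str,
    server_prefix: str,
    sep: str = "-",  # assumed separator between server prefix and tool name
) -> Optional[Tool]:
    """Look up a tool by the given name, then by its server-prefixed name."""
    if name in registry:
        return registry[name]
    return registry.get(f"{server_prefix}{sep}{name}")
```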

Also fixes load_servers_from_config to propagate is_byok, byok_description, and byok_api_key_help_url from YAML config into the MCPServer object.
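
A minimal sketch of that propagation, assuming the YAML entry has already been parsed into a dict. `MCPServerSketch` and `server_from_config` are hypothetical stand-ins, and the field types are guesses; only the three field names come from the PR notes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MCPServerSketch:
    """Simplified stand-in for the real MCPServer object."""
    name: str
    is_byok: bool = False
    byok_description: Optional[Any] = None  # actual type not stated in the notes
    byok_api_key_help_url: Optional[str] = None


def server_from_config(name: str, cfg: Dict[str, Any]) -> MCPServerSketch:
    """Copy the BYOK fields from a parsed config entry instead of dropping them."""
    return MCPServerSketch(
        name=name,
        is_byok=bool(cfg.get("is_byok", False)),
        byok_description=cfg.get("byok_description"),
        byok_api_key_help_url=cfg.get("byok_api_key_help_url"),
    )
```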

Pre-Submission checklist

  • My changes don't break any existing tests
  • I have added tests (unit tests in tests/test_litellm/proxy/mcp_server/)

Type

  • Bug fix
  • New feature

Changed files

  • litellm/__init__.py (modified, +5/-0)
  • litellm/proxy/_experimental/mcp_server/mcp_server_manager.py (modified, +16/-10)
  • litellm/proxy/_experimental/mcp_server/openapi_oauth2_endpoints.py (added, +675/-0)
  • litellm/proxy/_experimental/mcp_server/rest_endpoints.py (modified, +1/-1)
  • litellm/proxy/_experimental/mcp_server/server.py (modified, +141/-38)
  • litellm/proxy/proxy_server.py (modified, +7/-3)
  • litellm/types/mcp_server/mcp_server_manager.py (modified, +3/-0)
  • tests/test_litellm/proxy/_experimental/mcp_server/test_openapi_oauth2_endpoints.py (added, +973/-0)
  • ui/litellm-dashboard/src/components/mcp_tools/OAuth2ConnectButton.tsx (added, +154/-0)
  • ui/litellm-dashboard/src/components/mcp_tools/mcp_server_columns.tsx (modified, +24/-1)
  • ui/litellm-dashboard/src/components/mcp_tools/mcp_servers.tsx (modified, +3/-1)
  • ui/litellm-dashboard/src/components/networking.tsx (modified, +58/-0)

PR #23012: feat(mcp): popular REST API gallery for OpenAPI MCPs + per-user OAuth2 connect in ChatUI

Description (problem / solution / changelog)

Relevant issues

Closes #22900 (related - OAuth2 for OpenAPI MCPs)

Changes

Admin — OpenAPI MCP gallery

When selecting OpenAPI Spec transport in the Add MCP Server form, a gallery of 12 popular REST APIs now appears with logos: GitHub, Figma, Jira, Confluence, Slack, Stripe, Notion, Linear, HubSpot, Salesforce, Zendesk, Snowflake.

Clicking a card pre-fills: OpenAPI spec URL, auth type, authorization URL, token URL, and default scopes. A "+ Custom OpenAPI URL" link is available for unlisted APIs.

APIs are maintained as a static list in create_mcp_server.tsx (no extra API call) and also added to mcp_registry.json under category "REST APIs".

User — ChatUI OAuth2 connect flow

In the Apps panel, OAuth2 MCP servers now show a "Sign In with <Provider>" button instead of a toggle. After the OAuth flow completes, the panel shows a green "Connected" badge and a Disconnect button.

Uses existing discoverable_endpoints.py authorize/callback flow. A ?mcp_oauth_complete=<server_id> param on return triggers a credential status refresh.

Backend fixes

  • Add GET /v1/mcp/user/credential/{server_id} — lets the UI check if a user has a stored credential for an MCP server
  • Extend has_user_credential annotation on the server list to include auth_type=oauth2 servers, not just BYOK
  • Fix _get_tools_from_server to check spec_path before creating MCP client — fixes GitHub/Figma tool loading (MCP client creation tried m2m auth which fails for auth-code-only providers like GitHub)
  • Propagate is_byok, byok_description, byok_api_key_help_url from YAML config into MCPServer when loading via load_servers_from_config
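
The ordering fix for `_get_tools_from_server` amounts to checking `spec_path` first, so OpenAPI-backed servers never reach the MCP client path. The sketch below shows only that control flow; `get_tools` and both loader callables are hypothetical and are injected so the example stays self-contained.

```python
from typing import Callable, Dict, List, Optional


def get_tools(
    server: Dict[str, Optional[str]],
    load_openapi_tools: Callable[[str], List[str]],
    load_via_mcp_client: Callable[[], List[str]],
) -> List[str]:
    """Prefer the OpenAPI spec when present; only then fall back to an MCP client."""
    spec_path = server.get("spec_path")
    if spec_path:
        # OpenAPI-backed server: no MCP client, so no m2m token exchange is attempted.
        return load_openapi_tools(spec_path)
    return load_via_mcp_client()
```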

Pre-Submission checklist

  • My changes don't break any existing tests
  • I have added tests

Type

  • New feature
  • Bug fix

Changes

  • litellm/proxy/mcp_registry.json — 12 new REST API entries
  • litellm/proxy/management_endpoints/mcp_management_endpoints.py — new GET /v1/mcp/user/credential/{server_id}, extend oauth2 credential annotation
  • litellm/proxy/_experimental/mcp_server/mcp_server_manager.py — fix OpenAPI tool loading, propagate byok fields from config
  • ui/.../create_mcp_server.tsx — OpenAPI gallery with logos + pre-fill
  • ui/.../MCPAppsPanel.tsx — OAuth2 sign-in / connected / disconnect flow
  • ui/.../networking.tsx — checkMCPUserCredential, deleteMCPUserCredential
  • ui/.../types.tsx — extend DiscoverableMCPServer with OpenAPI fields

Changed files

  • litellm/proxy/_experimental/mcp_server/mcp_server_manager.py (modified, +13/-10)
  • litellm/proxy/management_endpoints/mcp_management_endpoints.py (modified, +31/-5)
  • litellm/proxy/mcp_registry.json (modified, +156/-0)
  • ui/litellm-dashboard/src/components/chat/MCPAppsPanel.tsx (modified, +187/-58)
  • ui/litellm-dashboard/src/components/mcp_tools/create_mcp_server.tsx (modified, +369/-102)
  • ui/litellm-dashboard/src/components/mcp_tools/types.tsx (modified, +6/-0)
  • ui/litellm-dashboard/src/components/networking.tsx (modified, +30/-0)
  • ui/litellm-dashboard/src/hooks/useTestMCPConnection.tsx (modified, +4/-2)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hi @krrishdholakia, @ishaan-jaff, @Chesars!

This is the same issue reported in #21041, #12249 and others: when using streaming with function tools, the final chunk ends with "finish_reason": "stop" instead of "tool_calls". This breaks agentic workflows that rely on detecting tool call completions. This time the affected model is:

vertex_ai/gemini-3.1-flash-lite-preview

Steps to Reproduce

  1. Test with the following curl:
curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What'\''s the weather like in Lima, Peru today? celsius"
      }
    ],
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Retrieve current weather for a specific location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g., Lima, Peru"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream_options": {
      "include_usage": true
    }
  }'

Expected behavior: The final streaming chunk should return "finish_reason": "tool_calls" when the model decides to invoke a tool.

Actual behavior: The final chunk returns "finish_reason": "stop", even though the model is clearly attempting to use tool calls. This prevents agentic frameworks from detecting tool call completions and correctly invoking the functions.

Thanks!

Relevant log output

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}

data: [DONE]
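
To see why the wrong finish_reason matters, here is a sketch of how a client loop might decide what to do after a stream ends. It is not taken from any particular framework: `consume_stream` is a hypothetical helper, and the tool-call delta shape follows the OpenAI-style streaming format rather than anything shown in the log above.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def consume_stream(chunks: Iterable[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Collect tool-call deltas and pick the next action from finish_reason."""
    finish_reason: Optional[str] = None
    tool_calls: List[Dict[str, Any]] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            tool_calls.extend(delta.get("tool_calls") or [])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    # A loop that trusts finish_reason ends the turn on "stop",
    # even if tool-call deltas were streamed earlier.
    action = "run_tools" if finish_reason == "tool_calls" else "end_turn"
    return action, tool_calls
```

With a final chunk reporting "stop", this loop returns "end_turn" and the collected tool calls are never executed.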

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

stable v1.81.12

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To resolve the issue of the final chunk returning "finish_reason": "stop" instead of "finish_reason": "tool_calls", we need to modify the streaming logic to correctly handle tool calls.

Step-by-Step Solution

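  1. Update the streaming.py file: Modify the handle_tool_call function to set finish_reason to "tool_calls" when a tool is invoked. (The file and function names in these steps are illustrative and have not been checked against the LiteLLM source.)
def handle_tool_call(self, tool_name, tool_output):
    # ...
    # Record that the stream ended because the model requested a tool.
    self.finish_reason = "tool_calls"
    # ...
  2. Modify the completion.py file: Update the generate_chunk function to write the finish_reason into the chunk's choice entry, which is where the logged chunks carry it.
def generate_chunk(self, chunk_data):
    # ...
    # finish_reason lives under choices[0], not at the top level of the chunk.
    chunk_data["choices"][0]["finish_reason"] = self.finish_reason
    # ...
  3. Update the proxy.py file: Modify the handle_streaming_request function to read stream_options safely and include the finish_reason in the response.
def handle_streaming_request(self, request):
    # ...
    # stream_options is optional, so avoid a KeyError when it is absent.
    stream_options = request.get("stream_options") or {}
    if stream_options.get("include_usage"):
        # ...
        response["finish_reason"] = self.finish_reason
    # ...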

Verification

To verify the fix, run the provided curl command and check the final chunk output. The finish_reason should now be "tool_calls" instead of "stop".
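
The check can be automated with a small parser over the captured stream. The sketch below assumes the curl output has been saved as text; `finish_reasons` is a hypothetical helper, not part of LiteLLM.

```python
import json
from typing import List


def finish_reasons(sse_body: str) -> List[str]:
    """Return every non-null finish_reason in an SSE response body, in order."""
    reasons: List[str] = []
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            reason = choice.get("finish_reason")
            if reason is not None:
                reasons.append(reason)
    return reasons
```

For a request where the model calls a tool, the last entry of the returned list should be "tool_calls" once the fix is in place.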

Extra Tips

  • Make sure to update the LiteLLM version to the latest stable release.
  • Test the fix with different models and tool calls to ensure the issue is fully resolved.
  • Consider adding additional logging to track tool call completions and invocation errors.
