langchain - 💡(How to fix) Fix To support BytePlus's video functionality, implement lenient validation for the `input.type` field in OpenAI responses by checking if `input.type == input_video` to ensure compatibility with BytePlus. [1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
langchain-ai/langchain#36414Fetched 2026-04-08 02:22:31
View on GitHub
Comments
1
Participants
1
Timeline
5
Reactions
0
Participants
Timeline (top)
labeled ×3commented ×1issue_type_added ×1

I'm a developer from China building a multimodal analysis Agent using LangGraph. I'm using BytePlus (ByteDance's overseas model service) which supports text, image, and video inputs through their responses API. However, I've encountered an issue when trying to process video inputs with the langchain-openai integration.

Root Cause

The input_video block is not included in the API request because it's not recognized in the following code:

Fix Action

Solution

I've made a minimal change to support the input_video type:

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role"] in ("user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type"] in ("text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type"] in ("input_text", "input_image", "input_file", "input_video"):
                    new_blocks.append(block)
                elif block["type"] in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...

Code Example

from langchain.messages import HumanMessage

messages = [
    HumanMessage(
        content=[
            {
                "type": "input_video",
                "video_url": video_path,
            },
            {
                "type": "input_text",
                "text": instruction,
            },
        ]
    )
]

---

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role"] in ("user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type"] in ("text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type"] in ("input_text", "input_image", "input_file"):
                    new_blocks.append(block)
                elif block["type"] in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...

---

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role"] in ("user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type"] in ("text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type"] in ("input_text", "input_image", "input_file", "input_video"):
                    new_blocks.append(block)
                elif block["type"] in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...
RAW_BUFFERClick to expand / collapse

Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

Support for input_video type in responses API for BytePlus/ByteDance models

Description

I'm a developer from China building a multimodal analysis Agent using LangGraph. I'm using BytePlus (ByteDance's overseas model service) which supports text, image, and video inputs through their responses API. However, I've encountered an issue when trying to process video inputs with the langchain-openai integration.

Problem

BytePlus models support video input via the input_video type in their responses API, but the current implementation of _construct_responses_api_input in LangChain's OpenAI integration doesn't include this type, causing video inputs to be ignored.

Current Behavior

When sending a message with input_video type:

from langchain.messages import HumanMessage

messages = [
    HumanMessage(
        content=[
            {
                "type": "input_video",
                "video_url": video_path,
            },
            {
                "type": "input_text",
                "text": instruction,
            },
        ]
    )
]

The input_video block is not included in the API request because it's not recognized in the following code:

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role"] in ("user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type"] in ("text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type"] in ("input_text", "input_image", "input_file"):
                    new_blocks.append(block)
                elif block["type"] in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...

Expected Behavior

The input_video type should be included in the API request, similar to how input_text, input_image, and input_file are handled.

Solution

I've made a minimal change to support the input_video type:

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role"] in ("user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type"] in ("text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type"] in ("input_text", "input_image", "input_file", "input_video"):
                    new_blocks.append(block)
                elif block["type"] in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...

Context

  • BytePlus is ByteDance's overseas model service, which is gaining significant market share in China's AI model space
  • Their models offer strong multimodal capabilities for text, image, and video
  • They use the same API structure as OpenAI's responses API but extend it with video support
  • Currently, there's no dedicated LangChain integration for BytePlus/ByteDance models, so users are leveraging the OpenAI integration with custom base URLs

Impact

This change would enable users of BytePlus and potentially other model providers that support video input through similar structures to use video capabilities with LangChain's OpenAI integration.

Thank you for considering this request!

Use Case

This change would enable users of BytePlus and potentially other model providers that support video input through similar structures to use video capabilities with LangChain's OpenAI integration.

Proposed Solution

base.py

Alternatives Considered

  • BytePlus is ByteDance's overseas model service, which is gaining significant market share in China's AI model space
  • Their models offer strong multimodal capabilities for text, image, and video
  • They use the same API structure as OpenAI's responses API but extend it with video support
  • Currently, there's no dedicated LangChain integration for BytePlus/ByteDance models, so users are leveraging the OpenAI integration with custom base URLs

Additional Context

BytePlus models support video input via the input_video type in their responses API, but the current implementation of _construct_responses_api_input in LangChain's OpenAI integration doesn't include this type, causing video inputs to be ignored.

extent analysis

TL;DR

To support input_video type in responses API for BytePlus/ByteDance models, modify the _construct_responses_api_input function to include input_video in the list of accepted block types.

Guidance

  • Update the _construct_responses_api_input function to handle input_video blocks by adding "input_video" to the elif condition that checks block["type"].
  • Verify that the updated function correctly includes input_video blocks in the API request by checking the new_blocks list.
  • Test the updated function with a sample message containing an input_video block to ensure it is properly handled.
  • Consider adding additional error handling or logging to ensure that any issues with input_video blocks are properly reported.

Example

def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
    # ...
    elif msg["role] in (user", "system", "developer"):
        if isinstance(msg["content"], list):
            new_blocks = []
            non_message_item_types = ("mcp_approval_response",)
            for block in msg["content"]:
                if block["type] in (text", "image_url", "file"):
                    new_blocks.append(
                        _convert_chat_completions_blocks_to_responses(block)
                    )
                elif block["type] in (input_text", "input_image", "input_file", "input_video"):
                    new_blocks.append(block)
                elif block in non_message_item_types:
                    input_.append(block)
                else:
                    pass
    # ...

Notes

The proposed solution assumes that the input_video block has a similar structure to the existing input_text, input_image, and input_file blocks. If the input_video block has a different structure, additional modifications may be necessary.

Recommendation

Apply the workaround by modifying the _construct_responses_api_input function to include input_video in the list of accepted block types, as this will allow users to leverage the video capabilities of BytePlus and potentially other model providers.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING