dify - 💡(How to fix) Fix Voice input abnormal(audio-to-text) [1 pull requests]

Error Message

Voice input in a ChatFlow app using the Tongyi model converts speech to text correctly on the preview page. After the application is published and opened directly at /chat/xxxx, voice input calls the interface /api/audio-to-text, which fails to convert voice to text and reports: [Tongyi] Error: Operator 'getitem' is not supported on this expression.

Fix Action

Fixed


Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non-English Users] Please submit in English, otherwise the report will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.14.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

First of all, many thanks to all the authors in the open-source community.

Today, I tried to integrate the voice input capability into ChatFlow using the Tongyi model. When debugging voice input on the preview page, the voice was successfully converted to text. However, after publishing the application and accessing the application page directly, the voice input failed to convert to text correctly, and an error occurred: [Tongyi] Error: Operator 'getitem' is not supported on this expression.

I am not familiar with Python code and conducted a preliminary investigation with the help of AI:

  1. On the ChatFlow editing page, opening the preview voice input calls the interface /console/api/apps/xxxx/audio-to-text, which converts voice to text correctly.
  2. After the application is published, directly accessing the application page /chat/xxxx and inputting voice calls the interface /api/audio-to-text, which fails to convert voice to text and reports an error.
  3. Reading the code, I found that in api/controllers/web/audio.py the end_user argument passed to AudioService.transcript_asr appears to be an object, while the method declares that parameter as a string. The method signature is: transcript_asr(cls, app_model: App, file: FileStorage, end_user: str | None = None).
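
The mismatch described in step 3 can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical stand-in rather than Dify's actual code: the EndUser class, its id attribute, the normalize_end_user helper, and the simplified transcript_asr (which keeps only the end_user parameter type from the reported signature) are assumptions made for illustration. Whether this mismatch is what actually produces the 'getitem' error has not been confirmed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class EndUser:
    """Hypothetical stand-in for the end-user object held by the web controller."""
    id: str  # assumed attribute name, for illustration only


def normalize_end_user(end_user: Union[EndUser, str, None]) -> Optional[str]:
    """Return a plain string ID (or None), whichever form the caller passed."""
    if end_user is None or isinstance(end_user, str):
        return end_user
    return end_user.id


def transcript_asr(file_name: str, end_user: Optional[str] = None) -> str:
    """Simplified stand-in: only the end_user type mirrors the reported signature."""
    if end_user is not None and not isinstance(end_user, str):
        # Fail early and clearly instead of letting an object flow downstream.
        raise TypeError(
            f"end_user must be str or None, got {type(end_user).__name__}"
        )
    return f"transcribed {file_name} for user {end_user}"


if __name__ == "__main__":
    user = EndUser(id="u-1")
    # Passing the object directly violates the declared type.
    try:
        transcript_asr("voice.wav", user)
    except TypeError as exc:
        print("rejected:", exc)
    # Passing the normalized ID matches the declared type.
    print(transcript_asr("voice.wav", normalize_end_user(user)))
```

If the cause is as suspected, one possible direction would be for the web controller to pass a string ID, or for the method's type hint to be widened to accept the object; which of the two fits the codebase is for the maintainers to decide.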

Sorry, this is as far as I can investigate based on my personal experience. I currently do not have the ability to debug, modify the code, or submit a PR.

✔️ Expected Behavior

Voice input converts speech to text correctly in the published application, as it does on the preview page.

❌ Actual Behavior

<img width="830" height="1073" alt="Image" src="https://github.com/user-attachments/assets/d2f93654-8c42-4bb0-bc8b-399ca22038bc" />
