vllm: Analyze middleware traces: OWUI sampling profile comparison [1 comment, 1 participant]

vllm-project/vllm#36513, fetched 2026-04-08 00:36:28

Context

Four OWUI sampling profiles are being tested via Roo scheduler tasks on Qwen3.5-35B-A3B (AWQ 4-bit, GPUs 0 and 1):

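| Profile           | temp | presence_penalty | top_p | top_k | thinking |
|-------------------|------|------------------|-------|-------|----------|
| Qwen_think        | 1.0  | 1.5              | 0.95  | 20    | yes      |
| Qwen_think-code   | 0.6  | 0.0              | 0.95  | 20    | yes      |
| Qwen_think-reason | 1.0  | 2.0              | 1.0   | 40    | yes      |
| Qwen_instruct     | 0.7  | 1.5              | 0.8   | 20    | no       |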

Data source

Middleware logs at /logs/chat_completions.jsonl in the vLLM container (about 3,500 entries, 6 MB).

Each entry contains: timestamp, model, prompt_tokens, completion_tokens, ttft_s, e2e_s, temperature, presence_penalty, top_p, top_k, repetition_penalty, tools_count, response_text, reasoning_text, finish_reason, system_prompt_length, last_user_message.
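
As a rough illustration of how these fields could be used to label each request, the sketch below reads the JSONL file and maps the (temperature, presence_penalty, top_p, top_k) signature to a profile name. The signature values come from the profile table; the function names (`signature`, `classify`, `load_entries`), the rounding to two decimals, the "other" bucket, and the decision to skip malformed lines are all assumptions made for this example, not part of the issue.

```python
import json

# Signatures (temperature, presence_penalty, top_p, top_k) from the profile table.
PROFILES = {
    (1.0, 1.5, 0.95, 20): "Qwen_think",
    (0.6, 0.0, 0.95, 20): "Qwen_think-code",
    (1.0, 2.0, 1.0, 40): "Qwen_think-reason",
    (0.7, 1.5, 0.8, 20): "Qwen_instruct",
}


def signature(entry):
    """Return the sampling signature of one log entry, or None if a field is unusable."""
    try:
        return (
            round(float(entry["temperature"]), 2),
            round(float(entry["presence_penalty"]), 2),
            round(float(entry["top_p"]), 2),
            int(entry["top_k"]),
        )
    except (KeyError, TypeError, ValueError):
        return None


def classify(entry):
    """Map an entry to a profile name; unrecognised signatures become 'other'."""
    return PROFILES.get(signature(entry), "other")


def load_entries(path):
    """Yield one parsed object per non-empty JSONL line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: malformed lines are rare and safe to drop
```

A quick sanity check on the extracted file would be `collections.Counter(classify(e) for e in load_entries("middleware_logs.jsonl"))`; a large "other" bucket would suggest that some requests use sampling settings outside the four profiles.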

Analysis tasks

  1. Extract & classify requests by sampling profile (group by the temperature + presence_penalty + top_p + top_k signature)
  2. Repetition metrics per profile:
    • 4-gram / 8-gram repetition rate
    • Type-Token Ratio (TTR)
    • Repeated line ratio
  3. Performance metrics per profile:
    • Decode speed (completion_tokens / e2e_s)
    • TTFT distribution
    • Token count distribution
  4. Quality assessment (manual sample):
    • Coherence and relevance of responses
    • Language mixing (Chinese in French responses)
    • Code quality for coding tasks
  5. Recommendation: which profile(s) to keep for production Roo usage
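
The repetition metrics in task 2 can be sketched as follows. The issue does not define them precisely, so this example assumes common definitions: n-gram repetition rate as 1 minus the ratio of unique to total n-grams, TTR as distinct tokens over total tokens, and repeated line ratio as the share of non-empty lines that duplicate another line. The regex word tokenizer and all function names are likewise assumptions, chosen only to keep the example dependency-free.

```python
import re


def tokenize(text):
    """Lowercased word tokens (assumption: the issue does not specify a tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngram_repetition_rate(tokens, n):
    """Share of n-grams that duplicate another n-gram: 1 - unique / total."""
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    grams = [tuple(tokens[i:i + n]) for i in range(total)]
    return 1.0 - len(set(grams)) / total


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def repeated_line_ratio(text):
    """Share of non-empty lines whose stripped content duplicates another line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return 1.0 - len(set(lines)) / len(lines)


def repetition_metrics(text):
    """Compute all four repetition metrics for one response text."""
    tokens = tokenize(text)
    return {
        "rep_4gram": ngram_repetition_rate(tokens, 4),
        "rep_8gram": ngram_repetition_rate(tokens, 8),
        "ttr": type_token_ratio(tokens),
        "repeated_lines": repeated_line_ratio(text),
    }
```

These functions could be applied to `response_text` and `reasoning_text` separately. TTR tends to fall as texts get longer, so comparing it across profiles is more meaningful on responses of similar length. With this tokenizer an unbroken run of Chinese characters counts as a single token, so the language-mixing check in task 4 would need a different approach.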

How to extract logs

docker exec myia_vllm-medium-qwen35-moe bash -c 'cat /logs/chat_completions.jsonl' > middleware_logs.jsonl

Expected output

A report with per-profile metrics and a recommendation for the default Roo sampling config.
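
One possible shape for the per-profile performance part of such a report is sketched below. It uses decode speed as defined in the task list (completion_tokens / e2e_s) and summarizes TTFT and token counts with a median and a 95th percentile. The choice of these statistics, the function names, and the `classify` parameter (any callable that maps a log entry to a profile name) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median, quantiles


def _is_num(value):
    return isinstance(value, (int, float))


def decode_speed(entry):
    """completion_tokens / e2e_s, or None when either field is missing or e2e_s <= 0."""
    tokens, e2e = entry.get("completion_tokens"), entry.get("e2e_s")
    if not _is_num(tokens) or not _is_num(e2e) or e2e <= 0:
        return None
    return tokens / e2e


def p95(values):
    """95th percentile; falls back to the single value (or None) for tiny samples."""
    if len(values) < 2:
        return values[0] if values else None
    return quantiles(values, n=20)[-1]


def summarize(entries, classify):
    """Group entries by profile and compute summary statistics per group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[classify(entry)].append(entry)

    report = {}
    for name, group in sorted(groups.items()):
        speeds = [s for s in map(decode_speed, group) if s is not None]
        ttfts = [e["ttft_s"] for e in group if _is_num(e.get("ttft_s"))]
        counts = [e["completion_tokens"] for e in group if _is_num(e.get("completion_tokens"))]
        report[name] = {
            "requests": len(group),
            "median_tok_per_s": median(speeds) if speeds else None,
            "median_ttft_s": median(ttfts) if ttfts else None,
            "p95_ttft_s": p95(ttfts),
            "median_completion_tokens": median(counts) if counts else None,
        }
    return report
```

Because e2e_s includes the time to first token, this speed figure blends prefill and decode time. Dividing by (e2e_s - ttft_s) instead may give a cleaner decode rate, though that goes beyond the definition in the issue.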

🤖 Generated with Claude Code

Fix Plan

To address the issue, we need to create a script that extracts and analyzes the logs from the vLLM container. The script will perform the following tasks:

  • Extract logs from the container
  • Parse the logs and group requests by sampling profile
  • Calculate repetition metrics and performance metrics for each profile
  • Provide a recommendation for the default Roo sampling config

Step-by-Step Solution
