openclaw - [Feature] Outbound Task Calls: agent makes phone calls on the user's behalf (reservations, inquiries, flight changes) [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#59245 · Fetched 2026-04-08 02:26:58
Comments: 1 · Participants: 2 · Timeline: 1 · Reactions: 0


Original Request

English translation (the request was originally written in Chinese):

I frequently need my assistant to make phone calls, such as:

  1. Making restaurant reservations
  2. Booking doctor appointments
  3. Asking specific questions (e.g., calling an airline to change/switch flights)

I want this agent to be able to place calls, detect hold/wait states during the call, and gather relevant information. It should be able to share information I've authorized with the other party, while also being flexible and adaptive.

When it encounters a situation requiring my explicit input, it should proactively contact me: "Hey, I'm on the call with [party]. I need your timely response."

If it needs to wait for my reply, it should ask the other party to hold. If I don't respond within 2-3 minutes, it should tell them: "My principal hasn't responded yet, I'll call you back shortly."

Additionally, if the other party says "hold on, I'll call you back," the agent should be able to keep the line open, wait for the callback, and seamlessly resume the conversation.

The voice must sound like a real human.

Agent's Two Cents (could be wrong)

Everything below is the AI agent's best guess based on the current codebase. Take with a grain of salt — the original request above is the only thing that came from a human.

Problem / Motivation

OpenClaw's existing voice-call plugin handles inbound calls and basic outbound conversation mode, but there is no capability for the agent to proactively place calls on the user's behalf to accomplish real-world tasks (reservations, inquiries, flight changes). This is one of the highest-value personal assistant features — it saves the user from hold queues, phone trees, and repetitive calls.

Proposed Solution

An Outbound Task Call mode for the voice-call plugin (or a new skill) that:

  1. Places an outbound call to a given phone number with a specific task/objective
  2. Navigates IVR menus (DTMF tone detection + generation, "press 1 for..." comprehension)
  3. Handles hold states — detects hold music/silence, waits patiently, resumes when a human picks up
  4. Escalates to the user in real-time via the primary chat channel (Slack/Telegram/etc.) when the agent needs authorization or decisions it can't make autonomously
  5. Manages call continuity — can hold the line waiting for callbacks, gracefully end and reschedule if the user is unresponsive (2-3 min timeout)
  6. Uses natural-sounding TTS — high-quality voice synthesis (ElevenLabs, OpenAI Realtime API, or similar) so the other party experiences a natural conversation
  7. Provides authorized information only — the agent shares only pre-approved details (name, phone, booking reference, etc.) and never over-shares
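The authorized-information requirement (item 7) could be modeled as an allow-list attached to each task. The sketch below is only an illustration: `TaskCallRequest`, `shareableFields`, and `lookupAuthorized` are hypothetical names, not part of the existing voice-call plugin, and the sample data is made up.

```typescript
// Hypothetical shape of an outbound task call request; none of these
// names come from the OpenClaw codebase.
interface TaskCallRequest {
  phoneNumber: string;               // number to dial (E.164 format assumed)
  objective: string;                 // what the call should accomplish
  userInfo: Record<string, string>;  // everything the agent knows about the user
  shareableFields: string[];         // keys the user pre-approved for this call
}

// Return a value only if the user pre-authorized that field.
// `undefined` means the agent should escalate instead of answering.
function lookupAuthorized(req: TaskCallRequest, field: string): string | undefined {
  if (req.shareableFields.indexOf(field) === -1) return undefined;
  return req.userInfo[field];
}

// Made-up example data.
const request: TaskCallRequest = {
  phoneNumber: "+15555550100",
  objective: "Book a table for two at 7pm",
  userInfo: {
    name: "Alex Example",
    callbackNumber: "+15555550101",
    passportNumber: "X1234567",
  },
  shareableFields: ["name", "callbackNumber"],
};
```

Routing every disclosure through one lookup keeps the "never over-share" rule in a single place, and a miss doubles as a natural trigger for escalation.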

Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                     OpenClaw Gateway                     │
│                                                          │
│  ┌──────────┐    ┌──────────────┐    ┌───────────────┐  │
│  │  Agent    │◄──►│  Task Call    │◄──►│  Telephony    │  │
│  │  (LLM)   │    │  Orchestrator │    │  Bridge       │  │
│  └────┬─────┘    └──────┬───────┘    └───────┬───────┘  │
│       │                 │                     │          │
│       │    ┌────────────┴────────────┐        │          │
│       │    │  Call State Machine     │        │          │
│       │    │  ┌─────┐ ┌──────┐      │        │          │
│       │    │  │ IVR │ │ Hold │      │        │          │
│       │    │  │ Nav │ │ Det. │      │        │          │
│       │    │  └─────┘ └──────┘      │        │          │
│       │    │  ┌──────────────┐      │        │          │
│       │    │  │  Escalation  │──────┼──►Chat │          │
│       │    │  │  Manager     │      │   Msg  │          │
│       │    │  └──────────────┘      │        │          │
│       │    └────────────────────────┘        │          │
│       │                                      │          │
│  ┌────┴──────────┐                 ┌─────────┴───────┐  │
│  │ Natural Voice │                 │ Twilio/Telnyx   │  │
│  │ TTS Engine    │                 │ SIP/PSTN        │  │
│  │ (ElevenLabs/  │                 │ Gateway         │  │
│  │  OpenAI RT)   │                 └─────────┬───────┘  │
│  └───────────────┘                           │          │
└──────────────────────────────────────────────┼──────────┘
                                     ┌─────────┴───────┐
                                     │  PSTN / Phone   │
                                     │  (Restaurant,   │
                                     │   Airline, etc) │
                                     └─────────────────┘

Dependencies & Potential Blockers

  • Telephony provider — Twilio or Telnyx account with outbound calling capability (already partially integrated via voice-call plugin)
  • High-quality TTS — Current TTS may not be natural enough for phone conversations; needs ElevenLabs or OpenAI Realtime API integration for human-like voice
  • Real-time STT — Low-latency speech-to-text for natural conversational flow (Deepgram, OpenAI Whisper streaming, or similar)
  • Existing voice-call plugin bugs — Multiple open issues with outbound calls (#48505, #42113, #56091) need to be resolved first
  • Hold detection — Non-trivial audio classification (music vs. silence vs. speech) to detect when hold ends
  • IVR/DTMF — Need to send DTMF tones and parse IVR prompts
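As a rough illustration of the hold-detection item, the simplest first step is separating silence from any audio by frame energy. This is a deliberately crude sketch under stated assumptions (16-bit PCM frames, a hand-picked threshold); telling hold music from speech would need a proper classifier or the STT signal, which is not shown. `rms` and `classifyFrame` are hypothetical helpers.

```typescript
// Crude frame classifier: a heuristic for illustration only, not a
// music/speech model. Input is one frame of 16-bit PCM samples.
type FrameKind = "silence" | "audio";

// Root-mean-square amplitude of a frame, normalised to the range 0..1.
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const sample = frame[i] / 32768;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// The threshold is an assumed value that would need tuning on real call audio.
function classifyFrame(frame: Int16Array, silenceThreshold: number = 0.01): FrameKind {
  return rms(frame) < silenceThreshold ? "silence" : "audio";
}
```

A real detector would likely smooth this over many frames before deciding that hold has started or ended.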

External Setup Required

  • ⚠️ Telephony provider account: Twilio or Telnyx with outbound calling enabled + phone number
  • ⚠️ API key / credentials: ElevenLabs API key (or OpenAI Realtime API access) for natural-sounding TTS
  • ⚠️ API key / credentials: Deepgram or similar low-latency STT service API key
  • ⚠️ Payment / billing: Telephony costs (per-minute outbound PSTN calls), TTS API costs, STT API costs

How to Validate

  • Agent can place an outbound call to a real phone number
  • Agent navigates a simple IVR menu (e.g., "press 1 for reservations")
  • Agent detects hold music and resumes conversation when a human answers
  • Agent sends a real-time escalation message to the user's chat when it needs a decision
  • Agent gracefully hangs up and reports back if user doesn't respond within timeout
  • Agent can maintain an open line waiting for a callback
  • Voice sounds natural to the receiving party (subjective but testable)
  • Agent only shares pre-authorized information
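For the IVR check above, one small testable piece is turning an STT transcript such as "press 1 for reservations" into a digit to send. The helpers below are hypothetical and assume the transcript contains digits as numerals; spelled-out numbers ("press one") and free-form menus are not handled.

```typescript
// Hypothetical helper: extract "press N for X" options from an IVR
// transcript so the agent can pick a DTMF digit.
interface IvrOption {
  digit: string;  // DTMF key to send: 0-9, * or #
  label: string;  // what the menu says the key is for
}

function parseIvrPrompt(transcript: string): IvrOption[] {
  const pattern = /press\s+([0-9*#])\s+(?:for|to)\s+([^,.;]+)/gi;
  const options: IvrOption[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(transcript)) !== null) {
    options.push({ digit: match[1], label: match[2].trim().toLowerCase() });
  }
  return options;
}

// Choose the first option whose label mentions the keyword.
function chooseDigit(options: IvrOption[], keyword: string): string | undefined {
  const wanted = keyword.toLowerCase();
  for (const option of options) {
    if (option.label.indexOf(wanted) !== -1) return option.digit;
  }
  return undefined;
}
```

In practice the LLM would probably interpret the menu itself; a deterministic parser like this is mainly useful as a fallback and for validation tests.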

Scope Estimate

large

Key Files/Modules Likely Involved

  • src/plugins/voice-call/ — existing voice-call plugin (outbound mode)
  • src/plugins/voice-call/providers/ — Twilio/Telnyx provider adapters
  • src/core/sessions/ — session management for call state + chat escalation
  • src/core/tools/tts/ — TTS engine integration
  • src/plugins/voice-call/call-state-machine.ts — (new) call state management

Rough Implementation Sketch

  • Extend voice-call plugin with a new outbound-task mode distinct from current outbound-conversation
  • Add a Call State Machine with states: dialing → ivr-navigation → hold-waiting → active-conversation → escalation-pending → wrap-up → ended
  • Integrate a high-quality TTS provider (ElevenLabs / OpenAI Realtime) specifically for outbound task calls
  • Add hold detection via audio classification (silence/music patterns)
  • Add DTMF generation for IVR navigation
  • Build an Escalation Manager that messages the user's primary chat channel when the agent needs input, with timeout logic (2-3 min → graceful hangup + reschedule)
  • Add a Task Context system where the user pre-authorizes what info can be shared (name, phone, booking ref, etc.)
  • Support callback waiting — keep the call session alive in a hold-waiting state

Open Questions

  • Should the agent identify itself as an AI assistant, or present as calling on behalf of the user? (Legal/ethical implications vary by jurisdiction — some US states require disclosure)
  • How to handle cases where the business refuses to speak with an AI?
  • Should there be a maximum call duration / cost cap?
  • How to handle conference-call scenarios (agent + user + business)?
  • Integration with existing voice-call plugin vs. building as a separate plugin?

Potential Risks or Gotchas

  • Legal compliance — Some jurisdictions require two-party consent for recordings, and some require AI disclosure
  • Cost control — Long hold times on airline calls can rack up telephony charges; need per-call and per-month budget limits
  • Reliability — Existing outbound call bugs (#48505, #42113) suggest the telephony bridge needs stabilization first
  • IVR complexity — Real-world phone trees can be deeply nested and unpredictable
  • Voice quality — Over PSTN (not VoIP), audio quality degrades; TTS must handle 8kHz narrow-band gracefully
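The cost-control risk above could be handled with a per-call budget guard checked while the call is in progress. This is a sketch with placeholder numbers: `CallBudget`, `callCostCents`, and `shouldHangUp` are hypothetical, the rates are not real provider pricing, and billing per started minute is an assumption.

```typescript
// Hypothetical budget guard; rates and caps are placeholders, not real
// provider pricing. Amounts are in integer cents to avoid float drift.
interface CallBudget {
  rateCentsPerMinute: number;  // telephony cost per started minute
  maxCentsPerCall: number;     // hard cap for a single call
}

// Assumes billing per started minute, so partial minutes round up.
function callCostCents(elapsedSeconds: number, budget: CallBudget): number {
  return Math.ceil(elapsedSeconds / 60) * budget.rateCentsPerMinute;
}

// True once the running cost has reached the per-call cap.
function shouldHangUp(elapsedSeconds: number, budget: CallBudget): boolean {
  return callCostCents(elapsedSeconds, budget) >= budget.maxCentsPerCall;
}
```

A per-month limit could reuse the same cost function by summing over completed calls.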

Related Issues

  • #45280 — Feature request: per-call session scope for voice-call plugin
  • #56604 — Per-number voice and persona routing for voice-call plugin
  • #29164 — voice-call: add postCall hook to notify originating session with transcript summary
  • #48505 — [Bug] voice-call plugin outbound conversation mode sends empty TwiML responses
  • #42113 — voice-call: media stream killed by race condition on outbound conversation calls
  • #56091 — Telnyx outbound conversation calls: events skipped as replays, no STT
  • #12911 — Feature: Live Voice Mode in /chat UI

Suggested priority: p2

extent analysis

TL;DR

To implement an outbound task call feature for the voice-call plugin, extend the existing plugin with a new mode, integrate a high-quality TTS provider, and add hold detection, DTMF generation, and escalation management.

Guidance

  • Extend the voice-call plugin to support a new outbound-task mode, distinct from the current outbound-conversation mode.
  • Integrate a high-quality TTS provider, such as ElevenLabs or OpenAI Realtime, to ensure natural-sounding voice synthesis.
  • Implement hold detection using audio classification to identify silence and music patterns.
  • Develop an Escalation Manager that messages the user's primary chat channel when the agent needs input, with timeout logic (2-3 minutes, per the request) for unresponsive users.
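The escalation timeout in the last bullet could be tracked by a small object that takes timestamps from the caller, which keeps the logic testable without real timers. This is a minimal sketch: the `Escalation` class and its methods are hypothetical, and the 3-minute default is simply the upper end of the 2-3 minute window from the request.

```typescript
// Hypothetical escalation tracker. The caller supplies timestamps, so
// no real timers or chat integration are involved.
type EscalationStatus = "waiting" | "answered" | "timed-out";

class Escalation {
  private reply: string | undefined;

  constructor(
    readonly question: string,
    private readonly askedAtMs: number,
    private readonly timeoutMs: number = 3 * 60 * 1000,
  ) {}

  // What the call should do at time `nowMs`.
  status(nowMs: number): EscalationStatus {
    if (this.reply !== undefined) return "answered";
    return nowMs - this.askedAtMs >= this.timeoutMs ? "timed-out" : "waiting";
  }

  // Record the user's chat reply. Returns false if it arrived too late
  // or a reply was already recorded.
  answer(reply: string, nowMs: number): boolean {
    if (this.status(nowMs) !== "waiting") return false;
    this.reply = reply;
    return true;
  }

  // The recorded reply, if any.
  get answerText(): string | undefined {
    return this.reply;
  }
}
```

On "timed-out" the orchestrator would tell the other party it will call back, hang up, and reschedule; on "answered" it would resume the conversation with the reply.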

Example

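// Example of a basic call state machine.
// States follow the lifecycle from the implementation sketch:
// dialing, IVR navigation, hold waiting, active conversation,
// escalation pending, wrap-up, ended.
enum CallState {
  Dialing,
  IvrNavigation,
  HoldWaiting,
  ActiveConversation,
  EscalationPending,
  WrapUp,
  Ended,
}

class CallStateMachine {
  // Every call starts in the dialing state.
  private state: CallState = CallState.Dialing;

  // Read-only view of the current state.
  get current(): CallState {
    return this.state;
  }

  // Move to a new state. Ended is treated as terminal here (an
  // assumption), so leaving it is an error.
  transitionTo(next: CallState): void {
    if (this.state === CallState.Ended) {
      throw new Error("Call has already ended");
    }
    this.state = next;
  }

  // ...
}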

Notes

Implementing outbound task calls will require significant development and testing to ensure reliability and legal compliance (recording-consent and AI-disclosure rules vary by jurisdiction). The open outbound-call bugs in the existing voice-call plugin (#48505, #42113, #56091) should be resolved before building the new feature.

Recommendation

Apply a workaround by using a separate plugin or service for outbound task calls until the existing voice-call plugin is stabilized and the new feature is fully implemented. This will allow for a more controlled and reliable rollout of the new functionality.
