openclaw - [Feature]: agent orchestration with local agents for sub agents

Summary

I would like to offload work from cloud models to local models

Problem to solve

I would like to offload work from cloud models to local models to improve speed and reduce cloud API costs whenever the computer running openclaw has access to a GPU or NPU. It could also work on other devices such as FPGAs or CPUs. This would make better use of the large amount of idle compute in laptops and desktops worldwide and so assist carbon reduction initiatives.

Proposed solution

Openclaw will not handle interactions with these hardware devices directly. Instead, it will install for the user llama.cpp, vLLM, whisper.cpp, ComfyUI, or another inference, LLM, deep learning, image generation, or machine learning framework, depending on the capabilities of the hardware.

Currently the user needs to know to do this themselves to optimize for speed and cost. The user could be asked before anything is installed, since local inference can increase power consumption and heat generation, which may be undesirable depending on where the machine running openclaw is located.

For example, for LLMs a good rule of thumb is to use vLLM on machines with large amounts of VRAM (over 32 GB) and llama.cpp on all other machines. Machine capabilities can be determined by comparing the size of the model in GB to the available RAM or VRAM. Inference speed can then be estimated from the memory transfer speed in GB/s, which can be obtained online, through benchmarks, or from self-reported local machine information. An approach similar to CPU-Z and GPU-Z can be used to infer GB/s from the memory clock speed and bus width.
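The rule of thumb above could be sketched roughly as follows. This is only an illustration, not anything openclaw provides: the `Machine` class, its fields, and all function names are hypothetical. The bandwidth formula is the usual peak figure (transfers per second times bytes per transfer), and the tokens-per-second estimate assumes decoding is memory-bound so that every generated token reads all model weights once, which gives an optimistic upper bound rather than a measured speed.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical description of the local hardware (all fields are assumptions)."""
    vram_gb: float          # dedicated GPU memory in GB, 0 if there is none
    ram_gb: float           # system memory in GB
    mem_clock_mts: float    # effective memory data rate in MT/s
    bus_width_bits: int     # total memory bus width in bits


def pick_backend(machine: Machine) -> str:
    """Rule of thumb from the issue: vLLM above 32 GB of VRAM, llama.cpp otherwise."""
    return "vLLM" if machine.vram_gb > 32 else "llama.cpp"


def model_fits(model_size_gb: float, machine: Machine) -> bool:
    """Compare model size to the larger memory pool (ignores KV cache and OS overhead)."""
    return model_size_gb <= max(machine.vram_gb, machine.ram_gb)


def bandwidth_gbs(machine: Machine) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = machine.bus_width_bits / 8
    return machine.mem_clock_mts * 1e6 * bytes_per_transfer / 1e9


def rough_tokens_per_second(model_size_gb: float, machine: Machine) -> float:
    """Assumed upper bound: each generated token streams all weights from memory once."""
    return bandwidth_gbs(machine) / model_size_gb


if __name__ == "__main__":
    # Example values for a laptop with dual-channel DDR4-3200 and no discrete GPU.
    laptop = Machine(vram_gb=0, ram_gb=16, mem_clock_mts=3200, bus_width_bits=128)
    print(pick_backend(laptop), model_fits(4.0, laptop))
    print(bandwidth_gbs(laptop), rough_tokens_per_second(4.0, laptop))
```

Real throughput will be lower than this estimate, so a benchmark run on first use would still be needed to confirm whether a given model is worth running locally.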

Alternatives considered

No response

Impact

Affected users: users with prompts that cause agent orchestration.
Severity: annoying. Laptops sit mostly idle while compute power is concentrated in large data centers.
Frequency: whenever agent orchestration is needed.
Consequence: agents could take less time to complete tasks. A goal is to be as fast as a human at searching for places on Google Maps, scanning place names with vision (OCR), and then searching online. For example, the vision step could run locally with 6 GB of VRAM and a 4B-parameter VLM such as Qwen.

Evidence/examples

No response

Additional information

The user needs to be highly technical to do this, and I couldn't find anything that says openclaw is capable of multi-agent orchestration with some of the agents running local models.

A big challenge is the limited usefulness of LLMs with fewer than 4B parameters. They are better suited to natural language processing tasks, text recognition, or transcription.

Another challenge is determining how useful local models are for a given level of local hardware TOPS. For example, a 7th-gen i5 can run whisper.cpp since it runs in RAM, but it is slower than real time unless parameters to control context (I don't remember what else) are passed on the command line. The same applies to diffusion models, which are compute-bound.
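One way to make "slower than real time" measurable is the real-time factor: processing time divided by audio duration, where a value below 1 means the machine keeps up. The sketch below is a hypothetical helper, not part of openclaw or whisper.cpp. The function names and the fallback to a cloud model are assumptions, and the timings would have to come from a short local benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def choose_transcription_target(processing_seconds: float, audio_seconds: float) -> str:
    """Hypothetical policy: run locally only if the benchmark kept up with real time."""
    rtf = real_time_factor(processing_seconds, audio_seconds)
    return "local" if rtf < 1.0 else "cloud"


if __name__ == "__main__":
    # Made-up benchmark timings for illustration: a 30 s clip on two machines.
    print(choose_transcription_target(processing_seconds=45.0, audio_seconds=30.0))
    print(choose_transcription_target(processing_seconds=12.0, audio_seconds=30.0))
```

The same measure-then-decide approach could apply to diffusion models, using seconds per image against whatever latency the user accepts.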
