llamaIndex - 💡(How to fix) Fix [Feature Request]: Trust scoring and interaction history for tool and agent reliability [1 participant]

llamaIndex · 2026-04-05 21:39:46


GitHub issue: run-llama/llama_index#21312 (fetched 2026-04-08 02:52:04)


Author: viftode4
Participants: viftode4


Feature Description

LlamaIndex agents can call tools, query external data sources, and delegate to sub-agents. But there's no way to track whether a particular tool or agent has been reliable across sessions. If your RAG pipeline queries an external API that returned bad data last time, or delegates to a sub-agent that failed 3 out of 5 recent tasks, nothing in LlamaIndex remembers that.

I built a sidecar that records interactions between agents and tools bilaterally. Both the caller and the callee cosign a record of what happened. From the graph of those records you get trust scores, skill-specific reliability (this tool is good at code search but bad at summarization), Sybil detection, and full audit trails. New tools/agents start UNPROVEN and graduate through 5 trust tiers as their track record grows.
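The bilateral, cosigned record is the core idea here. Below is a minimal sketch of what such a record could look like; it is not the trustchain implementation. The record fields (`caller`, `callee`, `action`, `quality`) and the names `CosignedRecord`, `canonical_bytes`, and `sign` are hypothetical, and HMAC-SHA256 from the standard library stands in for Ed25519. Unlike Ed25519, HMAC verification needs each party's secret key, so this sketch does not provide the public-key-only offline verification the issue describes.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


def canonical_bytes(body: dict) -> bytes:
    """Serialize deterministically so both parties sign identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(secret: bytes, payload: bytes) -> str:
    # Stand-in for an Ed25519 signature: HMAC-SHA256 keyed with the party's secret.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


@dataclass
class CosignedRecord:
    body: dict  # hypothetical fields: caller, callee, action, quality
    signatures: dict = field(default_factory=dict)  # party id -> signature

    def cosign(self, party: str, secret: bytes) -> None:
        self.signatures[party] = sign(secret, canonical_bytes(self.body))

    def is_valid(self, keys: dict) -> bool:
        """True only if BOTH caller and callee signed exactly this body."""
        payload = canonical_bytes(self.body)
        for party in (self.body["caller"], self.body["callee"]):
            sig = self.signatures.get(party)
            secret = keys.get(party)
            if sig is None or secret is None:
                return False
            if not hmac.compare_digest(sig, sign(secret, payload)):
                return False
        return True


keys = {"agent-a": b"secret-a", "tool-b": b"secret-b"}
rec = CosignedRecord({"caller": "agent-a", "callee": "tool-b",
                      "action": "retrieval", "quality": 0.9})
rec.cosign("agent-a", keys["agent-a"])
print(rec.is_valid(keys))  # False: the callee has not signed yet
rec.cosign("tool-b", keys["tool-b"])
print(rec.is_valid(keys))  # True
```

Because both signatures cover the same canonical bytes, editing any field after signing (for example, lowering `quality`) invalidates the record for both parties.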

The sidecar works as a transparent HTTP proxy (zero code changes) or through a Python SDK:

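from trustchain import TrustClient

tc = TrustClient()

# tool_pubkey: the public key that identifies the tool (defined elsewhere)
# before calling a tool: check its track record
trust = tc.check_trust(tool_pubkey)
# Caution: if `tier` is a plain string, `<` compares alphabetically, not by tier rank
if trust.tier < "ESTABLISHED" or trust.skill_reliability("retrieval") < 0.7:
    # use a different retriever or add verification
    ...

# after the call: record what happened (quality assumed to range from 0.0 to 1.0)
tc.record_interaction(counterparty=tool_pubkey, action="retrieval", quality=0.9)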
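If `trust.tier` is returned as a plain string (an assumption; the issue does not say), `trust.tier < "ESTABLISHED"` compares alphabetically. `"UNPROVEN" < "ESTABLISHED"` is `False` because "U" sorts after "E", so an unproven tool would pass the tier check. A minimal sketch of a rank-based comparison follows. Only "UNPROVEN" and "ESTABLISHED" come from the issue; the other tier names, the position of "ESTABLISHED", and the helpers `tier_at_least` and `should_use_tool` are hypothetical placeholders.

```python
# Hypothetical tier ladder: the issue names only "UNPROVEN" (where new tools
# start) and "ESTABLISHED". The other names and the position of "ESTABLISHED"
# are placeholders; take the real order from the trustchain documentation.
TIER_ORDER = ("UNPROVEN", "TIER_2", "ESTABLISHED", "TIER_4", "TIER_5")
TIER_RANK = {name: rank for rank, name in enumerate(TIER_ORDER)}


def tier_at_least(tier: str, minimum: str) -> bool:
    """Compare tiers by their position in TIER_ORDER, not alphabetically."""
    return TIER_RANK[tier] >= TIER_RANK[minimum]


def should_use_tool(tier: str, retrieval_reliability: float,
                    min_tier: str = "ESTABLISHED",
                    min_reliability: float = 0.7) -> bool:
    """Inverse of the SDK example's reroute condition, with a rank-based tier check."""
    return tier_at_least(tier, min_tier) and retrieval_reliability >= min_reliability


print("UNPROVEN" < "ESTABLISHED")                # False: 'U' sorts after 'E'
print(tier_at_least("UNPROVEN", "ESTABLISHED"))  # False
print(should_use_tool("ESTABLISHED", 0.85))      # True
```

If the SDK already returns an ordered enum, this mapping is unnecessary.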

It also exposes 7 MCP tools, so MCP-capable agents can query trust natively. Verification works fully offline using only Ed25519 public keys.
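At the protocol level, an MCP client invokes a server tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below only builds that message and does not send it over a transport. The issue does not list trustchain's MCP tool names, so `check_trust` and its `pubkey` argument are hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # MCP request ids must not be reused within a session


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Build an MCP 'tools/call' request (a JSON-RPC 2.0 message)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and argument: the issue does not list trustchain's MCP tools.
print(mcp_tool_call("check_trust", {"pubkey": "<tool public key>"}))
```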

The interaction graph is a substrate you can build on. Trust scoring is just the first use. The same graph powers agent discovery (rank by reputation within a capability), behavioral anomaly detection, dispute resolution, and compliance audit trails.
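Agent discovery that ranks "by reputation within a capability" can be sketched as a filter followed by a sort. This minimal sketch assumes each candidate already carries per-capability reliability scores; in the proposal these would be derived from the interaction graph. The scores below are illustrative values, and `Candidate` and `rank_for_capability` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    # capability -> reliability score in [0, 1]; names and scores are illustrative
    skill_scores: dict = field(default_factory=dict)


def rank_for_capability(candidates, capability, min_score=0.0):
    """Return candidates that have the capability, best reliability first."""
    eligible = [
        c for c in candidates
        if capability in c.skill_scores and c.skill_scores[capability] >= min_score
    ]
    return sorted(eligible, key=lambda c: c.skill_scores[capability], reverse=True)


tools = [
    Candidate("tool-a", {"code_search": 0.92, "summarization": 0.41}),
    Candidate("tool-b", {"summarization": 0.88}),
    Candidate("tool-c", {"code_search": 0.65, "summarization": 0.70}),
]
print([c.name for c in rank_for_capability(tools, "summarization")])
# ['tool-b', 'tool-c', 'tool-a']
print([c.name for c in rank_for_capability(tools, "code_search", min_score=0.7)])
# ['tool-a']
```

Keeping scores per capability lets `tool-a` rank first for code search and last for summarization, which mirrors the skill-specific reliability described above.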

Reason

LlamaIndex currently has no mechanism for cross-session tool/agent reliability tracking. If a tool returns bad results, the agent has no memory of that in the next session. Guardrails validate output format but not source reliability over time.
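The cross-session gap can be illustrated without the sidecar by using a local outcome log. The following minimal sketch uses the standard-library `sqlite3` module. It is a plain per-skill success-rate store, not trustchain's cosigned interaction graph. The table layout, the 0 to 1 quality scale, the `min_samples` cutoff, and the `InteractionLog` name are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS interactions (
    counterparty TEXT NOT NULL,
    action       TEXT NOT NULL,
    quality      REAL NOT NULL CHECK (quality BETWEEN 0 AND 1),
    recorded_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


class InteractionLog:
    """Persists tool/agent outcomes in SQLite so they survive across sessions."""

    def __init__(self, path: str = "interactions.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def record(self, counterparty: str, action: str, quality: float) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO interactions (counterparty, action, quality) VALUES (?, ?, ?)",
                (counterparty, action, quality),
            )

    def reliability(self, counterparty: str, action: str, min_samples: int = 3):
        """Mean quality for one skill, or None if there is too little history."""
        n, avg = self.conn.execute(
            "SELECT COUNT(*), AVG(quality) FROM interactions"
            " WHERE counterparty = ? AND action = ?",
            (counterparty, action),
        ).fetchone()
        return avg if n >= min_samples else None


log = InteractionLog(":memory:")  # pass a file path instead to persist across sessions
for q in (0.9, 0.8, 1.0):
    log.record("tool-x", "retrieval", q)
log.record("tool-x", "summarization", 0.2)
print(round(log.reliability("tool-x", "retrieval"), 2))  # 0.9
print(log.reliability("tool-x", "summarization"))        # None: fewer than 3 records
```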

Value of Feature

Agents make better delegation decisions based on actual track record, not just tool descriptions
Bad tools/agents get deprioritized automatically without manual curation
RAG pipelines become more reliable by routing to sources with proven retrieval quality
Full audit trail for compliance (EU AI Act Article 12 ready)

Working implementation with a LlamaIndex adapter: https://github.com/viftode4/trustchain
Live demo with 21 LLM agents in a trust-scored economy: http://5.161.255.238:8888

extent analysis

TL;DR

Implementing a trust tracking system, such as the proposed sidecar, can help LlamaIndex agents make informed decisions about tool and agent reliability across sessions.

Guidance

Integrate the trust tracking sidecar into the LlamaIndex pipeline to record interactions between agents and tools, allowing for the calculation of trust scores and skill-specific reliability.
Use the provided Python SDK to check the track record of tools before calling them and record the outcome of interactions to update the trust scores.
Consider using the MCP tools to enable native trust queries for MCP-capable agents.
Evaluate the effectiveness of the trust tracking system in improving the reliability of RAG pipelines and agent delegation decisions.
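The guidance above amounts to a check-before, record-after loop around each tool call. This minimal sketch of that loop assumes a store that has `reliability()` and `record()` methods. `MemoryTrustStore` and `call_with_trust` are hypothetical names. The in-memory store stands in for the trustchain client, whose actual method signatures may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class MemoryTrustStore:
    """Hypothetical in-memory stand-in for a trust backend (not the trustchain SDK)."""

    def __init__(self) -> None:
        self._scores: Dict[Tuple[str, str], List[float]] = {}

    def reliability(self, counterparty: str, action: str) -> Optional[float]:
        scores = self._scores.get((counterparty, action), [])
        return sum(scores) / len(scores) if scores else None

    def record(self, counterparty: str, action: str, quality: float) -> None:
        self._scores.setdefault((counterparty, action), []).append(quality)


def call_with_trust(store: MemoryTrustStore, counterparty: str, action: str,
                    call: Callable[[], T], fallback: Callable[[], T],
                    score: Callable[[T], float], threshold: float = 0.7) -> T:
    # 1. Check the track record before calling.
    rel = store.reliability(counterparty, action)
    if rel is not None and rel < threshold:
        return fallback()  # known to be unreliable for this action: route elsewhere
    # 2. Call the tool. No history yet (None) is treated as "try it" to build a record.
    try:
        result = call()
    except Exception:
        store.record(counterparty, action, 0.0)  # 3a. a failure counts as quality 0.0
        raise
    store.record(counterparty, action, score(result))  # 3b. record the scored outcome
    return result


store = MemoryTrustStore()
first = call_with_trust(store, "retriever-a", "retrieval", call=lambda: ["doc-1"],
                        fallback=lambda: ["fallback-doc"], score=lambda docs: 0.2)
second = call_with_trust(store, "retriever-a", "retrieval", call=lambda: ["doc-1"],
                         fallback=lambda: ["fallback-doc"], score=lambda docs: 0.2)
print(first, second)  # ['doc-1'] ['fallback-doc']
```

Letting a tool with no history through matches the idea that new tools start UNPROVEN and earn a record. A stricter policy could send them to the verification path instead.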

Example

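from trustchain import TrustClient

tc = TrustClient()
# tool_pubkey: the public key that identifies the tool (defined elsewhere)
trust = tc.check_trust(tool_pubkey)
# Caution: if `tier` is a plain string, `<` compares alphabetically, not by tier rank
if trust.tier < "ESTABLISHED" or trust.skill_reliability("retrieval") < 0.7:
    # use a different retriever or add verification
    pass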

Notes

The proposed solution requires integrating an external trust-tracking sidecar into the LlamaIndex pipeline, which may involve additional development and testing effort.

Recommendation

Adopt the proposed trust-tracking sidecar (an external project with its own LlamaIndex adapter, not a built-in LlamaIndex feature) to track tool and agent reliability across sessions and to inform routing and delegation decisions.
