LlamaIndex - [Feature Request]: Trust scoring and interaction history for tool and agent reliability

GitHub stats
run-llama/llama_index#21312 · Fetched 2026-04-08 02:52:04
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Feature Description

LlamaIndex agents can call tools, query external data sources, and delegate to sub-agents. But there's no way to track whether a particular tool or agent has been reliable across sessions. If your RAG pipeline queries an external API that returned bad data last time, or delegates to a sub-agent that failed 3 out of 5 recent tasks, nothing in LlamaIndex remembers that.

I built a sidecar that records interactions between agents and tools bilaterally. Both the caller and the callee cosign a record of what happened. From the graph of those records you get trust scores, skill-specific reliability (this tool is good at code search but bad at summarization), Sybil detection, and full audit trails. New tools/agents start UNPROVEN and graduate through 5 trust tiers as their track record grows.
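The cosigning idea can be illustrated with a minimal sketch. This is not the trustchain implementation: the function names are hypothetical, and HMAC-SHA256 with shared secrets stands in for the Ed25519 signatures the issue mentions, only because HMAC is available in the Python standard library. The point it shows is that both parties sign the same canonical bytes, so any later change to the record invalidates both signatures.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize deterministically so both parties sign identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(secret: bytes, record: dict) -> str:
    # Stand-in only: the issue describes Ed25519 signatures, not HMAC.
    return hmac.new(secret, canonical_bytes(record), hashlib.sha256).hexdigest()


def cosign(record: dict, caller_secret: bytes, callee_secret: bytes) -> dict:
    """Attach one signature from each party to the same record."""
    return {
        "record": record,
        "caller_sig": sign(caller_secret, record),
        "callee_sig": sign(callee_secret, record),
    }


def verify(cosigned: dict, caller_secret: bytes, callee_secret: bytes) -> bool:
    """Accept the record only if both signatures match its current contents."""
    record = cosigned["record"]
    caller_ok = hmac.compare_digest(cosigned["caller_sig"], sign(caller_secret, record))
    callee_ok = hmac.compare_digest(cosigned["callee_sig"], sign(callee_secret, record))
    return caller_ok and callee_ok
```

With real asymmetric signatures, a third party could verify the record using only the two public keys, which is what makes the offline verification described in the issue possible; the HMAC stand-in cannot show that property.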

Works as a transparent HTTP proxy (zero code changes) or through a Python SDK:

from trustchain import TrustClient

tc = TrustClient()

# before calling a tool: check its track record
trust = tc.check_trust(tool_pubkey)
if trust.tier < "ESTABLISHED" or trust.skill_reliability("retrieval") < 0.7:
    # use a different retriever or add verification
    ...

# after the call: record what happened
tc.record_interaction(counterparty=tool_pubkey, action="retrieval", quality=0.9)
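One detail in the snippet above deserves care: `trust.tier < "ESTABLISHED"` only behaves as intended if the tier type defines an ordering. The issue does not say how `tier` is represented, so the following is a sketch under assumptions rather than the SDK's actual type. Only `UNPROVEN` and `ESTABLISHED` are named in the issue; the other member names, the position of `ESTABLISHED` among the five tiers, and `should_use_tool` are placeholders.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical ordering: only UNPROVEN and ESTABLISHED come from the issue.
    UNPROVEN = 0
    PLACEHOLDER_1 = 1
    ESTABLISHED = 2
    PLACEHOLDER_3 = 3
    PLACEHOLDER_4 = 4


def should_use_tool(
    tier: Tier,
    reliability: float,
    min_tier: Tier = Tier.ESTABLISHED,
    min_reliability: float = 0.7,
) -> bool:
    """Gate a tool call on both its tier and its skill-specific reliability."""
    return tier >= min_tier and reliability >= min_reliability


# Plain strings compare alphabetically, so this is False even though
# UNPROVEN is meant to rank below ESTABLISHED.
strings_compare_as_intended = "UNPROVEN" < "ESTABLISHED"
```

An ordered enum keeps the comparison meaningful; with bare strings, an unproven tool would slip past a `<` check because "U" sorts after "E".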

The sidecar also exposes 7 MCP tools so that MCP-capable agents can query trust natively. Verification works fully offline and needs only the parties' Ed25519 public keys.

The interaction graph is a substrate you can build on. Trust scoring is just the first use. The same graph powers agent discovery (rank by reputation within a capability), behavioral anomaly detection, dispute resolution, and compliance audit trails.

Reason

LlamaIndex currently has no mechanism for cross-session tool/agent reliability tracking. If a tool returns bad results, the agent has no memory of that in the next session. Guardrails validate output format but not source reliability over time.
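To make the cross-session gap concrete, here is a minimal sketch of an interaction log that survives process restarts, using SQLite from the standard library. It is not part of LlamaIndex or trustchain; the table layout and function names are assumptions, and reliability is simplified to the mean recorded quality per tool and action.

```python
import sqlite3
from typing import Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk interaction log."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "counterparty TEXT NOT NULL, action TEXT NOT NULL, quality REAL NOT NULL)"
    )
    return conn


def record_interaction(
    conn: sqlite3.Connection, counterparty: str, action: str, quality: float
) -> None:
    """Append one observed outcome; the context manager commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO interactions VALUES (?, ?, ?)",
            (counterparty, action, quality),
        )


def skill_reliability(
    conn: sqlite3.Connection, counterparty: str, action: str
) -> Optional[float]:
    """Mean quality for this tool and action, or None if there is no history."""
    row = conn.execute(
        "SELECT AVG(quality) FROM interactions WHERE counterparty = ? AND action = ?",
        (counterparty, action),
    ).fetchone()
    return row[0]
```

Because the log lives in a file, a new session that opens the same path sees the outcomes recorded by earlier sessions, which is the memory the issue says is missing today.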

Value of Feature

  • Agents make better delegation decisions based on actual track record, not just tool descriptions
  • Bad tools/agents get deprioritized automatically without manual curation
  • RAG pipelines become more reliable by routing to sources with proven retrieval quality
  • Full audit trail for compliance (the author describes it as ready for EU AI Act Article 12, the record-keeping article)
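The routing benefit in the list above could look roughly like the following sketch. `pick_source` and its 0.7 default threshold are illustrative assumptions (the threshold mirrors the value used in the issue's snippet); sources with no history are represented by `None` and are skipped rather than trusted.

```python
from typing import Mapping, Optional


def pick_source(
    reliability: Mapping[str, Optional[float]], threshold: float = 0.7
) -> Optional[str]:
    """Return the most reliable source at or above the threshold, else None."""
    eligible = {
        name: score
        for name, score in reliability.items()
        if score is not None and score >= threshold
    }
    if not eligible:
        return None  # caller should fall back or add verification
    return max(eligible, key=eligible.get)
```

For example, `pick_source({"wiki": 0.9, "news_api": 0.75, "new_tool": None})` selects `"wiki"`, while a map containing only low-scoring or unscored sources yields `None` so the caller can decide how to proceed.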

Working implementation with a LlamaIndex adapter: https://github.com/viftode4/trustchain

Live demo with 21 LLM agents in a trust-scored economy: http://5.161.255.238:8888

extent analysis

TL;DR

Implementing a trust tracking system, such as the proposed sidecar, can help LlamaIndex agents make informed decisions about tool and agent reliability across sessions.

Guidance

  • Integrate the trust tracking sidecar into the LlamaIndex pipeline to record interactions between agents and tools, allowing for the calculation of trust scores and skill-specific reliability.
  • Use the provided Python SDK to check the track record of tools before calling them and record the outcome of interactions to update the trust scores.
  • Consider using the MCP tools to enable native trust queries for MCP-capable agents.
  • Evaluate the effectiveness of the trust tracking system in improving the reliability of RAG pipelines and agent delegation decisions.
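The check-then-record pattern in the guidance above can be wrapped around any tool function. The sketch below is a generic decorator, not the trustchain or LlamaIndex API: `tracked` and the `record` callback are hypothetical, and quality is reduced to 1.0 for a call that returns and 0.0 for one that raises.

```python
import functools
from typing import Any, Callable

# Hypothetical callback signature: (counterparty, action, quality) -> None
Recorder = Callable[[str, str, float], None]


def tracked(counterparty: str, action: str, record: Recorder) -> Callable:
    """Decorator that records the outcome of every call to the wrapped tool."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = fn(*args, **kwargs)
            except Exception:
                record(counterparty, action, 0.0)  # failure is recorded, then re-raised
                raise
            record(counterparty, action, 1.0)
            return result

        return wrapper

    return decorator
```

In practice the `record` callback would forward to whatever store or client is in use, and a finer-grained quality score would likely come from evaluating the result rather than from success or failure alone.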

Example

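from trustchain import TrustClient

tc = TrustClient()

# tool_pubkey is the tool's public key; how it is obtained is not shown in the issue
trust = tc.check_trust(tool_pubkey)

# Assumes trust.tier supports ordered comparison against a tier name;
# plain strings would compare alphabetically instead.
if trust.tier < "ESTABLISHED" or trust.skill_reliability("retrieval") < 0.7:
    # use a different retriever or add verification
    pass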

Notes

The proposed solution requires the integration of the trust tracking sidecar into the LlamaIndex pipeline, which may involve additional development and testing efforts.

Recommendation

Consider adopting the proposed trust tracking sidecar to improve the reliability of LlamaIndex agents and RAG pipelines. It is a third-party addition proposed in a feature request, not a workaround for a bug, and its purpose is to track tool and agent reliability across sessions.
