llamaIndex - 💡(How to fix) Fix EvaporateExtractor.run_fn_on_nodes executes LLM output with full globals [1 participants]

GitHub stats
run-llama/llama_index#20678 · Fetched 2026-04-08 00:31:32
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): closed ×1, cross-referenced ×1, mentioned ×1, subscribed ×1

EvaporateExtractor.run_fn_on_nodes runs LLM-generated Python functions via exec(fn_str, globals()), giving the generated code access to the full module scope -- including os, signal, subprocess (if imported), and everything else available at module level.

https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/program/llama-index-program-evaporate/llama_index/program/evaporate/extractor.py#L233
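
To make the exposure concrete, here is a minimal sketch of the pattern described above. It is not the code from `extractor.py`: `run_unsafe`, `fn_str`, and the `get_field` function inside it are hypothetical names, and `os` stands in for whatever happens to be imported at module level.

```python
import os  # stands in for any module imported at module level

# Hypothetical LLM output: it never imports os, yet it can still call it.
fn_str = """
def get_field(text):
    return os.getcwd()
result = get_field(node_text)
"""

def run_unsafe(fn_str, node_text):
    # Illustrative only: the exec namespace is a copy of this module's globals,
    # so every module-level name (including os) is visible to the generated code.
    scope = dict(globals(), node_text=node_text)
    exec(fn_str, scope)
    return scope["result"]

leaked = run_unsafe(fn_str, "some node text")
```

The generated function reaches `os` only because the exec namespace was built from the module globals; nothing in the extraction task itself requires that access.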

The docstring already notes "There are definitely security holes with this approach", so this is a known area. The generated extraction functions really only need re and basic string builtins, so it's possible to sandbox the exec without breaking the existing functionality:

  • Restricted builtins dict (no eval/exec/compile/open/__import__ etc.)
  • AST validation to block dunder access and imports outside a small allowlist
  • Per-node sandbox dict instead of globals(), which also cleans up the global result/global node_text pattern

I put up a PR with this approach plus 16 unit tests for the sandbox: #20676

cc @logan-markewich

extent analysis

Fix Plan

To address the security issue, we will implement a sandboxed environment for executing LLM-generated Python functions.

Step-by-Step Solution:

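1. **Create a restricted builtins dictionary**:
   Expose only the builtins the generated extraction functions need, and leave out `eval`, `exec`, `compile`, `open`, and `__import__`.
   ```python
   # Allowlist of builtins visible to generated code.
   restricted_builtins = {
       "str": str,
       "int": int,
       "float": float,
       "len": len,
       "range": range,
   }
   ```

2. **Implement AST validation**:
   Use the `ast` module to validate the generated code and block dunder access and imports outside a small allowlist.
   ```python
   import ast

   ALLOWED_IMPORTS = {"re"}

   def validate_ast(code):
       tree = ast.parse(code)
       for node in ast.walk(tree):
           if isinstance(node, ast.Import):
               # Every module named in the statement must be in the allowlist
               if any(alias.name not in ALLOWED_IMPORTS for alias in node.names):
                   raise ValueError("Invalid import")
           elif isinstance(node, ast.ImportFrom):
               # Covers "from x import y", which ast.Import does not match
               if node.module not in ALLOWED_IMPORTS:
                   raise ValueError("Invalid import")
           elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
               # Block dunder access
               raise ValueError("Dunder access is not allowed")
   ```

3. **Use a per-node sandbox dictionary**:
   Instead of using `globals()`, create a sandbox dictionary for each node to execute the generated code. The restricted builtins must be installed under the `__builtins__` key; if that key is missing, `exec` inserts the full builtins module.
   ```python
   import re

   def run_fn_on_nodes(fn_str, node):
       validate_ast(fn_str)
       # Fresh namespace per node. re is pre-injected, so generated code does not
       # need "import re" (which would fail here because __import__ is not exposed).
       sandbox = {
           "__builtins__": restricted_builtins,
           "node": node,
           "result": None,
           "re": re,
       }
       try:
           exec(fn_str, sandbox)
           return sandbox["result"]
       except Exception:
           # A failing generated function yields no result for this node
           return None
   ```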


### Verification
To verify the fix, run the 16 unit tests provided in PR #20676 and confirm that the sandbox rejects disallowed code while the generated extraction functions still execute as expected.
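
As a rough idea of what such checks can look like, the sketch below defines a small self-contained sandbox and asserts on its behavior. These are not the tests from PR #20676: `run_sandboxed`, `raises`, `SAFE_BUILTINS`, and the sample extractor are hypothetical names chosen for illustration, and this sketch rejects all imports for simplicity.

```python
import ast
import re

SAFE_BUILTINS = {"str": str, "int": int, "float": float, "len": len, "range": range}

def run_sandboxed(fn_str, node_text):
    """Hypothetical helper: validate the code, then exec it in a fresh namespace."""
    for n in ast.walk(ast.parse(fn_str)):
        if isinstance(n, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in this sketch")
        if isinstance(n, ast.Attribute) and n.attr.startswith("__"):
            raise ValueError("dunder access is not allowed")
    sandbox = {
        "__builtins__": SAFE_BUILTINS,
        "re": re,
        "node_text": node_text,
        "result": None,
    }
    exec(fn_str, sandbox)
    return sandbox["result"]

def raises(exc_type, fn_str):
    """Return True if running fn_str in the sandbox raises exc_type."""
    try:
        run_sandboxed(fn_str, "")
    except exc_type:
        return True
    return False

# A legitimate regex-based extractor should still work.
good = r'''
def get_year(text):
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None
result = get_year(node_text)
'''
assert run_sandboxed(good, "Founded in 1998.") == 1998

# Disallowed behavior should fail before or during execution.
assert raises(NameError, "result = open('/etc/passwd')")         # builtin not exposed
assert raises(ValueError, "import os\nresult = os.getcwd()")     # import rejected
assert raises(ValueError, "result = ().__class__")               # dunder rejected
```

The useful property to test is two-sided: allowed extractors keep working, and each blocked capability fails for the reason you expect.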

### Extra Tips
* Regularly review and update the allowlist of imports to prevent potential security vulnerabilities.
* Consider using a more robust sandboxing solution, such as a separate process or container, for executing untrusted code.
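
For the separate-process suggestion, one possible shape is sketched below using only the standard library. `run_in_subprocess` and `RUNNER` are hypothetical names, and the child process here still has full builtins, so this would complement the restricted namespace rather than replace it. A subprocess mainly contains crashes and runaway loops and keeps the generated code out of the parent's memory; it does not by itself restrict filesystem or network access.

```python
import json
import subprocess
import sys

# Code executed by the child interpreter: read a JSON payload from stdin,
# run the generated function string, and write the result back as JSON.
RUNNER = r'''
import json, re, sys
payload = json.load(sys.stdin)
scope = {"re": re, "node_text": payload["node_text"], "result": None}
exec(payload["fn_str"], scope)
json.dump(scope["result"], sys.stdout)
'''

def run_in_subprocess(fn_str, node_text, timeout=5.0):
    """Run fn_str in a fresh interpreter; raises subprocess.TimeoutExpired on a hang."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", RUNNER],  # -I: isolated mode, ignores env and user site
        input=json.dumps({"fn_str": fn_str, "node_text": node_text}),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)

word_count = run_in_subprocess("result = len(node_text.split())", "a b c")
```

Results must be JSON-serializable in this sketch, which fits extraction functions that return strings, numbers, or lists of them.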
