llamaIndex - ✅(Solved) Fix PandasQueryEngine safe_exec allows filesystem/network I/O and has no timeout [1 pull requests, 2 comments, 2 participants]

GitHub stats
run-llama/llama_index#20690 · Fetched 2026-04-08 00:31:29
Comments: 2 · Participants: 2 · Timeline: 16 · Reactions: 0
Timeline (top): mentioned ×5, subscribed ×5, commented ×2, referenced ×2

Fix Action

Fixed

PR fix notes

PR #20691: Block filesystem/network I/O and add timeout to safe_exec/safe_eval

Description (problem / solution / changelog)

Closes #20690

Builds on the original RCE fix from #7054. The existing sandbox blocks direct imports and dunder/private attribute access, but LLM-generated code can still reach the filesystem and network through pandas, numpy, and polars methods that are already in the execution locals. There is also no timeout, so an infinite loop hangs the process forever.

What changes are proposed

I/O operation blocking (AST-level): Added _DANGEROUS_ATTR_CALLS -- a frozenset of ~50 method names covering pandas read/write (read_csv, to_csv, read_sql, to_parquet, ...), numpy file I/O (load, save, loadtxt, fromfile, ...), polars lazy/eager I/O (scan_csv, write_parquet, ...), and general dangerous calls (system, popen). The existing DunderVisitor now flags these during AST analysis, so they are rejected before any code runs.
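The AST-level check described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `DANGEROUS_ATTRS` is a hypothetical, abbreviated stand-in for `_DANGEROUS_ATTR_CALLS`, and `AttrCallVisitor` and `find_blocked_calls` are names invented here rather than the actual `DunderVisitor`.

```python
import ast

# Hypothetical, abbreviated stand-in for the PR's ~50-name frozenset.
DANGEROUS_ATTRS = frozenset(
    {"read_csv", "to_csv", "read_sql", "to_parquet", "load", "save",
     "loadtxt", "fromfile", "scan_csv", "write_parquet", "system", "popen"}
)


class AttrCallVisitor(ast.NodeVisitor):
    """Collect every attribute access whose name is on the blocklist."""

    def __init__(self):
        self.blocked = []

    def visit_Attribute(self, node):
        if node.attr in DANGEROUS_ATTRS:
            self.blocked.append(node.attr)
        # Keep walking so chained accesses such as a.b.c are also inspected.
        self.generic_visit(node)


def find_blocked_calls(source):
    """Return blocklisted attribute names found in source, without running it."""
    visitor = AttrCallVisitor()
    visitor.visit(ast.parse(source))
    return visitor.blocked


print(find_blocked_calls('result = pd.read_csv("/etc/passwd")'))    # ['read_csv']
print(find_blocked_calls('result = df.groupby("a").sum().head()'))  # []
```

Because the check runs on the parsed tree, nothing is executed before a rejection. In this sketch the match is on the literal attribute name only, so it is stricter than a call-only check (any access to `.load` is flagged) and it would not see a name assembled at runtime.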

Timeout enforcement: Added _time_limit() context manager using signal.SIGALRM (no-op on Windows where SIGALRM is unavailable). Both safe_exec and safe_eval now accept a timeout_seconds parameter (default 30s). Same pattern used in the evaporate sandbox.
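A timeout context manager along these lines might look like the sketch below. It is an assumption-laden illustration, not the PR's `_time_limit()`: the names `time_limit` and `run_with_limit` are hypothetical, and it uses `signal.setitimer` (which also delivers `SIGALRM`) so that fractional seconds work. Signal handlers can only be installed from the main thread.

```python
import contextlib
import signal


@contextlib.contextmanager
def time_limit(seconds):
    """Raise TimeoutError if the with-block runs longer than `seconds`."""
    if not hasattr(signal, "SIGALRM"):
        # Platforms without SIGALRM (e.g. Windows): no timeout is enforced.
        yield
        return

    def _handler(signum, frame):
        raise TimeoutError(f"Execution exceeded {seconds} seconds")

    previous = signal.signal(signal.SIGALRM, _handler)
    # setitimer accepts fractional seconds and delivers SIGALRM on expiry.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def run_with_limit(source, seconds):
    """Exec `source` under the limit; return True if it finished in time."""
    try:
        with time_limit(seconds):
            exec(source, {})
        return True
    except TimeoutError:
        return False
```

Cancelling the timer and restoring the previous handler in `finally` keeps a stale alarm from firing in unrelated code after the block exits, whether it finished normally or raised.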

Tests expanded from 18 to ~200 lines: 36 parametrized tests covering pandas I/O blocking (8 cases), numpy I/O blocking (7 cases), polars I/O blocking (4 cases), system call blocking, safe DataFrame operations still passing (8 cases), basic safe_exec/safe_eval smoke tests, and timeout enforcement for both exec and eval.

Normal DataFrame operations (groupby, merge, sort_values, head, describe, rename, dropna, fillna) are unaffected.

How is this PR tested?

$ python3 -m pytest llama-index-experimental/tests/test_exec_utils.py -v
36 passed in 4.60s

$ python3 -m pytest llama-index-experimental/tests/test_pandas.py -v
3 passed, 1 skipped (e2e test requires OPENAI_API_KEY)

$ python3 -m ruff check llama-index-experimental/
All checks passed!

Changed files

  • llama-index-experimental/llama_index/experimental/exec_utils.py (modified, +109/-10)
  • llama-index-experimental/pyproject.toml (modified, +1/-1)
  • llama-index-experimental/tests/test_exec_utils.py (modified, +181/-1)

Issue description

The safe_exec/safe_eval functions in llama-index-experimental (exec_utils.py) block direct imports and dunder access, but LLM-generated code can still:

  1. Read/write the filesystem through pandas, numpy, or polars methods that are already in scope (e.g. pd.read_csv("/etc/passwd"), df.to_csv("/tmp/exfil.csv"), np.load("evil.npy"))

  2. Hang the process indefinitely with while True: pass since there is no execution timeout

These are gaps left over from the original RCE fix in #7054. The dunder/import restrictions work well, but the allowed libraries themselves expose I/O that bypasses the sandbox.
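To make the gap concrete, the sketch below models import and underscore restrictions in a deliberately simplified way. It is not the actual `exec_utils.py` implementation, and `passes_import_and_dunder_checks` is a hypothetical name. The point it illustrates is that a call on an already-imported library contains neither an import nor a private attribute.

```python
import ast


def passes_import_and_dunder_checks(source):
    """Return True if source has no imports and no underscore-prefixed names."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            return False
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


print(passes_import_and_dunder_checks("import os"))                   # False
print(passes_import_and_dunder_checks("x.__class__"))                 # False
print(passes_import_and_dunder_checks('pd.read_csv("/etc/passwd")'))  # True
```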

Reproduction (I/O bypass):

from llama_index.experimental.exec_utils import safe_exec

# This should be blocked but currently succeeds:
safe_exec('result = pd.read_csv("/etc/passwd")', __locals={"pd": __import__("pandas")})

Reproduction (no timeout):

from llama_index.experimental.exec_utils import safe_exec

# This hangs forever:
safe_exec("while True: pass")

Proposed fix:

  • Add an AST-level blocklist for dangerous I/O method calls (read_csv, to_csv, np.load, np.save, scan_csv, etc.)
  • Add a timeout_seconds parameter using signal.SIGALRM (same pattern as the evaporate sandbox)

Will send a PR shortly.

extent analysis

Fix Plan

To address the security gaps in the safe_exec/safe_eval functions, we will implement the following steps:

  • Add an AST-level blocklist for dangerous I/O method calls
  • Introduce a timeout_seconds parameter using signal.SIGALRM

Code Changes

Here is a simplified, standalone sketch of these changes. It illustrates the approach and is not the code from PR #20691:

import ast
import signal

# Attribute names of blocked I/O methods. Matching is on the method name
# alone, so np.load and np.save are listed as "load" and "save".
blocked_methods = {"read_csv", "to_csv", "load", "save", "scan_csv"}

def check_ast(node):
    """Raise ValueError if the AST node is a call to a blocked I/O method"""
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Attribute):
            if node.func.attr in blocked_methods:
                raise ValueError(f"Blocked I/O method: {node.func.attr}")
        elif isinstance(node.func, ast.Name):
            if node.func.id in blocked_methods:
                raise ValueError(f"Blocked I/O method: {node.func.id}")

def safe_exec(code, locals_dict=None, timeout_seconds=10):
    """Execute the code with a timeout and I/O method blocklist"""
    # Parse the code into an AST
    tree = ast.parse(code)

    # Check the AST for blocked I/O method calls before anything runs
    for node in ast.walk(tree):
        check_ast(node)

    # Set the timeout (SIGALRM is Unix-only and requires the main thread)
    def timeout_handler(signum, frame):
        raise TimeoutError("Execution timed out")

    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(timeout_seconds)
    try:
        # Execute the code; errors propagate to the caller
        exec(code, {} if locals_dict is None else locals_dict)
    finally:
        # Disable the timeout and restore the previous handler
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

# Example usage:
try:
    safe_exec('result = pd.read_csv("/etc/passwd")', {"pd": __import__("pandas")})
except ValueError as e:
    print(f"Error: {e}")

try:
    safe_exec("while True: pass", timeout_seconds=5)
except TimeoutError as e:
    print(f"Error: {e}")

Verification

To verify that the fix worked, you can test the safe_exec function with the provided reproduction examples. The function should now raise a ValueError for blocked I/O method calls and a TimeoutError for executions that exceed the specified timeout.

Extra Tips

  • Make sure to thoroughly test the safe_exec function with various inputs and edge cases to ensure its security and reliability.
  • Consider using a more robust sandboxing solution, such as a separate process or container, to further isolate the execution environment.
  • Keep the blocked_methods list up-to-date with any new I/O methods that may be added to the allowed libraries.
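The separate-process suggestion above could be approached as in the sketch below, which uses only the standard library. `run_in_subprocess` is a hypothetical helper, not part of llama-index, and the example assumes the untrusted code can be passed as a string to a fresh interpreter.

```python
import subprocess
import sys


def run_in_subprocess(source, timeout_seconds=5.0):
    """Run source in a fresh interpreter; return (finished, stdout)."""
    try:
        completed = subprocess.run(
            # -I starts Python in isolated mode (ignores user site and env vars).
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, ""
    return True, completed.stdout


print(run_in_subprocess("print(2 + 3)"))
print(run_in_subprocess("while True: pass", timeout_seconds=0.5))
```

Unlike `SIGALRM`, this timeout also works on Windows and does not depend on running in the main thread. A child process on its own does not restrict filesystem or network access, so it would still need OS-level controls or a container to act as a sandbox.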
