codex - What Codex Is Missing: It’s the Harness, Not the Model [2 comments, 3 participants]

GitHub stats
openai/codex#18940, fetched 2026-04-23 07:33:00
Comments: 2
Participants: 3
Timeline: 7
Reactions: 0
Timeline (top): labeled ×3, commented ×2, closed ×1, subscribed ×1

What variant of Codex are you using?

App, IDE Extension

What feature would you like to see?

Hi Codex team,

First of all, great work — Codex is already extremely strong at the model level. In my experience using multiple systems side by side, the gap today is not the model capability, but the harness around it.

I wanted to share some structured feedback that might be useful for future improvements.

Context

In practice, I use Codex alongside other coding agents (e.g., Claude Code). From a pure model capability perspective, they feel very close. On difficult tasks, when one struggles, switching to the other usually resolves the problem.

Head-to-head, both are strong. The difference in day-to-day usage comes from the harness, not the model.

1. Planning and task visibility

One of the biggest usability improvements would be:

  • Breaking complex tasks into explicit subtasks
  • Maintaining a visible, persistent task list during execution
  • Ensuring all items are completed before signaling "done"

This has a big impact on trust. When a system exposes its plan and progress clearly, it's much easier to verify completeness without repeatedly prompting for missed steps.

Concrete example from my workflow:

With other systems, when they say “done,” I can actually verify that every subtask was completed because the list is visible and tracked. This removes the need to double-check or ask follow-up questions like “what about that other item?”

This significantly improves confidence and reduces interaction overhead.
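
As a rough illustration of the planning behavior described in this section, here is a minimal Python sketch of a visible task list with a completion gate. The names `Plan`, `Subtask`, `render`, and `finish` are hypothetical and chosen only for this example; nothing here reflects how Codex or any other agent actually implements planning.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    title: str
    done: bool = False


@dataclass
class Plan:
    goal: str
    subtasks: list[Subtask] = field(default_factory=list)

    def add(self, title: str) -> None:
        # Break the goal into an explicit, tracked subtask
        self.subtasks.append(Subtask(title))

    def complete(self, title: str) -> None:
        # Mark one subtask as finished; unknown titles are an error
        for s in self.subtasks:
            if s.title == title:
                s.done = True
                return
        raise KeyError(title)

    def remaining(self) -> list[str]:
        return [s.title for s in self.subtasks if not s.done]

    def render(self) -> str:
        # The persistent, user-visible checklist
        lines = [self.goal]
        lines += [f"[{'x' if s.done else ' '}] {s.title}" for s in self.subtasks]
        return "\n".join(lines)

    def finish(self) -> str:
        # Refuse to report "done" while any subtask is still open
        if self.remaining():
            raise RuntimeError(f"not done, open subtasks: {self.remaining()}")
        return "done"


plan = Plan("Add login endpoint")
plan.add("write handler")
plan.add("add tests")
plan.complete("write handler")
print(plan.render())
```

The point of the sketch is the gate in `finish`: because "done" can only be reported when `remaining()` is empty, the user can check the rendered list instead of asking follow-up questions about missed items.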

2. Parallel task execution / interruption handling

Another key area is how the system handles interruptions:

  • If a task is running and the user asks something new, ideally:
    • The current task continues when possible
    • The new task starts in parallel
  • If interruption is required, the system should be able to reliably resume previous work from where it left off
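
The resume requirement in the last bullet can be sketched as a step runner that saves a checkpoint after every step. This is only an illustration under stated assumptions: `run_steps`, the `interrupt_after` parameter (used here to simulate an interruption), and the JSON checkpoint layout are all hypothetical and not part of Codex.

```python
import json
import tempfile
from pathlib import Path


def run_steps(steps, checkpoint: Path, interrupt_after=None):
    """Run steps in order, saving progress so a later call can resume."""
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"next": 0, "log": []}
    for i in range(state["next"], len(steps)):
        if interrupt_after is not None and i >= interrupt_after:
            return state  # simulated interruption; progress is already saved
        state["log"].append(steps[i]())
        state["next"] = i + 1
        checkpoint.write_text(json.dumps(state))
    return state


# Three stand-in steps; real work would go here
steps = [lambda: "parsed", lambda: "edited", lambda: "tested"]
ckpt = Path(tempfile.mkdtemp()) / "task.json"

first = run_steps(steps, ckpt, interrupt_after=1)  # interrupted after one step
resumed = run_steps(steps, ckpt)                   # continues from the second step
print(first["log"], resumed["log"])
```

Because progress is written after each step, the second call does not repeat completed work and needs no manual re-instruction to find where it left off.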

Right now, the workflow tends to be more serial:

  • Interrupting a task often cancels it
  • Resuming requires manual re-instruction

Concrete example from my workflow:

When I interrupt Codex to ask something else, the current task is usually stopped. To continue, I need to manually guide it back to the previous state.

In contrast, in systems that support parallel execution, I can:

  • Start a new request
  • Keep previous tasks running
  • Manage multiple threads of work simultaneously

This allows a much more fluid workflow, where I can “open new fronts” without losing progress on existing ones.

With Codex, I tend to adapt my behavior:

  • Smaller, more focused prompts
  • One task at a time
  • Waiting for completion before moving on

It works well, but it’s a more serial and less fluid interaction model.

Why this matters

From a user perspective, these two aspects (planning + parallelism) significantly affect:

  • Trust in completion
  • Ability to stay in flow
  • Overall productivity

In my opinion, Codex is already very close to being best-in-class. Improving the harness in these areas could meaningfully close the remaining gap.

Final note

I understand from the contribution guidelines that core changes are internal, so I’m sharing this as product feedback rather than a code proposal.

Happy to expand on any of these points if useful.

Thanks again for the great work.

Additional information

No response

Extent analysis

TL;DR

The user suggests improving Codex's usability by enhancing task planning, visibility, and parallel task execution to increase trust, productivity, and overall user experience.

Guidance

  • Consider implementing a visible and persistent task list to allow users to track progress and verify completeness of complex tasks.
  • Develop a system to handle interruptions by continuing current tasks in the background and starting new tasks in parallel, with the ability to reliably resume previous work.
  • Improve the interaction model to support a more fluid workflow, enabling users to manage multiple tasks simultaneously without losing progress.
  • Enhance the system's ability to break down complex tasks into explicit subtasks, ensuring all items are completed before signaling task completion.

Example

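No specific fix can be shown without more technical details, but a high-level sketch of how task tracking and parallel execution could be implemented might look like this:

# Illustrative sketch of task management (not Codex's actual implementation)
import asyncio

class TaskManager:
    def __init__(self):
        self.tasks = []    # coroutines queued but not yet started
        self.running = []  # asyncio.Task objects that have been started

    def add_task(self, task):
        # Queue a coroutine; it starts when execute_tasks() is awaited
        self.tasks.append(task)

    async def execute_tasks(self):
        # Start all queued coroutines concurrently
        self.running += [asyncio.create_task(t) for t in self.tasks]
        self.tasks.clear()
        # Loop so tasks added by interrupt_task() during the wait are awaited too
        while not all(t.done() for t in self.running):
            await asyncio.wait(self.running)
        return [t.result() for t in self.running]

    def interrupt_task(self, new_task):
        # Start the new task alongside running ones instead of cancelling them
        # (must be called from inside the running event loop)
        self.running.append(asyncio.create_task(new_task))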

Notes

The provided feedback focuses on usability and workflow improvements, which are crucial for user experience but may require significant changes to the existing system architecture.

Recommendation

Apply workaround: The suggested improvements may require substantial development effort. In the meantime, users can adapt their workflow as the reporter describes: use smaller, more focused prompts, run one task at a time, and wait for each task to complete before starting the next.
