---
title: "Agent Timeout Does Not Surface Error to UI - UI Hangs Indefinitely"
date: 2026-04-09
description: "When an LLM request times out during agent execution, the Web UI hangs indefinitely showing a loading spinner instead of displaying a timeout error message."
tags: ["timeout", "agent", "web-ui", "websocket", "bug:behavior"]
sources:
  - platform: "GitHub Issue"
    id: "openclaw#64793"
    url: "https://github.com/openclaw/openclaw/issues/64793"
openclaw_version: "2026.4.9"
---

## Symptom

When an LLM request exceeds the agent timeout threshold, the following behavior is observed:

- **Agent logs** correctly record the timeout event: `decision=surface_error reason=timeout`

- **Gateway logs** show a `ConnectionAbortedError` and a `RemoteProtocolError`, indicating that the connection was terminated:
  - `ConnectionAbortedError: [WinError 10053] Your host software aborted an established connection.`
  - `RemoteProtocolError: Server disconnected without sending a response.`

- **Web UI** displays an indefinite loading spinner, never transitioning to an error state or recovering

- **User impact**: No error message is displayed, and the user cannot retry without refreshing the page (which loses conversation context)

## Root Cause Analysis

Analysis of the logs and the observed behavior points to the following root cause:

1. **Timeout detection works correctly**: The agent's timeout mechanism properly detects when an LLM response exceeds the threshold and logs `decision=surface_error reason=timeout`.

2. **Premature connection termination**: When the agent aborts a run due to timeout, the WebSocket connection is terminated before the agent can send a `final` event with `status: "timeout"` to the Web UI.

3. **Missing error propagation**: The `agent.wait` method is expected to return `status: "timeout"`, but this status is never transmitted to the UI client because the connection is already closed.

4. **Race condition**: The connection teardown happens faster than the error event can be dispatched, causing the UI to remain in a perpetual loading state.

5. **Gateway limitation**: The custom gateway (`ai_router.py`) handles retry logic correctly for network errors, but cannot compensate for the agent aborting the connection from its side.
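
The ordering problem described in points 2 to 4 can be reproduced in a toy model. The sketch below is not OpenClaw code: `ModelConnection`, `buggy_abort`, and the event shape are hypothetical names chosen for illustration, and the only assumption is that a send on an already-closed connection fails.

```python
import asyncio

# Hypothetical event shape, mirroring the `final` / `status: "timeout"` event named above.
FINAL_TIMEOUT = {"type": "final", "status": "timeout"}


class ModelConnection:
    """Toy stand-in for the agent-to-UI connection (illustration only)."""

    def __init__(self):
        self.closed = False
        self.delivered = []  # events the UI would actually have received

    async def send(self, event):
        await asyncio.sleep(0)  # yield once, as a real network send would
        if self.closed:
            raise ConnectionError("connection already closed")
        self.delivered.append(event)

    async def close(self):
        self.closed = True


async def buggy_abort(conn):
    """Dispatch the final event and tear down concurrently; teardown wins."""
    send_task = asyncio.create_task(conn.send(FINAL_TIMEOUT))
    await conn.close()  # completes before the send task gets to run
    try:
        await send_task
    except ConnectionError:
        return False  # the event was lost; the UI keeps showing the spinner
    return True
```

In this model `buggy_abort` always loses the event, because the close completes before the scheduled send runs. The real system may lose it only some of the time, depending on timing.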

## Solution

To resolve this issue, the following changes are required:

1. **Ensure `final` event delivery before connection teardown**: Modify the agent's timeout handling to guarantee that a `final` event with `status: "timeout"` is sent to the Web UI **before** the WebSocket connection is terminated.

2. **Implement graceful timeout error response**: Update the agent's timeout logic to construct and send a proper error event:
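   ```python
   # Example implementation guidance. `websocket` is assumed to be a connection
   # object with an async send_json() method; `send_timeout_final` is a
   # hypothetical helper name.
   async def send_timeout_final(websocket) -> None:
       """Send the terminal timeout event; call this before closing the socket."""
       await websocket.send_json({
           "type": "final",
           "status": "timeout",
           "error": {
               "code": "AGENT_TIMEOUT",
               "message": "Agent execution timed out waiting for LLM response",
           },
       })
   ```

3. **Add timeout event to WebSocket protocol**: Ensure the WebSocket handler in the gateway recognizes and properly propagates timeout events to connected clients.

4. **UI timeout handling**: Verify that the Web UI correctly handles the `final` event with `status: "timeout"` and displays an appropriate error message with a retry option.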
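
As a minimal sketch of the first two changes, the handler below emits the `final` event and only then closes the socket, whether the LLM call finished or timed out. It makes several assumptions: `FakeWebSocket` is an in-memory stand-in so the example runs on its own, `run_with_timeout` and the `"ok"` status are hypothetical names, and the real agent's timeout handling may be structured differently.

```python
import asyncio


class FakeWebSocket:
    """In-memory stand-in for the real connection (hypothetical, for the sketch only)."""

    def __init__(self):
        self.sent = []
        self.closed = False

    async def send_json(self, payload):
        if self.closed:
            raise ConnectionError("socket already closed")
        self.sent.append(payload)

    async def close(self):
        self.closed = True


async def run_with_timeout(websocket, llm_call, timeout_s):
    """Await llm_call() and always deliver a final event before closing."""
    try:
        result = await asyncio.wait_for(llm_call(), timeout=timeout_s)
        final = {"type": "final", "status": "ok", "result": result}  # "ok" is assumed
    except asyncio.TimeoutError:
        final = {
            "type": "final",
            "status": "timeout",
            "error": {
                "code": "AGENT_TIMEOUT",
                "message": "Agent execution timed out waiting for LLM response",
            },
        }
    try:
        await websocket.send_json(final)  # deliver first ...
    finally:
        await websocket.close()  # ... then tear down, even if the send fails
    return final
```

The key design choice is that the close lives in a `finally` block that follows the send, so no code path can close the connection while the terminal event is still pending.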

## Prevention

To prevent similar issues in the future:

1. **Establish event delivery guarantees**: Implement a protocol in which critical events (especially `final` events with any status) must be delivered before connection termination, using proper acknowledgment or flushing mechanisms.

2. **Add integration tests for timeout scenarios**: Create automated tests that verify timeout errors are correctly propagated to the UI across all deployment methods.

3. **Implement graceful connection shutdown**: Ensure WebSocket connections go through a graceful shutdown sequence that flushes pending events before closing.

4. **Add monitoring for incomplete sessions**: Implement metrics and alerting for sessions that remain in the loading state beyond expected durations.

5. **Document the WebSocket event protocol**: Maintain clear documentation of all event types and statuses that the UI should handle, including timeout scenarios.
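
The monitoring idea in item 4 could start as a periodic check like the one sketched here. The `Session` record, its field names, and `find_stuck_sessions` are hypothetical; the sketch assumes the server can tell when a run started and whether a `final` event was delivered for it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Session:
    """Hypothetical per-run record kept by the server."""

    session_id: str
    started_at: float  # seconds on a monotonic clock
    final_delivered: bool = False  # set once a `final` event reached the UI


def find_stuck_sessions(
    sessions: Iterable[Session], now: float, max_loading_s: float
) -> List[str]:
    """Return IDs of sessions still loading after max_loading_s seconds."""
    return [
        s.session_id
        for s in sessions
        if not s.final_delivered and now - s.started_at > max_loading_s
    ]
```

A scheduler could call `find_stuck_sessions` with `time.monotonic()` as `now` and raise an alert whenever the returned list is non-empty.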

## Additional Information

**Affected deployment methods**: All deployment methods (Docker, bare metal, etc.) on any OS where LLM response times may exceed the agent timeout threshold.

**Workaround**: Refresh the page to reset the UI state, though this results in loss of conversation context.

**Related components**:

- Agent timeout handler
- WebSocket gateway service
- Web UI event listener

**Suggested debugging steps**:

1. Enable WebSocket frame logging to capture all events sent to the UI
2. Add timing instrumentation around the timeout detection and connection termination
3. Trace the event delivery path from `agent.wait` to the WebSocket send
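
For the first two debugging steps, one option is to wrap the connection object so that every send and close is timestamped. This is a sketch under assumptions: `LoggingWebSocket` is a hypothetical wrapper, and it expects the wrapped object to expose async `send_json()` and `close()` methods, which may not match the real gateway's interface.

```python
import json
import logging
import time

logger = logging.getLogger("ws.frames")


class LoggingWebSocket:
    """Hypothetical wrapper that timestamps every outgoing frame and the close."""

    def __init__(self, inner, clock=time.monotonic):
        self._inner = inner
        self._clock = clock
        self.trace = []  # list of (timestamp, action, detail) tuples

    def _record(self, action, detail):
        entry = (self._clock(), action, detail)
        self.trace.append(entry)
        logger.debug("%.6f %s %s", *entry)

    async def send_json(self, payload):
        self._record("send", json.dumps(payload, sort_keys=True))
        await self._inner.send_json(payload)

    async def close(self):
        self._record("close", "")
        await self._inner.close()
```

If the trace for a timed-out run shows a `close` entry with no preceding `send` of a `final` event, the event was never dispatched before teardown.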

**Priority**: High. This bug blocks user workflows and leaves the UI unusable until the page is refreshed.