# Annotate the Bimanual Robot Trajectory with Reasoning

## Role & Objective

You are an expert reinforcement learning researcher. Your task is to annotate a **bimanual robotic arm** manipulation trajectory with detailed reasoning. You must synthesize low-level primitive movement logs and the visual context from multi-view video into a **high-level, semantic reasoning annotation**.

## Specification of the Experimental Setup

**Task Instruction:** "{task_description}"
**Video Information:**
- Total Frames: {total_frames}
- Frame Rate: {fps} FPS
- Duration: {duration:.2f} seconds
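
Timestamps in the annotation follow directly from the frame index and the frame rate. A one-line helper, purely illustrative:

```python
def frame_to_time(frame_idx: int, fps: float) -> float:
    # e.g. frame 45 at 30 FPS -> 1.5 s, matching the "time" fields below
    return frame_idx / fps
```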

### Multi-View Video Layout (2x2 Grid)

* **Top-Left**: Head camera (Overhead view)
* **Top-Right**: Front camera
* **Bottom-Left**: Left camera
* **Bottom-Right**: Right camera
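
If the four views need to be processed separately, the composite frame can be split by quadrant. A minimal sketch, assuming each frame arrives as a single H x W x 3 NumPy array with four equally sized quadrants (the function name and shape assumptions are illustrative, not part of the pipeline):

```python
import numpy as np

def split_quad_view(frame: np.ndarray):
    # frame: (H, W, 3) composite of four equally sized camera views
    h, w = frame.shape[0] // 2, frame.shape[1] // 2
    head  = frame[:h, :w]   # top-left: head camera (overhead)
    front = frame[:h, w:]   # top-right: front camera
    left  = frame[h:, :w]   # bottom-left: left camera
    right = frame[h:, w:]   # bottom-right: right camera
    return head, front, left, right
```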

### Critical Visual Identification Rules

* **Left Gripper**: Enters from the **LEFT** side of the frame
* **Right Gripper**: Enters from the **RIGHT** side of the frame
* **Identity is constant**: Never swap labels even if arms cross

---

## Primitive Movement Annotations

The following logs show the low-level primitives extracted from the trajectory (one per frame):

```
{primitive_movements}
```

**Primitive Legend:**
- `move [direction]`: Translation (forward/backward, left/right, up/down)
- `tilt up/down`: Pitch rotation
- `rotate clockwise/counterclockwise`: Yaw rotation
- `open/close gripper`: Gripper state change
- `stop`: Stationary
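
For reference, a log in this legend can be parsed into per-frame records. The sketch below assumes one line per frame of the form `12: left=move forward, right=stop`; that exact layout is an assumption for illustration, since the real layout is whatever the primitive_movements placeholder contains:

```python
from collections import namedtuple

# Hypothetical record type for illustration only.
Step = namedtuple("Step", ["frame", "left", "right"])

def parse_primitive_log(text: str):
    # Assumed line format: "12: left=move forward, right=stop"
    steps = []
    for line in text.strip().splitlines():
        frame_part, _, rest = line.partition(":")
        left_part, _, right_part = rest.partition(",")
        steps.append(Step(
            frame=int(frame_part.strip()),
            left=left_part.split("=", 1)[1].strip(),
            right=right_part.split("=", 1)[1].strip(),
        ))
    return steps
```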

---

## ⭐️ Annotation Rules (Must Follow) ⭐️

### 1. Move Aggregation Rule (Anti-Fragmentation)

**DO NOT** annotate every single primitive line. You must **MERGE** consecutive primitives into a single semantic action if they share the same **Goal** and **Spatial Trend**.

**Start a new move ONLY when:**
1. The **Target** changes (e.g., from Cup to Plate)
2. The **Action Type** changes (e.g., from Approach to Grasp)
3. The **Major Direction** reverses (e.g., from lifting UP to pushing DOWN)
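
The mechanical part of this merge is run-length grouping: extend the current move while the primitive stays the same, and cut a new segment when it changes. A minimal sketch over a single arm's primitives; note that it only merges identical primitive strings, while the boundary conditions above (target or action-type changes) still require the video context:

```python
def aggregate_moves(primitives, fps):
    # primitives: one primitive string per frame for a single arm.
    # Returns (start_time, end_time, primitive) segments, merging
    # runs of identical consecutive primitives into one move.
    segments = []
    start = 0
    for i in range(1, len(primitives) + 1):
        if i == len(primitives) or primitives[i] != primitives[start]:
            segments.append((start / fps, i / fps, primitives[start]))
            start = i
    return segments
```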

### 2. Bimanual Coordination (⭐️ CRITICAL ⭐️)

**Distinguish SIMULTANEOUS vs SEQUENTIAL by checking `stop` patterns:**

| Left Gripper | Right Gripper | Type | How to Annotate |
|--------------|---------------|------|-----------------|
| Moving | Moving | **SIMULTANEOUS** | Use `"arm": "both"` |
| Moving | `stop` (consecutive) | **SEQUENTIAL** | Use `"arm": "left"` only |
| `stop` (consecutive) | Moving | **SEQUENTIAL** | Use `"arm": "right"` only |

**⚠️ KEY RULE**: If one gripper shows **consecutive `stop` entries**, they are operating **SEQUENTIALLY** — do NOT use "both"!
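
The stop-pattern check can be expressed directly: within a candidate segment, an arm showing a run of consecutive `stop` entries is idle, so the segment is sequential. A sketch assuming per-frame primitive strings for each arm; the threshold for what counts as "consecutive" is an illustrative assumption, not part of the spec:

```python
def classify_arms(left, right, min_consecutive_stops=3):
    # left, right: per-frame primitive strings over one segment.
    def is_idle(prims):
        run = best = 0
        for p in prims:
            run = run + 1 if p == "stop" else 0
            best = max(best, run)
        return best >= min_consecutive_stops

    left_idle, right_idle = is_idle(left), is_idle(right)
    if left_idle and right_idle:
        return "none"   # no motion; such spans are not annotated
    if left_idle:
        return "right"  # SEQUENTIAL: right arm acting alone
    if right_idle:
        return "left"   # SEQUENTIAL: left arm acting alone
    return "both"       # SIMULTANEOUS: both arms truly moving
```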

### 3. Semantic Hierarchy

- **task**: The overall remaining goal (updates as progress is made)
- **plan**: 3-5 remaining high-level steps to complete the task
- **subtask**: The **specific position/state goal** to achieve now (NOT an action)
- **subtask_reason**: Why this subtask is necessary now (spatial/physics reasoning)
- **move**: Simple primitive-level description (e.g., "move forward down")
- **move_reason**: Why this specific movement is needed (spatial relationship)

---

## Output Format

Output a **Python-executable dictionary** whose keys are segment start times in seconds, formatted as strings:

```python
{{
    "<timestamp>": {{
        "task": "<Remaining overall task description>",
        "plan": ["<High-level step 1>", "<High-level step 2>", ...],
        "subtask": "<Current goal: specific position/state to achieve>",
        "subtask_reason": "<Why this subtask now, environmental features, physics logic>",
        "moves": [
            {{
                "time": <float>,
                "arm": "left/right/both",
                "move": "<Primitive action description>",
                "move_reason": "<Spatial/positional reason for this action>"
            }}
        ]
    }},
    ...
}}
```

### Field Guidelines

| Field | Requirement |
|-------|-------------|
| `task` | Describe the remaining overall goal; update it as progress is made |
| `plan` | 3-5 remaining high-level steps |
| `subtask` | Specific **position or state** to achieve (not an action) |
| `subtask_reason` | Environmental features + physics logic |
| `arm` | "left", "right", or "both" (only when truly simultaneous) |
| `move` | Concise primitive description (<10 words) |
| `move_reason` | Specific spatial/positional reasoning |
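
Since the output must parse as strict JSON, a lightweight check can catch most format slips before an annotation is accepted. A minimal sketch; the required-field names follow the schema above, and the function itself is illustrative:

```python
import json

REQUIRED = ("task", "plan", "subtask", "subtask_reason", "moves")
MOVE_FIELDS = ("time", "arm", "move", "move_reason")

def validate_annotation(text: str) -> bool:
    data = json.loads(text)  # raises ValueError if not strict JSON
    for key, seg in data.items():
        float(key)  # keys are start times as strings, e.g. "1.5"
        if any(field not in seg for field in REQUIRED):
            return False
        for move in seg["moves"]:
            if any(field not in move for field in MOVE_FIELDS):
                return False
            if move["arm"] not in ("left", "right", "both"):
                return False
    return True
```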

---

## Example

**Task**: "Place the bread into the basket"

```python
{{
    "0.0": {{
        "task": "Place the bread into the basket",
        "plan": ["Approach the bread", "Grasp the bread", "Move above the basket", "Lower and release"],
        "subtask": "Position left gripper above the bread",
        "subtask_reason": "The bread is located at the front-left of the workspace. Need to position directly above before grasping.",
        "moves": [
            {{
                "time": 0.0,
                "arm": "left",
                "move": "move forward down",
                "move_reason": "Bread is in front and below current position, moving forward and down to approach"
            }}
        ]
    }},
    "1.5": {{
        "task": "Grasp the bread and place into basket",
        "plan": ["Grasp the bread", "Move above the basket", "Lower and release"],
        "subtask": "Close gripper to grasp the bread",
        "subtask_reason": "Gripper is now positioned directly above bread at appropriate height for grasping",
        "moves": [
            {{
                "time": 1.5,
                "arm": "left",
                "move": "close gripper",
                "move_reason": "Bread is within gripper range, closing to secure grasp"
            }}
        ]
    }}
}}
```

---

## Final Constraints

1. **Identity**: Stick to visual Left/Right defined at t=0
2. **Sequential Check**: ALWAYS check for consecutive `stop` patterns to detect sequential operations
3. **Subtask Focus**: subtask describes WHERE to go or WHAT state to achieve, not the action itself
4. **Move Simplicity**: move should be simple primitive descriptions like the input primitives
5. **Format**: Strictly valid JSON, so the output also parses as a Python dictionary
6. **End Marker**: Write **FINISHED** at the very end

**Now, strictly based on the video and primitive logs provided, annotate the trajectory.**
