Agentic + LLM Systems

OODA Loop Architecture for Production AI Agents

John Boyd designed the OODA loop for fighter pilots making life-or-death decisions in milliseconds with incomplete information. It turns out this is a better mental model for production AI agents than the ReAct loop — especially in high-stakes, time-pressured environments where agents need to fail fast, course-correct, and maintain situational awareness across a multi-step decision horizon.

The ReAct loop — Reasoning + Acting, interleaved — has become the default mental model for AI agents. The agent reasons about the current state, selects a tool, observes the result, reasons again, and continues until it reaches a terminal state or hits a step limit. It’s clean, it’s documented, and it’s the basis for most agent frameworks available today.

It’s also a poor fit for the class of agents that matter most in enterprise environments: agents operating under real-time pressure, partial information, adversarial conditions, and high-consequence outcomes where wrong decisions compound faster than the agent can correct them.

John Boyd didn’t design the OODA loop for clean, sequential decision problems. He designed it for fighter pilots making kinetic decisions in milliseconds with radar contacts that might be friendly, hostile, or sensor artifacts — in an environment where the penalty for a wrong decision is immediate and irreversible. The framework he built is not sequential reasoning to action. It is a continuous, parallel loop that emphasizes orientation — the mental model that governs how observations are interpreted — as the most important and most vulnerable part of the decision system.

That is exactly the architecture problem that production AI agents have, and that the ReAct framing systematically understates.

What the OODA Loop Actually Is (Not What the Slides Say)

Most OODA loop presentations show it as four boxes in a circle: Observe → Orient → Decide → Act → repeat. That is the tourist version. The actual Boyd model is more complex and more useful.

In Boyd’s original briefing, the OODA loop has two critical properties that the four-box diagram erases:

1. Implicit guidance and control. The loop doesn’t always complete all four stages explicitly. In experienced operators under time pressure, Observation can flow directly to Action — bypassing explicit Orient and Decide phases — because the operator’s trained orientation already contains the decision logic. This is not a failure mode; it is an optimization. The question for AI agents is: when should the agent execute implicit guidance (fast pattern-matching to action) vs. explicit reasoning (full OODA cycle)?

2. Orientation dominates everything. Boyd’s key insight was that Orientation — the synthesis of cultural traditions, prior experiences, mental models, and incoming data — is not just one phase of the loop. It is the lens through which every other phase operates. A pilot with a wrong orientation will observe the same radar contact as every other pilot and draw a different, worse conclusion. The loop cycles fast, but if Orientation is miscalibrated, it cycles fast toward the wrong decision.

For AI agents, this maps directly: the system prompt, the retrieved context, the agent’s “beliefs” about the current state — these are the Orientation layer. An agent with a miscalibrated Orientation (outdated context, wrong prior, hallucinated state) will reason correctly within its frame and arrive at the wrong answer every time.
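
One way to frame the implicit-versus-explicit question from point 1 is as a routing step placed in front of the full cycle. The sketch below is illustrative only: `Reflex`, `route`, and the 0.9 confidence floor are hypothetical names and values, not part of Boyd's model or of any agent framework. It assumes that only reversible, pre-approved rules are allowed to skip Orient and Decide.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Reflex:
    """A pre-approved observation-to-action rule (hypothetical)."""
    name: str
    matches: Callable[[dict], bool]
    action: str
    reversible: bool = True


def route(
    signal: dict,
    reflexes: list[Reflex],
    confidence: float,
    min_confidence: float = 0.9,  # assumed floor for taking the fast path
) -> tuple[str, Optional[str]]:
    """Return ("implicit", action) for the fast path, else ("explicit", None)."""
    if confidence >= min_confidence:
        for reflex in reflexes:
            # Only reversible reflexes may bypass Orient and Decide.
            if reflex.reversible and reflex.matches(signal):
                return ("implicit", reflex.action)
    return ("explicit", None)


reflexes = [
    Reflex("retry_on_timeout", lambda s: s.get("error") == "timeout", "retry_tool_call"),
    Reflex("close_case", lambda s: s.get("status") == "resolved", "close_case", reversible=False),
]

print(route({"error": "timeout"}, reflexes, confidence=0.95))
print(route({"status": "resolved"}, reflexes, confidence=0.95))
print(route({"error": "timeout"}, reflexes, confidence=0.50))
```

The irreversible `close_case` rule never takes the fast path, and a low-confidence observation falls back to the full cycle even when a reflex matches.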

The Production Architecture

Here is the OODA-based agent architecture I’ve implemented in production environments, including the SmartCIO platform. The pattern is most applicable to agents operating in enterprise workflows with real-time data, multiple tool integrations, and human-in-the-loop checkpoints.

┌─────────────────────────────────────────────────────────┐
│                    OODA Agent Runtime                    │
│                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐           │
│  │ OBSERVE  │───▶│  ORIENT  │───▶│  DECIDE  │           │
│  │          │    │          │    │          │           │
│  │ Sensors  │    │ Context  │    │ Planner  │           │
│  │ Tools    │    │ Fusion   │    │ Ranker   │           │
│  │ Streams  │    │ World    │    │ Guard    │           │
│  │          │    │ Model    │    │          │           │
│  └──────────┘    └──────────┘    └────┬─────┘           │
│       ▲                               │                 │
│       │          ┌──────────┐         │                 │
│       └──────────│   ACT    │◀────────┘                 │
│                  │          │                           │
│                  │ Executor │                           │
│                  │ Verifier │                           │
│                  │ Auditor  │                           │
│                  └──────────┘                           │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │             Orientation Layer (Persistent)       │   │
│  │  World Model · Prior Context · Tool State       │   │
│  │  Constraints · Risk Thresholds · Memory         │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

The key architectural decision that distinguishes this from a ReAct loop is the Orientation Layer as a first-class persistent component — not just the current chain-of-thought, but an explicit, updateable world model that persists across tool calls and loop iterations.

The Observe Phase: Multi-Signal Sensing

In ReAct, observation is typically the return value of the last tool call. In the OODA architecture, observation is a multi-signal aggregation step in which every sensor is read in parallel.

import asyncio
from datetime import datetime, timezone

class ObservePhase:
    def __init__(self, sensors: list[Sensor]):
        self.sensors = sensors

    async def observe(self, context: AgentContext) -> Observation:
        # All sensors run in parallel; don't serialize observation
        raw_signals = await asyncio.gather(
            *[sensor.read(context) for sensor in self.sensors],
            return_exceptions=True
        )

        # Pair each result with its sensor so that a failure keeps the sensor's
        # metadata (the raised exception itself carries no `critical` flag)
        results = list(zip(self.sensors, raw_signals))
        valid_signals = [r for _, r in results if not isinstance(r, BaseException)]
        failed_sensors = [s for s, r in results if isinstance(r, BaseException)]

        return Observation(
            signals=valid_signals,
            failed_sensors=failed_sensors,
            timestamp=datetime.now(timezone.utc),  # utcnow() is deprecated since Python 3.12
            confidence=self._compute_confidence(valid_signals, failed_sensors)
        )

    def _compute_confidence(self, valid, failed) -> float:
        # Low confidence when key sensors are offline; clamp so it never goes negative
        critical_failures = sum(1 for s in failed if getattr(s, 'critical', False))
        return max(0.0, 1.0 - critical_failures * 0.3)

The confidence field on the Observation is not decoration. It gates the downstream Orient and Decide phases: Orient scales its own confidence by it, and Decide refuses to plan when the result falls below its threshold. A low-confidence observation therefore triggers a different decision path (slow down, escalate, or wait for more signal) rather than proceeding with a low-quality world-model update.

This is the design principle that most ReAct implementations miss: the agent should explicitly model the quality of its own observations and adjust its decision-making accordingly, rather than treating every tool response as equally reliable.
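
To make the gating concrete, here is a minimal sketch of mapping observation confidence to a decision path. The 0.3 penalty per critical sensor failure matches the listing above; the `Path` enum, the `gate` function, and the 0.8 and 0.5 thresholds are assumptions chosen for illustration and would need calibration per deployment.

```python
from enum import Enum


class Path(Enum):
    PROCEED = "proceed"
    WAIT_FOR_SIGNAL = "wait_for_signal"
    ESCALATE = "escalate"


def observation_confidence(critical_failures: int, penalty: float = 0.3) -> float:
    """Confidence drops by `penalty` per failed critical sensor, floored at 0."""
    return max(0.0, 1.0 - critical_failures * penalty)


def gate(
    confidence: float,
    proceed_at: float = 0.8,      # assumed threshold
    escalate_below: float = 0.5,  # assumed threshold
) -> Path:
    """Pick a decision path from the quality of the current observation."""
    if confidence >= proceed_at:
        return Path.PROCEED
    if confidence < escalate_below:
        return Path.ESCALATE
    return Path.WAIT_FOR_SIGNAL


for failures in range(5):
    confidence = observation_confidence(failures)
    print(failures, gate(confidence).value)
```

With these assumed thresholds, one critical failure makes the agent wait for more signal and two or more push it to escalate.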

The Orient Phase: World Model Maintenance

This is where the OODA architecture diverges most sharply from ReAct. Rather than updating a chain-of-thought string, Orient explicitly maintains a structured world model.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class WorldModel:
    # Current task state
    task_objective: str
    completed_steps: list[Step]
    pending_hypotheses: list[Hypothesis]
    
    # Environmental state
    known_facts: dict[str, FactWithConfidence]
    contradictions: list[Contradiction]  # Track inconsistencies explicitly
    
    # Risk state
    irreversible_actions_taken: list[Action]
    active_constraints: list[Constraint]
    risk_budget_remaining: float
    
    # Temporal state
    loop_count: int
    elapsed_ms: int
    deadline: Optional[datetime]

class OrientPhase:
    def __init__(self, llm: LLM, memory: AgentMemory):
        self.llm = llm
        self.memory = memory
    
    async def orient(
        self, 
        observation: Observation, 
        world_model: WorldModel
    ) -> OrientationResult:
        
        # Detect contradictions before updating world model
        contradictions = self._detect_contradictions(observation, world_model)
        
        if contradictions:
            # Don't silently override prior beliefs — surface the conflict
            return OrientationResult(
                updated_model=world_model,
                contradictions=contradictions,
                requires_human_review=any(c.severity == 'high' for c in contradictions),
                orientation_confidence=0.4
            )
        
        # Update world model with new observations
        updated_model = await self.llm.update_world_model(
            current_model=world_model,
            new_observation=observation,
            relevant_memory=await self.memory.retrieve(observation)
        )
        
        return OrientationResult(
            updated_model=updated_model,
            contradictions=[],
            requires_human_review=False,
            orientation_confidence=observation.confidence * 0.9
        )
    
    def _detect_contradictions(
        self, 
        obs: Observation, 
        model: WorldModel
    ) -> list[Contradiction]:
        contradictions = []
        for signal in obs.signals:
            for key, known_fact in model.known_facts.items():
                if signal.contradicts(known_fact):
                    contradictions.append(Contradiction(
                        new_signal=signal,
                        prior_belief=known_fact,
                        severity='high' if known_fact.confidence > 0.8 else 'low'
                    ))
        return contradictions

The explicit contradiction detection is the mechanism that prevents the agent from building a house of cards. In production environments, tool responses are often inconsistent — a database query returns a different count than a summary API, a document retrieval returns context that conflicts with the agent’s prior memory. An agent that silently incorporates contradictions into its world model compounds the error. An agent that surfaces contradictions and pauses for review contains it.
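
The listing above calls `signal.contradicts(known_fact)` without showing it. The following is one possible sketch for numeric facts, assuming a relative tolerance so that small drift between sources is not flagged. `Fact`, `Signal`, and the 2% default are hypothetical and are not taken from the production system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    key: str
    value: float
    confidence: float


@dataclass(frozen=True)
class Signal:
    key: str
    value: float

    def contradicts(self, fact: Fact, rel_tolerance: float = 0.02) -> bool:
        """True when both refer to the same key and differ by more than the tolerance."""
        if self.key != fact.key:
            return False
        scale = max(abs(self.value), abs(fact.value), 1.0)
        return abs(self.value - fact.value) / scale > rel_tolerance


def severity(fact: Fact) -> str:
    # Same rule as the listing above: contradicting a strong belief is high severity.
    return "high" if fact.confidence > 0.8 else "low"


known = Fact(key="open_cases", value=1200, confidence=0.9)
summary_api = Signal(key="open_cases", value=1210)  # small drift between sources
db_query = Signal(key="open_cases", value=1500)     # material disagreement

print(summary_api.contradicts(known))
print(db_query.contradicts(known), severity(known))
```

A tolerance parameter like this is one place where per-source calibration can live: a noisy feed can be given a wider band than a system of record.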

The Decide Phase: Structured Planning with Risk Gating

The Decide phase takes the oriented world model and produces a plan — but in the OODA architecture, planning is gated by explicit risk assessment before execution.

class DecidePhase:
    def __init__(self, planner: LLM, risk_engine: RiskEngine):
        self.planner = planner
        self.risk_engine = risk_engine
    
    async def decide(
        self, 
        orientation: OrientationResult,
        world_model: WorldModel
    ) -> Decision:
        
        # Refuse to decide if orientation confidence is too low
        if orientation.orientation_confidence < 0.3:
            return Decision.escalate(
                reason="Insufficient orientation confidence",
                context=orientation
            )
        
        # Generate candidate actions
        candidates = await self.planner.generate_candidates(
            world_model=orientation.updated_model,
            n_candidates=3  # Generate alternatives, don't commit to first option
        )
        
        # Risk-gate each candidate
        risk_assessments = await asyncio.gather(
            *[self.risk_engine.assess(c, world_model) for c in candidates]
        )
        
        # Filter by risk budget and reversibility
        viable_candidates = [
            (c, r) for c, r in zip(candidates, risk_assessments)
            if r.estimated_cost <= world_model.risk_budget_remaining
            and (not r.is_irreversible or world_model.has_human_approval(c))
        ]
        
        if not viable_candidates:
            return Decision.escalate(
                reason="No viable candidates within risk budget",
                context={"candidates": candidates, "assessments": risk_assessments}
            )
        
        # Select highest-value viable candidate
        best_candidate, best_risk = max(
            viable_candidates, 
            key=lambda x: x[1].expected_value
        )
        
        return Decision(
            action=best_candidate,
            risk_assessment=best_risk,
            alternatives_considered=candidates,
            confidence=orientation.orientation_confidence * best_risk.confidence
        )

Three design decisions here matter in production:

Generate candidates, don’t commit to the first option. The ReAct loop typically produces one action per cycle. The OODA Decide phase generates multiple candidates and selects among them after risk assessment. This is slower, but it produces better decisions in complex environments — and it documents that alternatives were considered, which matters for governance.

Irreversible actions require human approval tokens. Any action that cannot be undone — a transaction, a case closure, a filed document — requires explicit human approval before the Decide phase will return it as a viable option. The approval isn’t advisory; the architecture enforces it structurally. No token, no action.

Risk budget maintenance. The world model tracks a risk budget — a quantitative proxy for the agent’s license to act autonomously. Each action consumes risk budget based on its uncertainty, cost, and reversibility. When the budget is exhausted, the agent cannot take further consequential actions until a human reviews and resets it.
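
The approval and budget rules above can be sketched as a small ledger. This is an assumption-laden illustration: `RiskLedger` and its methods are hypothetical names, approval is modeled as a set of approved action IDs, and the cost units are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class RiskLedger:
    """Tracks the remaining autonomy budget and human approvals (hypothetical)."""
    budget_remaining: float
    approved_action_ids: set[str] = field(default_factory=set)

    def approve(self, action_id: str) -> None:
        # Called by the human-review workflow, never by the agent itself.
        self.approved_action_ids.add(action_id)

    def is_viable(self, action_id: str, estimated_cost: float, irreversible: bool) -> bool:
        if estimated_cost > self.budget_remaining:
            return False  # would exceed the remaining risk budget
        if irreversible and action_id not in self.approved_action_ids:
            return False  # no token, no action
        return True

    def charge(self, actual_cost: float) -> None:
        # Deduct what the action actually cost, floored at zero.
        self.budget_remaining = max(0.0, self.budget_remaining - actual_cost)

    def reset(self, new_budget: float) -> None:
        # Only a human reviewer should call this.
        self.budget_remaining = new_budget


ledger = RiskLedger(budget_remaining=1.0)
print(ledger.is_viable("close-case-17", 0.2, irreversible=True))   # not yet approved
ledger.approve("close-case-17")
print(ledger.is_viable("close-case-17", 0.2, irreversible=True))
ledger.charge(0.9)
print(ledger.is_viable("read-report", 0.2, irreversible=False))    # budget nearly spent
```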

The Act Phase: Execute, Verify, Audit

The Act phase executes the selected action, but adds two components that the ReAct loop typically omits: post-execution verification and structured audit logging.

class ActPhase:
    def __init__(self, executor: ToolExecutor, auditor: AuditLogger):
        self.executor = executor
        self.auditor = auditor
    
    async def act(
        self, 
        decision: Decision,
        world_model: WorldModel
    ) -> ActionResult:
        
        # Log intent before execution (important for audit — captures what was planned)
        await self.auditor.log_intent(
            decision=decision,
            world_model_snapshot=world_model,
            timestamp=datetime.now(timezone.utc)  # needs: from datetime import timezone
        )
        
        # Execute
        try:
            raw_result = await self.executor.execute(decision.action)
        except Exception as e:
            await self.auditor.log_failure(decision, e)
            return ActionResult.failure(decision, e)
        
        # Post-execution verification
        verified = await self._verify_result(raw_result, decision)
        
        # Log outcome (captures what actually happened — may differ from intent)
        await self.auditor.log_outcome(
            decision=decision,
            result=raw_result,
            verified=verified,
            timestamp=datetime.now(timezone.utc)  # needs: from datetime import timezone
        )
        
        # Update risk budget based on actual cost
        world_model.risk_budget_remaining -= verified.actual_cost
        
        return ActionResult(
            result=raw_result,
            verification=verified,
            risk_consumed=verified.actual_cost
        )
    
    async def _verify_result(
        self, 
        result: ToolResult, 
        decision: Decision
    ) -> VerificationResult:
        # Check that the result matches what the decision expected
        # Surface discrepancies as observations for the next OODA cycle
        expected = decision.action.expected_postcondition
        actual = result.postcondition
        
        return VerificationResult(
            matches_expected=self._postconditions_match(expected, actual),
            actual_cost=result.resource_cost,
            discrepancies=self._compute_discrepancies(expected, actual)
        )

The split between intent logging (before execution) and outcome logging (after execution) is the governance instrument that allows you to reconstruct the full reasoning chain for any agent action. Intent logs capture what the agent planned and why. Outcome logs capture what actually happened. The gap between them — when it exists — is the most important signal for debugging, validation, and examination response.
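
As a rough illustration of how the gap between the two logs might be computed, the sketch below diffs the expected postcondition recorded at intent time against the actual postcondition recorded at outcome time. The record types and field names are hypothetical; a real audit sink would carry far more context.

```python
from dataclasses import dataclass


@dataclass
class IntentRecord:
    action_id: str
    expected_postcondition: dict


@dataclass
class OutcomeRecord:
    action_id: str
    actual_postcondition: dict


def intent_outcome_gap(intent: IntentRecord, outcome: OutcomeRecord) -> dict:
    """Return every key whose expected and actual values differ."""
    if intent.action_id != outcome.action_id:
        raise ValueError("intent and outcome belong to different actions")
    expected = intent.expected_postcondition
    actual = outcome.actual_postcondition
    gap = {}
    for key in sorted(expected.keys() | actual.keys()):
        if expected.get(key) != actual.get(key):
            gap[key] = {"expected": expected.get(key), "actual": actual.get(key)}
    return gap


intent = IntentRecord("update-42", {"status": "closed", "rows_updated": 1})
outcome = OutcomeRecord("update-42", {"status": "closed", "rows_updated": 3})
print(intent_outcome_gap(intent, outcome))
```

An empty result means the action did what was planned; any non-empty result is a candidate observation for the next cycle.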

Where the OODA Loop Architecture Breaks Down

Honesty requires documenting the failure modes alongside the design.

It’s slower than ReAct for simple tasks. The multi-signal observe, world-model update, candidate generation, and risk-gating add latency. For a simple retrieval task where a single tool call is sufficient, the OODA overhead is wasteful. The architecture is designed for complex, multi-step, high-stakes workflows — not for every agent use case.

World model maintenance is expensive at scale. Keeping a structured world model in memory and updating it coherently across many loop iterations requires careful context management. At 15+ OODA cycles, the world model approaches context limits in current LLMs. The solution is aggressive world model compression — summarizing prior steps while preserving the key facts — but this introduces its own failure mode when the compression discards something that was important.
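
A minimal sketch of one compression policy, under the assumption that recent steps and irreversible steps must survive verbatim while older reversible steps can be folded into a short digest. `StepRecord`, `compress_steps`, and the truncation-based digest are hypothetical; in practice the digest would more likely come from an LLM summarization call, which is exactly where important detail can be lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    index: int
    summary: str
    irreversible: bool = False


def compress_steps(
    steps: list[StepRecord],
    keep_recent: int = 3,
    max_chars: int = 40,
) -> tuple[str, list[StepRecord]]:
    """Fold older reversible steps into a digest; keep the rest verbatim."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    cutoff = max(0, len(steps) - keep_recent)
    older, recent = steps[:cutoff], steps[cutoff:]
    # Irreversible steps are never folded, however old they are.
    kept = [s for s in older if s.irreversible] + recent
    folded = [s for s in older if not s.irreversible]
    digest = "; ".join(s.summary[:max_chars] for s in folded)
    return digest, kept


steps = [StepRecord(i, f"step {i}", irreversible=(i == 1)) for i in range(6)]
digest, kept = compress_steps(steps, keep_recent=2)
print(digest)
print([s.index for s in kept])
```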

Contradiction detection can over-trigger. In environments with noisy or inconsistent data sources, the contradiction detector can block progress by flagging conflicts that are not actually meaningful. Tuning the contradiction threshold is a per-deployment calibration problem that requires operational experience with the specific data sources involved.

Human-in-the-loop gates create bottlenecks. The irreversible-action approval gate is the right governance design, but it means that high-throughput workflows — AML triage, document processing, batch case work — require either very fast human review cycles or a differentiated approval architecture where lower-risk irreversible actions have a lighter approval path. This is an organizational design problem as much as a technical one.
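
The differentiated approval idea can be sketched as a routing function. The three tiers, the function name, and the 0.1 cost cutoff are assumptions for illustration; where the line actually sits is a policy decision, not something the code can settle.

```python
from enum import Enum


class ApprovalPath(Enum):
    AUTONOMOUS = "autonomous"      # reversible: no human gate
    LIGHT_REVIEW = "light_review"  # low-risk irreversible: lighter sign-off
    FULL_REVIEW = "full_review"    # everything else that cannot be undone


def approval_path(
    irreversible: bool,
    estimated_cost: float,
    light_path_max_cost: float = 0.1,  # assumed cutoff
) -> ApprovalPath:
    """Route an action to an approval tier by reversibility and estimated risk cost."""
    if not irreversible:
        return ApprovalPath.AUTONOMOUS
    if estimated_cost <= light_path_max_cost:
        return ApprovalPath.LIGHT_REVIEW
    return ApprovalPath.FULL_REVIEW


print(approval_path(irreversible=False, estimated_cost=0.5).value)
print(approval_path(irreversible=True, estimated_cost=0.05).value)
print(approval_path(irreversible=True, estimated_cost=0.5).value)
```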

Production Deployment Pattern

For enterprise deployments, I use the following pattern to instantiate the OODA runtime:

class OODAAgentRuntime:
    def __init__(self, config: OODAConfig):
        self.config = config  # run() reads self.config.risk_budget
        self.observe = ObservePhase(sensors=config.sensors)
        self.orient = OrientPhase(llm=config.llm, memory=config.memory)
        self.decide = DecidePhase(
            planner=config.planner_llm,
            risk_engine=RiskEngine(thresholds=config.risk_thresholds)
        )
        self.act = ActPhase(
            executor=ToolExecutor(tools=config.tools),
            auditor=AuditLogger(sink=config.audit_sink)
        )
    
    async def run(
        self, 
        objective: str,
        initial_context: dict,
        max_loops: int = 20
    ) -> AgentResult:
        
        world_model = WorldModel.initialize(
            task_objective=objective,
            initial_context=initial_context,
            risk_budget=self.config.risk_budget
        )
        
        for loop_count in range(max_loops):
            world_model.loop_count = loop_count
            
            # Full OODA cycle
            observation = await self.observe.observe(world_model.to_context())
            orientation = await self.orient.orient(observation, world_model)
            
            if orientation.requires_human_review:
                return AgentResult.pause_for_review(world_model, orientation)
            
            decision = await self.decide.decide(orientation, world_model)
            
            if decision.is_escalation:
                return AgentResult.escalate(world_model, decision)
            
            if decision.action.is_terminal:
                return AgentResult.complete(world_model, decision)
            
            action_result = await self.act.act(decision, world_model)
            world_model = world_model.update(orientation, action_result)
        
        # Max loops exceeded — escalate, don't silently fail
        return AgentResult.max_loops_exceeded(world_model)

The terminal conditions — human review, escalation, completion, max loops — are all explicit. The agent does not loop indefinitely, does not silently fail, and does not make consequential decisions when its orientation confidence is below threshold. These are not just good engineering practices. They are the governance controls that make the system auditable.

OODA vs. ReAct: When to Use Which

| Characteristic | ReAct | OODA |
| --- | --- | --- |
| Task complexity | Simple to moderate | Complex, multi-step |
| Information quality | Reliable tools, low noise | Noisy, inconsistent, or adversarial inputs |
| Consequence level | Low (errors are recoverable) | High (errors have downstream impact) |
| Governance requirements | Lightweight audit | Full audit trail required |
| Latency budget | Tight | Moderate to flexible |
| Human oversight | Not required | Required for material actions |

For most chatbot and simple RAG applications, ReAct is sufficient and faster. For enterprise workflows touching regulated decisions — AML triage, credit analysis, compliance research, trading signal generation — the OODA architecture’s explicit orientation management, contradiction detection, risk gating, and audit logging are not optional complexity. They are the production requirements.

The SmartCIO Implementation

The SmartCIO platform implements the OODA runtime as its core decision engine for portfolio analysis and market intelligence workflows. The multi-LLM routing layer (Anthropic + OpenAI + Ollama) feeds the Observe phase as parallel sensors. The Orient phase maintains a persistent world model that spans market data, portfolio state, and risk thresholds. The Decide phase is gated by position size limits and regulatory constraints. The Act phase logs every decision intent and outcome to a structured audit trail.

The most important production lesson from that deployment: the Orientation Layer — specifically, the contradiction detection and world model update logic — is where 80% of agent bugs surface. Not in tool execution. Not in reasoning. In the gap between what the agent believes about the current state and what is actually true.

Boyd was right. Orientation is the dominant element of the loop. The architectures that treat it as a first-class component build better agents. The architectures that treat it as the implicit state of a chain-of-thought string discover that lesson the hard way, in production, at the worst possible time.
