AI Agents for Physical AI Systems

Purpose

This chapter introduces AI agents—autonomous systems that perceive, reason, and act to achieve goals. We explore agent architectures relevant to Physical AI, from simple reactive agents to complex learning-based systems.

What is an AI Agent?

Definition: An AI agent is a system that:

Perceives its environment through sensors
Reasons about goals and actions
Acts through actuators to achieve objectives
Learns from experience to improve performance (optional: simple reactive agents, for example, do not learn)

Formula: Agent = Perception + Reasoning + Action (+ Learning)

Agent vs. System

System: Passive software that responds to external calls.

Example: Function that computes inverse kinematics.

Agent: Autonomous entity with goals and agency.

Example: Robot that autonomously picks up objects using inverse kinematics.

Key Difference: Agency (goal-directed behavior) and autonomy (self-directed action).
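The contrast can be sketched in a few lines. This is a hypothetical one-dimensional point robot, not an inverse-kinematics solver: `step_toward` plays the passive "system" (it computes something only when called), while `PointRobotAgent` owns a goal and runs its own sense, decide, act loop. The perfect sensor and actuator are simplifying assumptions.

```python
def step_toward(position: float, target: float, max_step: float = 0.5) -> float:
    """Passive 'system': computes one bounded step, only when someone calls it."""
    delta = max(-max_step, min(max_step, target - position))
    return position + delta


class PointRobotAgent:
    """Minimal goal-directed agent that wraps the passive function above."""

    def __init__(self, goal: float, tolerance: float = 1e-6):
        self.goal = goal            # agency: the agent owns its goal
        self.tolerance = tolerance
        self.position = 0.0         # stands in for the real world state

    def perceive(self) -> float:
        return self.position        # hypothetical perfect sensor

    def decide(self, observed: float) -> float:
        return step_toward(observed, self.goal)

    def act(self, new_position: float) -> None:
        self.position = new_position  # hypothetical perfect actuator

    def run(self, max_steps: int = 100) -> int:
        """Autonomy: loop sense -> decide -> act until the goal is reached.

        Returns the number of loop iterations it took to observe the goal.
        """
        for step in range(max_steps):
            observed = self.perceive()
            if abs(observed - self.goal) <= self.tolerance:
                return step
            self.act(self.decide(observed))
        return max_steps
```

For example, `PointRobotAgent(goal=2.0).run()` moves in 0.5 steps and reports 4 iterations; nobody has to call `step_toward` from outside.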

Agent Architectures

1. Reactive Agents

Principle: Direct mapping from percepts to actions (no internal state).

Architecture:

Sensors → Condition-Action Rules → Actuators

Example: Vacuum robot

Rule 1: If obstacle detected, turn left
Rule 2: If floor dirty, vacuum
Rule 3: If battery low, return to charger
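The rules above map directly to a lookup with no memory. The source does not say which rule wins when several fire at once, so this sketch assumes a first-match priority order with the battery rule checked first; the 0.2 battery threshold and the `move_forward` default are also illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    obstacle_ahead: bool
    floor_dirty: bool
    battery_level: float  # fraction in [0, 1]


# Condition-action rules checked in priority order; the first match fires.
# Ordering and threshold are assumptions made for this illustration.
RULES = [
    (lambda p: p.battery_level < 0.2, "return_to_charger"),
    (lambda p: p.obstacle_ahead, "turn_left"),
    (lambda p: p.floor_dirty, "vacuum"),
]


def reactive_vacuum(percept: Percept) -> str:
    """Map the current percept directly to an action (no internal state)."""
    for condition, action in RULES:
        if condition(percept):
            return action
    return "move_forward"  # default when no rule matches
```

Because the function sees only the current percept, it will happily turn left into the same corner forever, which is the "no memory" limitation listed below.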

Advantages:

Fast (no deliberation)
Simple to implement
Real-time capable

Limitations:

No memory (repeats mistakes)
No planning (short-sighted)
Limited to simple tasks

2. Model-Based Agents

Principle: Maintain an internal state (a world model) that tracks parts of the world the sensors cannot currently see.

Architecture:

Sensors → State Estimation → World Model → Action Selection → Actuators

Example: Delivery robot

State: Current position, goal position, map
Model: Occupancy grid, obstacle locations
Action: Compute path avoiding obstacles
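One common way to keep the occupancy-grid part of the model is a log-odds grid, sketched below. The sensor-model probabilities `p_hit` and `p_miss` are assumed values, and the prior is taken to be 0.5 (log-odds 0). The key point is that the grid persists between percepts, so obstacles seen earlier remain in the model after they leave the sensor's view.

```python
import math


class OccupancyGrid:
    """Log-odds occupancy grid: the agent's persistent model of obstacles."""

    def __init__(self, width: int, height: int, p_hit: float = 0.7, p_miss: float = 0.4):
        # 0.0 log-odds corresponds to p = 0.5, i.e. "unknown"
        self.log_odds = [[0.0] * width for _ in range(height)]
        self.l_hit = math.log(p_hit / (1 - p_hit))     # evidence when a cell reads occupied
        self.l_miss = math.log(p_miss / (1 - p_miss))  # negative evidence when it reads free

    def update(self, x: int, y: int, occupied: bool) -> None:
        """Bayesian update of one cell from one noisy reading (additive in log-odds)."""
        self.log_odds[y][x] += self.l_hit if occupied else self.l_miss

    def probability(self, x: int, y: int) -> float:
        """Convert log-odds back to P(occupied)."""
        return 1.0 - 1.0 / (1.0 + math.exp(self.log_odds[y][x]))

    def is_blocked(self, x: int, y: int, threshold: float = 0.5) -> bool:
        return self.probability(x, y) > threshold
```

A path planner can then treat every cell where `is_blocked` is true as an obstacle; a single contradictory reading lowers confidence instead of erasing the obstacle outright.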

Advantages:

Handles partially observable environments
Plans multi-step actions
Adapts to changing world

Limitations:

Requires accurate model (modeling errors degrade performance)
Computationally expensive (state estimation, planning)

3. Goal-Based Agents

Principle: Select actions that achieve explicit goals.

Architecture:

Sensors → State Estimation → Goal + World Model → Search/Planning → Actuators

Example: Robotic arm

Goal: Grasp red cup
Planning: Search for action sequence (approach → align → close gripper)
Execution: Execute plan, monitor for success
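The "search for action sequence" step can be illustrated with breadth-first search over symbolic states. The STRIPS-style action model below (facts such as `arm_free`, `near_cup`, `aligned`) is a hypothetical encoding of the grasp example, not a model taken from any particular planner.

```python
from collections import deque
from typing import List, Optional

# Hypothetical action model: name -> (preconditions, add effects, delete effects)
GRASP_ACTIONS = {
    "approach":      ({"arm_free"},            {"near_cup"},    set()),
    "align":         ({"near_cup"},            {"aligned"},     set()),
    "close_gripper": ({"aligned", "arm_free"}, {"holding_cup"}, {"arm_free"}),
}


def plan(start: frozenset, goal: set) -> Optional[List[str]]:
    """Breadth-first search; returns a shortest action sequence or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                 # every goal fact holds
            return path
        for name, (pre, add, delete) in GRASP_ACTIONS.items():
            if pre <= state:              # action is applicable in this state
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None  # goal unreachable from start
```

Changing the goal set (say, to `{"aligned"}`) changes the returned plan without touching the code, which is the flexibility listed under Advantages.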

Advantages:

Flexible (change goal, behavior adapts)
Optimal (can find a lowest-cost plan when using an optimal search algorithm, e.g., A* with an admissible heuristic)

Limitations:

Slow (search can take seconds)
Requires goal specification

4. Utility-Based Agents

Principle: Choose the action that maximizes a utility function (a numeric measure of how desirable an outcome is).

Architecture:

Sensors → State → Utility Function → Optimization → Actuators

Example: Autonomous vehicle

Utility weights: Safety (high), Efficiency (medium), Comfort (low)
Decision: Brake hard (safe but uncomfortable) vs. brake gently (comfortable but less safe)
Result: Choose the action with the highest weighted utility
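A minimal sketch of the weighted decision, assuming a linear (weighted-sum) utility. The weights 0.6 / 0.3 / 0.1 and the per-action scores are hypothetical numbers chosen only to reflect "high / medium / low" and the tradeoff described above.

```python
# Hypothetical weights encoding Safety (high), Efficiency (medium), Comfort (low).
WEIGHTS = {"safety": 0.6, "efficiency": 0.3, "comfort": 0.1}

# Hypothetical per-criterion scores in [0, 1] for each candidate action.
DRIVING_OPTIONS = {
    "brake_hard":   {"safety": 0.9, "efficiency": 0.4, "comfort": 0.2},
    "brake_gentle": {"safety": 0.5, "efficiency": 0.7, "comfort": 0.9},
}


def utility(scores: dict) -> float:
    """Weighted sum of criterion scores (a linear multi-attribute utility)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def choose_action(options: dict) -> str:
    """Pick the option with the highest utility."""
    return max(options, key=lambda name: utility(options[name]))
```

With these numbers hard braking scores 0.68 and gentle braking 0.60, so `choose_action(DRIVING_OPTIONS)` returns `"brake_hard"`; shifting enough weight toward comfort would flip the choice, which is why designing the utility function is listed as a limitation.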

Advantages:

Handles tradeoffs (safety vs. efficiency)
Quantifies preferences
Supports multi-objective optimization

Limitations:

Difficult to design utility function
Computational complexity (optimization)

5. Learning Agents

Principle: Improve performance through experience.

Architecture:

Sensors → State → Policy → Actuators
State + Reward/Error → Learning Module → updates Policy

Example: Manipulation robot

Initial Policy: Random grasping
Experience: Attempt 1000 grasps
Reward: +1 if successful, -1 if failed
Learning: Update policy to maximize success rate
Result: 30% → 85% success after training
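The full grasp-learning problem is reinforcement learning over continuous grasps; as a deliberately simplified stand-in, the sketch below treats a few discrete grasp candidates as a multi-armed bandit and learns with epsilon-greedy exploration from the same +1 / -1 rewards. The hidden success probabilities only drive the simulated environment and are made-up values; the numbers this produces are not the results quoted above.

```python
import random


def learn_grasp(success_probs, trials=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn which grasp candidate succeeds most often.

    success_probs are hidden per-candidate success rates used only by the
    simulated environment; the agent sees just the +1 / -1 rewards.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n
    values = [1.0] * n   # optimistic start (max reward) so every candidate gets tried
    successes = 0
    for _ in range(trials):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                          # explore
        else:
            arm = max(range(n), key=lambda i: values[i])    # exploit current best
        reward = 1 if rng.random() < success_probs[arm] else -1
        if reward == 1:
            successes += 1
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts, successes / trials
```

With candidates succeeding 10%, 30%, and 90% of the time, the agent ends up spending nearly all of its exploitation trials on the third one; the fixed `epsilon` keeps a small amount of exploration forever, which caps the achievable success rate.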

Types:

Supervised Learning: Learn from labeled examples (e.g., imitation learning from demonstrations)
Reinforcement Learning: Learn from rewards (trial-and-error)
Unsupervised Learning: Discover patterns in data (clustering, dimensionality reduction)

Advantages:

Adapts to new environments
Reduces the need to hand-program behavior
Can surpass human performance (in narrow domains)

Limitations:

Requires large amounts of data (sample inefficiency is especially costly on physical systems)
Difficult to guarantee safety

Agent Components in Physical AI

Perception Module

Function: Convert sensor data into symbolic/geometric representations.

Inputs:

Camera images (RGB, depth)
LiDAR point clouds
Force/torque sensor readings
Joint encoder angles

Outputs:

Object detections (class, pose, bounding box)
Semantic map (free space, obstacles, landmarks)
Robot state (position, velocity, configuration)

Technologies:

Computer vision (YOLO, Mask R-CNN)
SLAM (ORB-SLAM, LIO-SAM)
State estimation (Kalman filter, particle filter)
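As a small illustration of the state-estimation entry, here is a scalar Kalman filter fusing noisy readings of a quantity assumed to be (nearly) constant, such as a stationary joint angle. The prior, process noise `q`, and measurement noise `r` are assumed values; real robot estimators are usually multivariate.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=0.0, r=0.1):
    """Scalar Kalman filter for a (nearly) constant state.

    x0, p0: prior mean and variance; q: process noise variance;
    r: measurement noise variance.
    """
    x, p = x0, p0
    for z in measurements:
        p = p + q                 # predict: state assumed constant, uncertainty grows by q
        k = p / (p + r)           # Kalman gain: how much to trust the new measurement
        x = x + k * (z - x)       # update the estimate toward the measurement
        p = (1 - k) * p           # posterior variance shrinks after each update
    return x, p
```

For readings `[1.2, 0.9, 1.1, 1.0, 0.95]` the estimate settles near 1.01 and the variance drops from 1.0 to about 0.02, well below the single-reading noise `r`.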

Reasoning Module

Function: Decide what to do given current state and goal.

Approaches:

Symbolic Reasoning:

Logic-based (first-order logic, PDDL)
Rule-based (expert systems)
Search-based (A*, MCTS)

Probabilistic Reasoning:

Bayesian networks
Markov decision processes (MDPs)
Partially observable MDPs (POMDPs)

Neural Reasoning:

Deep Q-Networks (DQN)
Policy gradient methods (PPO, SAC)
Transformer-based sequence models (e.g., Decision Transformer)
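To make the MDP entry concrete, the sketch below runs value iteration on a tiny hypothetical corridor: five cells, the goal at the right end, and a slippery floor that sends the robot the opposite way with probability `slip`. The layout, rewards, and discount factor are assumptions chosen for illustration.

```python
def value_iteration(n_states=5, goal=4, slip=0.2, gamma=0.9, tol=1e-8):
    """Value iteration on a 1-D corridor MDP with a slippery floor.

    A move succeeds with probability 1 - slip; otherwise the robot moves the
    opposite way. Reaching the goal yields reward +1 and ends the episode.
    """
    moves = {"left": -1, "right": +1}
    V = [0.0] * n_states  # V[goal] stays 0 because the goal is terminal

    def clamp(s):
        return min(max(s, 0), n_states - 1)  # walls at both ends

    def q_value(s, a):
        total = 0.0
        for prob, s_next in ((1 - slip, clamp(s + moves[a])),
                             (slip, clamp(s - moves[a]))):
            reward = 1.0 if s_next == goal else 0.0
            total += prob * (reward + gamma * V[s_next])
        return total

    while True:  # sweep until values stop changing (Bellman backup converges)
        delta = 0.0
        for s in range(n_states):
            if s == goal:
                continue
            best = max(q_value(s, a) for a in moves)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    policy = {s: max(moves, key=lambda a: q_value(s, a))
              for s in range(n_states) if s != goal}
    return V, policy
```

The resulting policy moves right in every cell, and the values rise toward the goal because slips make distant cells less likely to reach it soon.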

Action Module

Function: Execute decisions in physical world.

Layers:

High-Level Actions: "Pick up cup" (symbolic)
Motion Planning: Compute joint trajectories (geometric)
Control: Track trajectories with feedback (reactive)
Actuation: Send torque commands to motors (hardware)
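The control layer ("track trajectories with feedback") is often a PID loop. In this sketch the joint is modeled, as a simplifying assumption, as a pure integrator: the controller output is a velocity command and the angle integrates it. The gains are illustrative, not tuned for any real arm.

```python
class PID:
    """Textbook PID controller; the derivative term is skipped on the first call."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        self.integral += error * dt
        # No previous error on the first call, so avoid a derivative spike.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_joint(setpoint=1.0, dt=0.01, steps=2000):
    """Track a joint setpoint for steps * dt seconds.

    Hypothetical plant: angle' = velocity command (a pure integrator).
    """
    pid = PID(kp=5.0, ki=1.0, kd=0.05)
    angle = 0.0
    for _ in range(steps):
        velocity_cmd = pid.update(setpoint, angle, dt)
        angle += velocity_cmd * dt   # forward-Euler integration of the plant
    return angle
```

After 20 simulated seconds the angle sits within a small fraction of a percent of the setpoint. On real hardware the same loop would run at a fixed rate and send its output to the actuation layer.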

Technologies:

Inverse kinematics (analytical, numerical)
Trajectory optimization (CHOMP, TrajOpt)
Feedback control (PID, MPC)
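For the analytical side of inverse kinematics, the standard textbook case is a planar two-link arm, solved with the law of cosines. The arm geometry here is a generic assumption, not a specific robot; `forward` is included only to check the answer.

```python
import math


def two_link_ik(x: float, y: float, l1: float, l2: float, flip_elbow: bool = False):
    """Analytical IK for a planar 2-link arm; returns (theta1, theta2) in radians.

    Raises ValueError if (x, y) is outside the reachable workspace.
    """
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target out of reach")
    sin_t2 = math.sqrt(1.0 - cos_t2 * cos_t2)
    if flip_elbow:
        sin_t2 = -sin_t2          # choose the mirrored solution (theta2 < 0)
    theta2 = math.atan2(sin_t2, cos_t2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * sin_t2, l1 + l2 * cos_t2)
    return theta1, theta2


def forward(theta1: float, theta2: float, l1: float, l2: float):
    """Forward kinematics, used here to verify an IK solution."""
    return (l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2),
            l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2))
```

Most reachable targets have two solutions (elbow on either side); `flip_elbow` selects between them. Arms with more joints usually need numerical IK instead.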

Learning Module

Function: Improve agent performance over time.

Learning Signals:

Rewards: Scalar feedback (+1 success, -1 failure)
Demonstrations: Expert examples to imitate
Corrections: Human feedback on mistakes

Methods:

Reinforcement Learning: Learn policy from rewards
Imitation Learning: Clone expert behavior
Meta-Learning: Learn how to learn (few-shot adaptation)
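Imitation learning in its simplest form, behavior cloning, reduces to supervised regression from states to expert actions. The sketch assumes a one-dimensional state and a linear policy fit by ordinary least squares; the demonstration data in the usage note is synthetic.

```python
def behavior_clone(states, actions):
    """Fit a linear policy a = w * s + b to expert (state, action) pairs.

    Uses closed-form ordinary least squares; returns (policy, (w, b)).
    """
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return (lambda s: w * s + b), (w, b)
```

If an expert's demonstrations follow `a = -2 * s + 0.5`, the fit recovers `w = -2` and `b = 0.5`. Real behavior cloning uses richer models (typically neural networks over images), but the principle of regressing actions on observations is the same.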

Practical Example: Warehouse Picking Agent

Task: Autonomously pick items from shelves and place in bins.

Agent Type: Model-based + Learning

Architecture:

Perception:

RGB-D camera detects items on shelf
Segment objects, estimate 6D poses
Output: List of (object, pose, confidence)

Reasoning:

Goal: Pick all items
Planning: For each object:
1. Compute approach trajectory (avoid collisions)
2. Plan grasp (antipodal points, force closure)
3. Plan retreat trajectory (lift object)
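Step 2 mentions antipodal points and force closure. For a two-finger grasp in 2-D with Coulomb friction, a classical test is that the line joining the two contacts must lie strictly inside both friction cones (half-angle atan(mu)). The sketch below implements that check; it assumes contact points and unit inward normals are already known from perception, and the friction coefficient is an assumed value.

```python
import math


def is_antipodal(p1, n1, p2, n2, mu):
    """2-D two-finger force-closure check under Coulomb friction.

    p1, p2: contact points (x, y); n1, n2: unit inward surface normals at the
    contacts; mu: friction coefficient.
    """
    half_angle = math.atan(mu)
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return False
    ux, uy = dx / length, dy / length        # direction from contact 1 to contact 2

    def angle_between(ax, ay, bx, by):
        dot = max(-1.0, min(1.0, ax * bx + ay * by))  # clamp for acos rounding
        return math.acos(dot)

    # Finger 1 squeezes along +u and finger 2 along -u; both lines of action
    # must fall strictly inside the friction cone around each inward normal.
    return (angle_between(ux, uy, *n1) < half_angle and
            angle_between(-ux, -uy, *n2) < half_angle)
```

Squeezing a box across its width passes easily, while a diagonal pair of contacts only passes if friction is high enough, which is why the planner searches for well-aligned contact pairs.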

Action:

Execute arm trajectory (inverse kinematics + motion planning)
Close gripper with force control (detect contact)
Verify grasp (tactile sensor confirms object held)
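The perceive, plan, execute, verify cycle can be organized as a loop with retries. In this sketch `try_grasp` is a hypothetical hook standing in for planning, execution, and the tactile check; the confidence threshold and retry limit are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    name: str
    pose: tuple          # e.g. (x, y, z, roll, pitch, yaw); format is an assumption
    confidence: float


def pick_all(detections: List[Detection],
             try_grasp: Callable[[Detection], bool],
             min_confidence: float = 0.5,
             max_attempts: int = 3) -> dict:
    """Pick confidently detected items, most confident first, retrying failures.

    try_grasp (hypothetical) plans and executes approach, grasp, and retreat,
    and returns True only if the tactile check confirms the object is held.
    """
    report = {"picked": [], "failed": [], "skipped": []}
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if det.confidence < min_confidence:
            report["skipped"].append(det.name)   # too uncertain: re-sense instead
            continue
        for _attempt in range(max_attempts):
            if try_grasp(det):
                report["picked"].append(det.name)
                break
        else:                                    # loop ended without a success
            report["failed"].append(det.name)    # flag for recovery or a human
    return report
```

Keeping verification inside the loop means a slipped grasp triggers a retry rather than a silent failure further down the line.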

Learning:

Offline: Train grasp network on 1M simulated grasps
Online: Fine-tune on real objects (100 examples)
Adaptation: Adjust grasp depth for slippery objects

Performance:

Initial: 60% success rate
After learning: 90% success rate
Speed: 10 picks/minute

Multi-Agent Systems Preview

When multiple agents interact:

Coordination: Divide tasks among agents
Communication: Share information (positions, goals)
Negotiation: Resolve conflicts (e.g., two robots want the same object)

Example: Warehouse with 10 robots

Centralized planner assigns tasks
Robots share maps (SLAM)
Collision avoidance (decentralized, reactive)
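A centralized planner's task assignment can be sketched as a greedy nearest-pair rule: repeatedly match the closest free robot and open task. This is a simple heuristic chosen for illustration, not an optimal assignment (the Hungarian algorithm solves that exactly), and the robot and shelf positions are hypothetical.

```python
import math


def assign_tasks(robots: dict, tasks: dict) -> dict:
    """Greedy centralized assignment of tasks to robots by straight-line distance.

    robots, tasks: name -> (x, y) position. Returns {robot_name: task_name}.
    """
    assignment = {}
    free_robots, open_tasks = dict(robots), dict(tasks)
    while free_robots and open_tasks:
        # Globally closest remaining (robot, task) pair.
        r, t = min(((r, t) for r in free_robots for t in open_tasks),
                   key=lambda rt: math.dist(free_robots[rt[0]], open_tasks[rt[1]]))
        assignment[r] = t
        del free_robots[r], open_tasks[t]
    return assignment
```

Robots left without a task simply do not appear in the result; distances here ignore walls and shelves, which a real planner would handle with path lengths from the shared map.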

Key Challenges

1. Perception Uncertainty

Problem: Sensors are noisy, objects occluded, lighting varies.

Impact: Wrong object detection → failed grasp.

Solution: Probabilistic perception, active sensing, uncertainty-aware planning.

2. Action Execution Failure

Problem: Plans assume perfect execution, but real execution has errors.

Impact: The arm misses the grasp point by 2 cm → drops the object.

Solution: Closed-loop control, compliance, error detection and recovery.

3. Long-Horizon Planning

Problem: Complex tasks require 10+ step sequences.

Impact: Exponential search space, slow planning.

Solution: Hierarchical planning, learned heuristics, anytime algorithms.
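The "exponential search space" can be made concrete with a standard count: an uninformed search with branching factor b (actions available per step) that explores all plans up to length d may generate up to the number of nodes below. Taking b = 10 as an assumed branching factor and d = 10 steps from the problem statement:

```latex
N(b, d) = \sum_{i=0}^{d} b^{i} = \frac{b^{d+1} - 1}{b - 1} = O\!\left(b^{d}\right),
\qquad
N(10, 10) = \frac{10^{11} - 1}{9} \approx 1.1 \times 10^{10}
```

Hierarchical planning attacks the exponent d (short plans at each level), while learned heuristics effectively shrink the branching factor b that the search actually explores.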

4. Sample Efficiency

Problem: Physical trials are slow (real-time), expensive (wear), dangerous (damage).

Impact: Cannot run millions of training iterations, as is routine in simulation.

Solution: Sim-to-real transfer, few-shot learning, human demonstrations.

Key Takeaways

AI agents are autonomous systems that perceive, reason, act, and learn to achieve goals in their environment.
Agent architectures range from reactive (simple, fast) to learning-based (adaptive, complex) with tradeoffs between speed, flexibility, and performance.
Key agent components include perception (sensor processing), reasoning (planning/decision-making), action (control/execution), and learning (improvement over time).
Model-based agents maintain world models for handling partial observability and multi-step planning.
Learning agents improve through experience using reinforcement learning, imitation learning, or meta-learning.
Physical AI agents face unique challenges: perception uncertainty, execution errors, long-horizon planning, and sample efficiency in the real world.
Practical agents combine multiple architectures (e.g., reactive for safety, deliberative for planning, learning for adaptation).

Next Chapter: Multi-agent systems—coordination, communication, and collaboration among multiple Physical AI agents.
