Control Systems for Physical AI

Problem Framing

Control bridges the gap between high-level plans and low-level actuation. In Physical AI, control must:

  • Track desired trajectories despite disturbances (external forces, friction, model errors)
  • Guarantee stability (system doesn't diverge or oscillate uncontrollably)
  • Operate in real-time (1-10ms control cycles for humanoids)
  • Adapt to changing conditions (payload variation, terrain changes, wear)

Core Challenge: Design controllers that are robust, performant, and computationally feasible for real-time embedded systems.

Classical Control: PID

Principle

PID (Proportional-Integral-Derivative) is the workhorse of industrial control:

u(t) = Kp × e(t) + Ki × ∫e(τ)dτ + Kd × de(t)/dt

Where:

  • e(t): Error (desired - actual)
  • Kp: Proportional gain (immediate response)
  • Ki: Integral gain (eliminate steady-state error)
  • Kd: Derivative gain (dampen oscillations)
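The continuous law above is usually run in discrete time on an embedded controller. The following is a minimal sketch, assuming a fixed control period `dt`, a backward-difference derivative, and a simple conditional-integration anti-windup; the class and field names are illustrative and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete-time PID with output clamping (illustrative names)."""
    kp: float
    ki: float
    kd: float
    dt: float                        # control period in seconds
    u_min: float = float("-inf")     # lower actuator limit
    u_max: float = float("inf")      # upper actuator limit
    integral: float = 0.0
    prev_error: float = 0.0
    first_call: bool = True

    def update(self, desired: float, actual: float) -> float:
        error = desired - actual
        # Backward difference; zero on the first call to avoid a derivative kick.
        derivative = 0.0 if self.first_call else (error - self.prev_error) / self.dt
        candidate = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate + self.kd * derivative
        u_clamped = min(max(u, self.u_min), self.u_max)
        # Anti-windup: keep the new integral only if the output is unsaturated.
        if u == u_clamped:
            self.integral = candidate
        self.prev_error = error
        self.first_call = False
        return u_clamped
```

Calling `update(desired, actual)` once per control cycle returns the clamped command. Freezing the integral while saturated is one of several anti-windup schemes and is chosen here only for brevity.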

Component Roles

Proportional (P):

  • Responds proportionally to current error
  • Large error → large control action
  • Limitation: Cannot eliminate steady-state error, can cause overshoot

Integral (I):

  • Accumulates past errors
  • Eliminates steady-state error (bias compensation)
  • Limitation: Can cause overshoot, wind-up issues

Derivative (D):

  • Predicts future error based on rate of change
  • Dampens oscillations, improves stability
  • Limitation: Amplifies measurement noise

Tuning Methods

Ziegler-Nichols:

  1. Set Ki=0, Kd=0
  2. Increase Kp until system oscillates (critical gain Kc)
  3. Measure oscillation period Tc
  4. Apply tuning rules: Kp=0.6Kc, Ki=2Kp/Tc, Kd=KpTc/8
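The rules in step 4 reduce to a few lines of arithmetic. This sketch simply encodes them; the function name is hypothetical, and the resulting gains are best treated as a starting point for further tuning.

```python
def ziegler_nichols_pid(kc: float, tc: float) -> tuple[float, float, float]:
    """Classic Ziegler-Nichols PID gains from critical gain kc and period tc [s]."""
    kp = 0.6 * kc
    ki = 2.0 * kp / tc       # equivalent to integral time Ti = tc / 2
    kd = kp * tc / 8.0       # equivalent to derivative time Td = tc / 8
    return kp, ki, kd
```

For example, Kc = 10 and Tc = 2 s give Kp = 6, Ki = 6, Kd = 1.5.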

Manual Tuning:

  1. Start with P-only: Increase Kp until fast response (accept overshoot)
  2. Add D: Increase Kd to reduce overshoot
  3. Add I: Small Ki to eliminate steady-state error
  4. Iterate: Fine-tune for desired response

Rule of Thumb:

  • Position control: PD (integral term often omitted to avoid wind-up and overshoot)
  • Velocity control: PI (integral eliminates friction bias)
  • Temperature control: PID (slow dynamics benefit from all terms)

Practical Example: Joint Position Control

System: Robotic arm joint (DC motor + harmonic drive)

Desired: θ_desired(t) (joint angle trajectory)

Measured: θ_actual(t) (encoder reading)

Control Law:

τ_command = Kp(θ_desired - θ_actual) + Kd(θ̇_desired - θ̇_actual)

Parameters (tuned empirically):

  • Kp = 100 Nm/rad (stiff response)
  • Kd = 5 Nm·s/rad (damping)

Performance:

  • Settling time: 0.2s
  • Overshoot: under 5%
  • Steady-state error: under 0.1°

Limitation: PID tuning assumes approximately linear dynamics. Real joints have:

  • Nonlinear friction (stiction, Coulomb, viscous)
  • Backlash in gearbox
  • Variable inertia (payload changes)

Model Predictive Control (MPC)

Principle

MPC optimizes the control inputs over a finite future horizon:

minimize: ∑_{k=0..N-1} (||x(t+k) - x_ref||² + ||u(t+k)||²)
subject to:
- x(t+k+1) = f(x(t+k), u(t+k)) # Dynamics
- u_min ≤ u(t+k) ≤ u_max # Actuation limits
- x(t+k) ∈ X_safe # Safety constraints (state stays inside the safe set)

Where:

  • x: State (position, velocity, etc.)
  • u: Control input (torque, force)
  • f: System dynamics model
  • Horizon: N future timesteps

Receding Horizon: Apply first control action, re-optimize next timestep.
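To make the receding-horizon loop concrete, here is a toy sketch for a double integrator (position and velocity, acceleration as input). Everything numeric is an assumption chosen for illustration: the timestep, horizon, weights, and limit. The box-constrained problem is solved with plain projected gradient descent instead of a dedicated QP solver, which is adequate for a ten-step horizon but is not how production MPC is usually implemented.

```python
DT, N = 0.1, 10            # timestep [s] and horizon length (assumed values)
QV, R = 0.1, 0.01          # velocity and control weights (assumed values)
U_MAX = 2.0                # actuation limit |u| <= U_MAX (assumed value)
STEP = 3.0                 # gradient step size, small enough for these weights


def step_dynamics(p, v, u):
    """Double integrator: position p, velocity v, acceleration command u."""
    return p + DT * v, v + DT * u


def cost_gradient(p0, v0, u_seq, p_ref):
    """Gradient of sum_k (p_k - p_ref)^2 + QV*v_k^2 + R*u_k^2 via a backward pass."""
    states, p, v = [], p0, v0
    for u in u_seq:                        # forward rollout: x_1 .. x_N
        p, v = step_dynamics(p, v, u)
        states.append((p, v))
    grad, lam_p, lam_v = [0.0] * N, 0.0, 0.0
    for k in reversed(range(N)):
        p, v = states[k]                   # state reached after applying u_k
        lam_p += 2.0 * (p - p_ref)
        lam_v += 2.0 * QV * v
        grad[k] = 2.0 * R * u_seq[k] + DT * lam_v
        lam_v += DT * lam_p                # propagate sensitivity one step back
    return grad


def solve_mpc(p0, v0, p_ref, u_init, iters=60):
    """Projected gradient descent on the box-constrained horizon problem."""
    u_seq = list(u_init)
    for _ in range(iters):
        grad = cost_gradient(p0, v0, u_seq, p_ref)
        u_seq = [min(max(u - STEP * g, -U_MAX), U_MAX) for u, g in zip(u_seq, grad)]
    return u_seq


def run_closed_loop(p_ref=1.0, steps=100):
    p, v, u_seq, applied = 0.0, 0.0, [0.0] * N, []
    for _ in range(steps):
        u_seq = solve_mpc(p, v, p_ref, u_seq)
        p, v = step_dynamics(p, v, u_seq[0])   # apply only the first action
        applied.append(u_seq[0])
        u_seq = u_seq[1:] + [0.0]              # shift the plan as a warm start
    return p, v, applied
```

`run_closed_loop()` returns the final position, final velocity, and the list of applied commands. The clipping step keeps every command within the actuation limit by construction, which is the property that distinguishes this from a saturated PID.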

Advantages Over PID

1. Constraint Handling:

  • Explicitly enforces torque limits, joint limits, collision avoidance
  • PID: Saturates control (no foresight), can violate constraints

2. Predictive Capability:

  • Anticipates future state evolution
  • Smoothly approaches limits (doesn't bang into them)

3. Multi-Objective Optimization:

  • Balance tracking accuracy vs energy consumption vs smoothness
  • PID: Single objective (minimize error)

4. Nonlinear Dynamics:

  • Can use nonlinear system models
  • PID: Assumes linear system

Limitations

Computational Cost:

  • Requires solving optimization problem every timestep (1-10ms)
  • Complex systems (humanoid: 30+ DOF) → large optimization
  • Mitigation: Fast solvers (OSQP, qpOASES), embedded MPC libraries

Model Accuracy:

  • Performance degrades with model errors
  • Mitigation: Robust MPC, adaptive MPC, hybrid MPC+feedback

Practical Example: Quadcopter Trajectory Tracking

System: 6-DOF quadcopter (position + orientation)

Model: Simplified dynamics

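ẍ = (R·[0, 0, F]ᵀ - [0, 0, mg]ᵀ) / m
ω̇ = I⁻¹ (τ - ω × Iω)

Here R is the body-to-world rotation matrix, F is the total thrust along the body z-axis, I is the inertia matrix, and ω × Iω is a vector cross product.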

MPC Formulation:

  • States: [x, y, z, φ, θ, ψ, ẋ, ẏ, ż, ω_x, ω_y, ω_z] (12D)
  • Controls: [F, τ_x, τ_y, τ_z] (4D thrust/torques)
  • Horizon: 10 timesteps (0.1s at 100Hz)
  • Constraints:
    • Thrust limits: 0 ≤ F ≤ 2mg
    • Tilt limits: |φ|, |θ| ≤ 30°
    • Obstacle avoidance: ||x(t+k) - x_obstacle|| ≥ r_safe

Performance:

  • Trajectory tracking error: under 0.1m
  • Smooth control (no chattering)
  • Respects constraints (never violates tilt limits)
  • Compute time: 5ms (feasible at 100Hz)

Learning-Based Control

Motivation

Classical control requires:

  • Accurate dynamics model (hard for complex systems)
  • Manual tuning (time-consuming, requires expertise)
  • Struggles with unmodeled dynamics (cable drag, air resistance, contact)

Learning-based control: Learn control policy from data (simulation or real-world).

Reinforcement Learning (RL)

Principle: Learn policy π(s) → a that maximizes cumulative reward.

Training Process:

  1. Initialize random policy
  2. Execute actions in environment
  3. Observe rewards (task success, energy cost)
  4. Update policy to increase reward
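The four training steps can be seen in miniature with a policy-gradient (REINFORCE) update on a toy two-action task. This is a sketch only: the reward values, learning rate, and function name are made up, and there is no state, so it illustrates the update rule and not a realistic robot-learning setup.

```python
import math
import random


def train_bandit_policy(episodes=500, lr=0.5, seed=0):
    """REINFORCE on a two-action toy task; returns P(action 1) after training."""
    rng = random.Random(seed)
    rewards = (0.2, 1.0)          # hypothetical rewards: action 1 is better
    theta = 0.0                   # P(action 1) = sigmoid(theta); starts at 0.5
    for _ in range(episodes):
        p1 = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p1 else 0      # execute the policy
        reward = rewards[action]                     # observe the reward
        # Policy-gradient step: d/dtheta log pi(action) = action - p1
        theta += lr * reward * (action - p1)
    return 1.0 / (1.0 + math.exp(-theta))
```

After training, the policy strongly prefers the higher-reward action. Algorithms such as PPO, SAC, and TD3 build on the same idea with neural-network policies, value estimates, and safeguards on the update size.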

Algorithms:

  • PPO (Proximal Policy Optimization): Stable, widely used
  • SAC (Soft Actor-Critic): Sample-efficient, continuous control
  • TD3 (Twin Delayed DDPG): Robust to hyperparameters

Advantages:

  • No dynamics model needed (model-free RL)
  • Handles complex, high-dimensional systems
  • Can discover novel strategies

Limitations:

  • Requires many samples (1M+ interactions)
  • No safety guarantees during learning
  • Difficult to interpret learned policies

Imitation Learning

Principle: Learn policy by mimicking expert demonstrations.

Behavioral Cloning:

  • Collect expert demonstrations (s, a) pairs
  • Train neural network: π(s) ≈ a_expert
  • Advantage: Fast training (1k-10k demonstrations)
  • Limitation: Distribution shift (policy drifts off expert states)
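As a minimal sketch of behavioral cloning, the example below fits a linear policy to demonstrations from a hypothetical expert (a PD law with made-up gains) using ordinary least squares. A real system would use a neural network and sensor data; the linear case is used here because the fit can be checked exactly.

```python
def expert_action(pos, vel, kp=8.0, kd=2.0):
    """Hypothetical expert: a PD law driving the state to the origin."""
    return -kp * pos - kd * vel


def behavioral_cloning(demos):
    """Least-squares fit of a linear policy a = w_pos*pos + w_vel*vel."""
    spp = spv = svv = spa = sva = 0.0
    for (pos, vel), action in demos:
        spp += pos * pos
        spv += pos * vel
        svv += vel * vel
        spa += pos * action
        sva += vel * action
    det = spp * svv - spv * spv          # 2x2 normal equations
    w_pos = (spa * svv - sva * spv) / det
    w_vel = (sva * spp - spa * spv) / det
    return w_pos, w_vel


# Demonstrations: (state, expert action) pairs on a small grid of states.
DEMOS = [((p, v), expert_action(p, v))
         for p in (-1.0, -0.5, 0.0, 0.5, 1.0)
         for v in (-1.0, 0.0, 1.0)]
```

`behavioral_cloning(DEMOS)` recovers the expert's gains because the expert is itself linear and the demonstrations cover the state grid. Distribution shift appears when the learned policy visits states outside that grid.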

DAgger (Dataset Aggregation):

  • Train initial policy from expert data
  • Execute policy, collect states
  • Query expert for actions at new states
  • Retrain policy on aggregated dataset
  • Advantage: Reduces distribution shift

Practical Example: Manipulation:

  • Expert: Human teleoperation (100 demos of picking diverse objects)
  • Learned policy: Neural network (camera image → gripper action)
  • Success rate: 75% (vs 30% for hand-tuned heuristic)

Hybrid: Learning + Classical Control

Architecture: Learned high-level policy + classical low-level controller.

Example: Quadrupedal Locomotion:

  • High-level (RL): Learned gait pattern, footstep locations (10 Hz)
  • Low-level (PID/MPC): Track joint trajectories (1 kHz)

Advantages:

  • Learn complex coordination (which RL excels at)
  • Leverage proven low-level control (guaranteed stability)
  • Sample-efficient (RL only controls slow, high-level actions)

MIT Cheetah Robot:

  • Learned policy for gait adaptation (rough terrain)
  • MPC for balance and tracking
  • Result: 3 m/s running on uneven terrain

Stability vs Adaptability Tradeoff

Stability Requirement

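Lyapunov Stability: Trajectories that start near an equilibrium stay near it. Asymptotic stability additionally requires that the system returns to equilibrium after a disturbance.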

Guaranteed Stability (Classical Control):

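  • PID: Stable when gains stay within the plant's stability margins (tuning rules aim for this but do not prove it)
  • LQR: Optimal controller with guaranteed stability margins
  • MPC: Constrained optimization keeps trajectories feasible; stability guarantees typically require suitable terminal costs or constraints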

Challenge with Learning: No stability guarantees.

  • Learned policies can diverge (unbounded outputs)
  • Mitigation: Constrained RL, safety layers, residual learning

Adaptability Requirement

Adaptation: Controller adjusts to changing conditions without re-tuning.

Limitations of Classical Control:

  • Fixed gains (Kp, Kd) tuned for nominal system
  • Performance degrades with payload changes, wear, terrain variation

Adaptive Control:

  • Gain Scheduling: Switch controller gains based on operating regime (e.g., different gains for high/low speed)
  • Model Reference Adaptive Control (MRAC): Adjust controller parameters online to match desired model behavior
  • Learning-Based Adaptation: Meta-learning, few-shot adaptation
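Gain scheduling is the simplest of these to sketch. The example below interpolates PD gains linearly between breakpoints indexed by speed, which is a common smoother variant of hard switching. The breakpoint values and names are hypothetical.

```python
def scheduled_gains(speed, schedule):
    """Linearly interpolate (kp, kd) between breakpoints sorted by speed."""
    if speed <= schedule[0][0]:
        return schedule[0][1:]
    for (s0, kp0, kd0), (s1, kp1, kd1) in zip(schedule, schedule[1:]):
        if speed <= s1:
            w = (speed - s0) / (s1 - s0)
            return (kp0 + w * (kp1 - kp0), kd0 + w * (kd1 - kd0))
    return schedule[-1][1:]              # hold the last gains beyond the table


# Hypothetical schedule: (speed [m/s], kp, kd)
SCHEDULE = [(0.0, 100.0, 5.0), (2.0, 60.0, 8.0)]
```

At 1 m/s, halfway between the two breakpoints, `scheduled_gains(1.0, SCHEDULE)` returns gains halfway between the two rows.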

Engineering Tradeoff

Safety-Critical Systems (surgical robots, autonomous vehicles):

  • Prioritize stability → Classical control (PID, MPC) with formal verification
  • Limited adaptability acceptable (require manual re-tuning)

Research/Prototype Systems (legged robots, dexterous manipulation):

  • Prioritize adaptability → Learning-based control
  • Accept some instability during training (safe environment, human oversight)

Production Systems (warehouse robots, drones):

  • Hybrid approach: Classical control baseline + learned adaptation layer
  • Best of both worlds: Stability guarantees + adaptation capability

Control Loops in Humanoid Robots

Hierarchical Control Architecture

Layer                    Rate        Output
Application (Task)       1 Hz        "Pick up cup"
Motion Planning          10 Hz       Whole-body trajectory
Whole-Body Controller    100 Hz      Joint torques (QP optimization)
Joint-Level Control      1000 Hz     Motor currents (PID/torque control)
Motor Drivers            10000 Hz    PWM signals
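One way to picture the hierarchy is as a single fast tick with each layer running on every n-th tick. The sketch below only counts how often each layer would update in one second; the layer names mirror the hierarchy above, and a real system would more likely use separate threads or processors per layer.

```python
RATES_HZ = {
    "task": 1,
    "motion_planning": 10,
    "whole_body_control": 100,
    "joint_control": 1000,
    "motor_driver": 10000,
}


def count_updates(duration_s=1.0, base_hz=10_000):
    """Count per-layer updates when every layer is driven from one base tick."""
    counts = dict.fromkeys(RATES_HZ, 0)
    for tick in range(int(duration_s * base_hz)):
        for layer, hz in RATES_HZ.items():
            if tick % (base_hz // hz) == 0:
                counts[layer] += 1       # a real system would run the layer here
    return counts
```

Each layer reads the most recent output of the layer above it, so slower layers act as setpoint generators for faster ones.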

Whole-Body Control (WBC)

Problem: Coordinate 30+ degrees of freedom while maintaining balance.

Formulation (Quadratic Program):

minimize: ||q̈ - q̈_desired||²
subject to:
- Contact forces in friction cone
- Joint torque limits
- Zero Moment Point (ZMP) inside support polygon

Output: Joint accelerations q̈ → converted to joint torques via inverse dynamics (or integrated to joint velocity setpoints) → commanded to low-level controllers.
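The first two constraints can be written as simple feasibility checks. This is a sketch with hypothetical function names; a QP solver needs linear constraints, so formulations commonly approximate the circular friction cone with a pyramid, which is not shown here.

```python
import math


def contact_force_ok(fx, fy, fz, mu):
    """True if the force lies in the Coulomb friction cone (z is the contact normal)."""
    return fz > 0.0 and math.hypot(fx, fy) <= mu * fz


def torques_ok(torques, limits):
    """True if every joint torque is within its symmetric limit."""
    return all(abs(t) <= lim for t, lim in zip(torques, limits))
```

A tangential force of 5 N against a 10 N normal force is admissible with a friction coefficient of 0.6 but not with 0.4, since the allowed tangential force is μ times the normal force.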

Update Rate: 100-500 Hz (sufficient for dynamic tasks like running).

Libraries: Drake (MIT), Pinocchio, RBDL.

Balance Control

Approach: Regulate Center of Mass (CoM) position to maintain ZMP inside support polygon.

Control Law:

F_ankle = Kp(CoM_desired - CoM_actual) + Kd(CoṀ_desired - CoṀ_actual)

Sensors:

  • IMU: Torso orientation, angular velocity
  • Joint encoders: Joint angles → forward kinematics → CoM position
  • Foot force sensors: Ground reaction forces

Disturbance Rejection:

  • Push from side → IMU detects tilt → ankle torque corrects
  • Uneven terrain → foot force changes → adjust CoM position
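The link between CoM motion and the ZMP can be sketched with the linear inverted pendulum model, which is an added assumption here (constant CoM height, motion in one horizontal axis). Under that model the ZMP is the CoM position shifted by a term proportional to CoM acceleration. Gains and function names are illustrative.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def zmp_from_com(com_pos, com_acc, com_height):
    """ZMP along one axis under the linear inverted pendulum model."""
    return com_pos - (com_height / G) * com_acc


def zmp_inside_support(zmp, heel, toe):
    """1-D support polygon check between heel and toe positions."""
    return heel <= zmp <= toe


def balance_force(com_des, com, com_vel_des, com_vel, kp=400.0, kd=40.0):
    """PD law on CoM error, matching the control law above (gains are assumed)."""
    return kp * (com_des - com) + kd * (com_vel_des - com_vel)
```

With the CoM at 0.981 m height and accelerating forward at 1 m/s², the ZMP sits about 0.1 m behind the CoM, so hard accelerations can push it out of a short foot.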

Compliance Control

Motivation: Rigid position control causes high forces during contact.

Impedance Control: Render virtual spring-damper at end-effector.

F = K(x_desired - x_actual) + D(ẋ_desired - ẋ_actual)

Compliance: Allows controlled deviation from desired position under external force.

Use Cases:

  • Human-robot collaboration (safe contact)
  • Manipulation of fragile objects (controlled force)
  • Contact-rich tasks (insertion, assembly)

Example: Screwdriving:

  • High stiffness in XY (lateral precision)
  • Low stiffness in Z (compliant insertion)
  • Result: Screw finds hole despite slight misalignment
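The per-axis stiffness choice in the screwdriving example can be illustrated with the impedance law applied axis by axis. The stiffness values below are hypothetical; at rest the deflection under an external force is simply the force divided by the stiffness.

```python
def impedance_force(x_des, x, v_des, v, stiffness, damping):
    """Per-axis spring-damper force F = K(x_des - x) + D(v_des - v)."""
    return [k * (xd - xi) + d * (vd - vi)
            for xd, xi, vd, vi, k, d in zip(x_des, x, v_des, v, stiffness, damping)]


def static_deflection(f_ext, stiffness):
    """Steady-state displacement per axis under a constant external force."""
    return [f / k for f, k in zip(f_ext, stiffness)]


# Hypothetical stiffness [N/m]: stiff in X and Y, compliant in Z.
K_XYZ = [2000.0, 2000.0, 200.0]
```

A 10 N contact force moves the tool 5 mm laterally but 50 mm along Z with these values, which is the compliance that lets the screw seat itself.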

Practical Control Examples

Example 1: Mobile Robot Navigation

Task: Follow path while avoiding obstacles.

Control Hierarchy:

1. Path Planning (1 Hz):

  • Input: Goal position, map
  • Output: Waypoint sequence
  • Algorithm: A* or RRT

2. Local Planner (10 Hz):

  • Input: Current waypoint, local obstacle scan
  • Output: Velocity command (v, ω)
  • Algorithm: Dynamic Window Approach (DWA)

3. Velocity Controller (100 Hz):

  • Input: Desired velocity (v_des, ω_des)
  • Output: Motor commands (left, right wheel velocities)
  • Algorithm: Differential drive kinematics + PID
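The differential-drive kinematics in the velocity controller map the commanded body velocity to wheel speeds. A minimal sketch follows; the parameter names are illustrative, and each resulting wheel speed would then be tracked by its own PID loop.

```python
def wheel_speeds(v, omega, track_width, wheel_radius):
    """Body velocity (v [m/s], omega [rad/s]) to (left, right) wheel speeds [rad/s]."""
    v_left = v - omega * track_width / 2.0
    v_right = v + omega * track_width / 2.0
    return v_left / wheel_radius, v_right / wheel_radius
```

Pure forward motion gives equal wheel speeds, and pure rotation gives equal and opposite ones.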

Disturbance Rejection:

  • Wheel slip → odometry drifts → localization corrects using LiDAR scan matching
  • Unexpected obstacle → local planner re-routes → smooth avoidance

Example 2: Robotic Arm Reaching

Task: Move end-effector to target pose.

Control Hierarchy:

1. Trajectory Planning (10 Hz):

  • Input: Current pose, target pose
  • Output: Joint trajectory (position, velocity, acceleration)
  • Algorithm: Cubic spline or minimum jerk

2. Joint-Level Tracking (1 kHz):

  • Input: Desired joint state
  • Output: Joint torques
  • Algorithm: PD control + feedforward gravity compensation

Feedforward Compensation:

τ = Kp(q_des - q) + Kd(q̇_des - q̇) + τ_gravity(q)

Where τ_gravity(q) compensates for gravitational torque (from dynamics model).
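The effect of the feedforward term shows up in a single-link simulation. The sketch below assumes a point-mass link with made-up mass and length, reuses the PD gains from the joint example, and compares the steady-state error with and without gravity compensation. It assumes a perfect gravity model, which real systems do not have.

```python
import math

M, L_COM, G = 1.0, 0.5, 9.81         # hypothetical link mass [kg], CoM distance [m]
INERTIA = M * L_COM ** 2             # point-mass inertia about the joint
KP, KD, DT = 100.0, 5.0, 0.001       # PD gains and a 1 kHz control period


def gravity_torque(q):
    """Torque needed to hold the link at angle q (measured from horizontal)."""
    return M * G * L_COM * math.cos(q)


def final_error(q_des, use_feedforward, q0=-0.5, duration=3.0):
    """Simulate the joint and return |q_des - q| at the end of the run."""
    q, qd = q0, 0.0
    for _ in range(int(duration / DT)):
        tau = KP * (q_des - q) + KD * (0.0 - qd)
        if use_feedforward:
            tau += gravity_torque(q)
        qdd = (tau - gravity_torque(q)) / INERTIA   # gravity pulls the link down
        qd += qdd * DT                               # semi-implicit Euler step
        q += qd * DT
    return abs(q_des - q)
```

Holding the link horizontal, PD alone settles with an offset of roughly m·g·l/Kp ≈ 0.05 rad, while adding the feedforward term removes it. This is why position loops can often skip the integral term when a gravity model is available.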

Result: Precise tracking (under 1mm error) with smooth motion.

Example 3: Drone Altitude Control

Task: Maintain altitude despite wind disturbances.

Cascade Control:

Outer Loop (Position) (50 Hz):

  • Input: Desired altitude z_des
  • Output: Desired velocity ż_des
  • Controller: PD

Inner Loop (Velocity) (200 Hz):

  • Input: Desired velocity ż_des
  • Output: Thrust command F
  • Controller: PI

Rationale: Cascade structure separates slow dynamics (position) from fast dynamics (velocity).

Disturbance Rejection:

  • Wind gust → altitude drops → outer loop commands upward velocity → inner loop increases thrust → altitude recovers
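The cascade can be sketched as two nested loops at the stated rates, acting on a point-mass altitude model. The mass, gains, gust size, and the hover feedforward term `M * G` are all assumptions for illustration; thrust limits are not modeled.

```python
M, G = 1.0, 9.81                     # hypothetical vehicle mass [kg], gravity
KP_Z, KD_Z = 1.5, 0.2                # outer PD gains (assumed)
KP_V, KI_V = 4.0, 2.0                # inner PI gains (assumed)
INNER_HZ, OUTER_HZ = 200, 50


def simulate(z_des=1.0, gust=-3.0, gust_start=5.0, duration=25.0):
    """Returns (lowest altitude reached, final altitude) under a constant gust."""
    dt = 1.0 / INNER_HZ
    z, vz, integral, vz_des = z_des, 0.0, 0.0, 0.0
    z_min = z
    for i in range(int(duration * INNER_HZ)):
        if i % (INNER_HZ // OUTER_HZ) == 0:          # outer loop at 50 Hz
            vz_des = KP_Z * (z_des - z) - KD_Z * vz  # PD on altitude error
        err_v = vz_des - vz                           # inner loop at 200 Hz
        integral += err_v * dt
        thrust = M * G + KP_V * err_v + KI_V * integral
        wind = gust if i * dt >= gust_start else 0.0  # constant downward force [N]
        vz += (thrust - M * G + wind) / M * dt
        z += vz * dt
        z_min = min(z_min, z)
    return z_min, z
```

In this model the altitude dips when the gust starts and then returns to the setpoint, because the inner loop's integral term builds up the extra thrust needed to cancel the constant wind force.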

Key Takeaways

  1. PID control is the foundation of industrial robotics: Simple, robust, real-time capable, but requires manual tuning and struggles with constraints and nonlinearities.

  2. Model Predictive Control (MPC) handles constraints and predictive planning by optimizing over a future horizon; it suits systems with accurate models and sufficient compute (5-50ms solve time).

  3. Learning-based control (RL, imitation learning) handles complex dynamics without manual modeling but requires extensive training data and lacks formal safety guarantees.

  4. Stability vs adaptability is the core tradeoff: classical control offers stability guarantees under its modeling assumptions, learning-based control provides adaptability, and hybrid approaches combine both.

  5. Humanoid control uses hierarchical architecture: Task planning (1 Hz) → motion planning (10 Hz) → whole-body control (100 Hz) → joint control (1 kHz) → motor drivers (10 kHz).

  6. Whole-body control coordinates 30+ DOF through quadratic programming, enforcing contact constraints, torque limits, and balance (ZMP) constraints in real-time.

  7. Practical control combines feedforward (model-based compensation) and feedback (error correction) for high performance: gravity compensation + PD control, cascade control for multi-rate systems.


Next Chapter: Simulation and digital twins—building virtual environments for Physical AI development and validation.