Control Systems for Physical AI
Problem Framing
Control bridges the gap between high-level plans and low-level actuation. In Physical AI, control must:
- Track desired trajectories despite disturbances (external forces, friction, model errors)
- Guarantee stability (system doesn't diverge or oscillate uncontrollably)
- Operate in real-time (1-10ms control cycles for humanoids)
- Adapt to changing conditions (payload variation, terrain changes, wear)
Core Challenge: Design controllers that are robust, performant, and computationally feasible for real-time embedded systems.
Classical Control: PID
Principle
PID (Proportional-Integral-Derivative) is the workhorse of industrial control:
u(t) = Kp × e(t) + Ki × ∫₀ᵗ e(τ)dτ + Kd × de(t)/dt
Where:
- e(t): Error (desired - actual)
- Kp: Proportional gain (immediate response)
- Ki: Integral gain (eliminate steady-state error)
- Kd: Derivative gain (dampen oscillations)
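A minimal discrete-time sketch of the control law above, assuming a fixed sample period dt and a backward-difference derivative. The class name, the optional output limit u_limit, and the conditional-integration anti-windup are illustrative assumptions, not part of the original formula.

```python
class PID:
    """Discrete-time PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt, u_limit=None):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_limit = u_limit      # optional symmetric output limit (assumption)
        self.integral = 0.0
        self.prev_error = None

    def update(self, desired, actual):
        error = desired - actual    # e(t) = desired - actual
        # Backward-difference derivative; zero on the first call to avoid a spike.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        candidate_integral = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        if self.u_limit is not None and abs(u) > self.u_limit:
            # Saturated: clamp the output and skip the integral update (anti-windup).
            u = max(-self.u_limit, min(self.u_limit, u))
        else:
            self.integral = candidate_integral
        return u


pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)
u1 = pid.update(desired=1.0, actual=0.0)
u2 = pid.update(desired=1.0, actual=0.5)
```

Calling update once per control cycle keeps the integral and derivative consistent with dt; a real implementation would typically also low-pass filter the derivative term.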
Component Roles
Proportional (P):
- Responds proportionally to current error
- Large error → large control action
- Limitation: Cannot eliminate steady-state error, can cause overshoot
Integral (I):
- Accumulates past errors
- Eliminates steady-state error (bias compensation)
- Limitation: Can cause overshoot, wind-up issues
Derivative (D):
- Predicts future error based on rate of change
- Dampens oscillations, improves stability
- Limitation: Amplifies measurement noise
Tuning Methods
Ziegler-Nichols:
- Set Ki=0, Kd=0
- Increase Kp until system oscillates (critical gain Kc)
- Measure oscillation period Tc
- Apply tuning rules: Kp=0.6Kc, Ki=2Kp/Tc, Kd=KpTc/8
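The tuning rules in the last step can be written as a small helper. This is only a sketch of the arithmetic; the function name and the example values of Kc and Tc are hypothetical.

```python
def ziegler_nichols_pid(kc, tc):
    """Return (Kp, Ki, Kd) from critical gain Kc and oscillation period Tc."""
    kp = 0.6 * kc
    ki = 2.0 * kp / tc
    kd = kp * tc / 8.0
    return kp, ki, kd


# Hypothetical measurement: sustained oscillation at Kc = 10 with period 0.5 s.
kp, ki, kd = ziegler_nichols_pid(kc=10.0, tc=0.5)
```

Gains obtained this way are usually a starting point and tend to be refined by hand afterwards.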
Manual Tuning:
- Start with P-only: Increase Kp until fast response (accept overshoot)
- Add D: Increase Kd to reduce overshoot
- Add I: Small Ki to eliminate steady-state error
- Iterate: Fine-tune for desired response
Rule of Thumb:
- Position control: PD (omit the integral term to avoid windup and slow hunting around stiction)
- Velocity control: PI (integral eliminates friction bias)
- Temperature control: PID (slow dynamics benefit from all terms)
Practical Example: Joint Position Control
System: Robotic arm joint (DC motor + harmonic drive)
Desired: θ_desired(t) (joint angle trajectory)
Measured: θ_actual(t) (encoder reading)
Control Law:
τ_command = Kp(θ_desired - θ_actual) + Kd(θ̇_desired - θ̇_actual)
Parameters (tuned empirically):
- Kp = 100 Nm/rad (stiff response)
- Kd = 5 Nm·s/rad (damping)
Performance:
- Settling time: 0.2s
- Overshoot: under 5%
- Steady-state error: under 0.1°
Limitation: PID tuning assumes approximately linear dynamics. Real joints have:
- Nonlinear friction (stiction, Coulomb, viscous)
- Backlash in gearbox
- Variable inertia (payload changes)
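As a rough sanity check on the PD gains above, the closed loop can be treated as a second-order system if the joint is idealized as a pure inertia J with no friction or backlash. The inertia value below (0.1 kg·m²) is an assumption chosen for illustration, since the example does not state it, and the settling-time formula is the usual 2% approximation.

```python
import math


def pd_second_order_metrics(kp, kd, inertia):
    """Metrics for J*theta'' + Kd*theta' + Kp*theta = Kp*theta_desired."""
    wn = math.sqrt(kp / inertia)                  # natural frequency [rad/s]
    zeta = kd / (2.0 * math.sqrt(kp * inertia))   # damping ratio
    if zeta < 1.0:
        overshoot = math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    else:
        overshoot = 0.0                           # no overshoot when not underdamped
    settling_time = 4.0 / (zeta * wn)             # approximate 2% settling time [s]
    return wn, zeta, overshoot, settling_time


# Kp and Kd from the example; inertia is an assumed value.
wn, zeta, overshoot, ts = pd_second_order_metrics(kp=100.0, kd=5.0, inertia=0.1)
```

Under this assumed inertia the estimate comes out at roughly 0.16 s settling time and under 2% overshoot for a step in the desired angle. The nonlinear effects listed above would change these figures on real hardware.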
Model Predictive Control (MPC)
Principle
MPC optimizes control inputs over a finite future horizon:
minimize: ∑_{k=0}^{N-1} (||x(t+k) - x_ref||² + ||u(t+k)||²)
subject to:
- x(t+k+1) = f(x(t+k), u(t+k)) # Dynamics
- u_min ≤ u(t+k) ≤ u_max # Actuation limits
- x(t+k) ∈ X_safe # Safety constraints (X_safe: set of safe states)
Where:
- x: State (position, velocity, etc.)
- u: Control input (torque, force)
- f: System dynamics model
- Horizon: N future timesteps
Receding Horizon: Apply only the first control action, then re-optimize from the newly measured state at the next timestep.
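The receding-horizon idea can be sketched on a toy problem. The example below assumes a scalar integrator x(t+1) = x(t) + dt·u(t), a coarse grid of allowed inputs, and a state limit x ≤ X_MAX. It replaces the optimizer with exhaustive search over input sequences, which is only practical for tiny problems; real MPC implementations use a QP or NLP solver instead. All constants are illustrative.

```python
from itertools import product

DT = 0.1
U_GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)   # candidate inputs inside [u_min, u_max]
X_MAX = 1.0                             # state (safety) constraint: x <= X_MAX
HORIZON = 3                             # N future timesteps
R = 0.01                                # weight on control effort


def dynamics(x, u):
    return x + DT * u


def mpc_step(x, x_ref):
    """Search all input sequences; return the first input of the cheapest feasible one."""
    best_cost, best_u0 = float("inf"), 0.0
    for seq in product(U_GRID, repeat=HORIZON):
        xk, cost, feasible = x, 0.0, True
        for u in seq:
            xk = dynamics(xk, u)
            if xk > X_MAX + 1e-9:       # predicted constraint violation: reject sequence
                feasible = False
                break
            cost += (xk - x_ref) ** 2 + R * u ** 2
        if feasible and cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0


# The reference (1.2) lies outside the safe set, so the state should stop at the limit.
x, x_ref = 0.0, 1.2
applied = []
for _ in range(30):
    u = mpc_step(x, x_ref)              # re-optimize at every timestep
    applied.append(u)
    x = dynamics(x, u)                  # apply only the first input
```

Because violating sequences are discarded during prediction, the state approaches the limit and stays there rather than overshooting it.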
Advantages Over PID
1. Constraint Handling:
- Explicitly enforces torque limits, joint limits, collision avoidance
- PID: Saturates control (no foresight), can violate constraints
2. Predictive Capability:
- Anticipates future state evolution
- Smoothly approaches limits (doesn't bang into them)
3. Multi-Objective Optimization:
- Balance tracking accuracy vs energy consumption vs smoothness
- PID: Single objective (minimize error)
4. Nonlinear Dynamics:
- Can use nonlinear system models
- PID: Assumes linear system
Limitations
Computational Cost:
- Requires solving optimization problem every timestep (1-10ms)
- Complex systems (humanoid: 30+ DOF) → large optimization
- Mitigation: Fast solvers (OSQP, qpOASES), embedded MPC libraries
Model Accuracy:
- Performance degrades with model errors
- Mitigation: Robust MPC, adaptive MPC, hybrid MPC+feedback
Practical Example: Quadcopter Trajectory Tracking
System: 6-DOF quadcopter (position + orientation)
Model: Simplified dynamics
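ẍ = (R·[0, 0, F]ᵀ - m·g·e_z) / m
ω̇ = I⁻¹ (τ - ω × Iω)
Here x is the position vector, R the body-to-world rotation matrix, e_z the world vertical axis, and × in the second equation denotes the cross product.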
MPC Formulation:
- States: [x, y, z, φ, θ, ψ, ẋ, ẏ, ż, ω_x, ω_y, ω_z] (12D)
- Controls: [F, τ_x, τ_y, τ_z] (4D thrust/torques)
- Horizon: 10 timesteps (0.1s at 100Hz)
- Constraints:
  - Thrust limits: 0 ≤ F ≤ 2mg
  - Tilt limits: |φ|, |θ| ≤ 30°
  - Obstacle avoidance: ||x(t+k) - x_obstacle|| ≥ r_safe
Performance:
- Trajectory tracking error: under 0.1m
- Smooth control (no chattering)
- Respects constraints (never violates tilt limits)
- Compute time: 5ms (feasible at 100Hz)
Learning-Based Control
Motivation
Classical control requires:
- Accurate dynamics model (hard for complex systems)
- Manual tuning (time-consuming, requires expertise)
- Struggles with unmodeled dynamics (cable drag, air resistance, contact)
Learning-based control: Learn control policy from data (simulation or real-world).
Reinforcement Learning (RL)
Principle: Learn policy π(s) → a that maximizes cumulative reward.
Training Process:
- Initialize random policy
- Execute actions in environment
- Observe rewards (task success, energy cost)
- Update policy to increase reward
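The four steps above can be illustrated with a deliberately simple stand-in: random-search hill climbing on a one-parameter policy a = -gain·s for a toy scalar system. This is not PPO, SAC, or TD3; it only mirrors the loop of trying a policy, observing the return, and keeping changes that help. The system, reward, and constants are assumptions.

```python
import random


def rollout_return(gain, steps=50, dt=0.1):
    """Run policy a = -gain*s on s' = s + dt*a and return the cumulative reward."""
    s, total = 1.0, 0.0
    for _ in range(steps):
        a = -gain * s
        total += -(s ** 2 + 0.1 * a ** 2)   # reward: stay near zero with little effort
        s = s + dt * a
    return total


rng = random.Random(0)                       # fixed seed for repeatability
gain = rng.uniform(0.0, 0.5)                 # step 1: initialize a random policy
initial_return = rollout_return(gain)
best_return = initial_return
for _ in range(200):
    candidate = gain + rng.gauss(0.0, 0.2)   # perturb the policy parameter
    ret = rollout_return(candidate)          # steps 2-3: execute and observe rewards
    if ret > best_return:                    # step 4: keep changes that increase reward
        gain, best_return = candidate, ret
```

The update uses only observed returns and never the dynamics equations, which is the sense in which such methods are model-free.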
Algorithms:
- PPO (Proximal Policy Optimization): Stable, widely used
- SAC (Soft Actor-Critic): Sample-efficient, continuous control
- TD3 (Twin Delayed DDPG): Robust to hyperparameters
Advantages:
- No dynamics model needed (model-free RL)
- Handles complex, high-dimensional systems
- Can discover novel strategies
Limitations:
- Requires many samples (1M+ interactions)
- No safety guarantees during learning
- Difficult to interpret learned policies
Imitation Learning
Principle: Learn policy by mimicking expert demonstrations.
Behavioral Cloning:
- Collect expert demonstrations (s, a) pairs
- Train neural network: π(s) ≈ a_expert
- Advantage: Fast training (1k-10k demonstrations)
- Limitation: Distribution shift (policy drifts off expert states)
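A minimal behavioral-cloning sketch, assuming a scalar state, a hypothetical linear expert, and a linear policy a = w·s fitted by least squares in place of a neural network.

```python
def expert_policy(s):
    """Hypothetical expert used to generate demonstrations."""
    return -2.0 * s


def fit_linear_policy(pairs):
    """Least-squares fit of a = w*s (no bias term) to (s, a) pairs."""
    num = sum(s * a for s, a in pairs)
    den = sum(s * s for s, _ in pairs)
    return num / den


# Collect (s, a) demonstrations on a grid of states, then fit the policy.
demos = [(i / 10.0, expert_policy(i / 10.0)) for i in range(-10, 11)]
w = fit_linear_policy(demos)


def learned_policy(s):
    return w * s
```

The fit is only trustworthy on states similar to those in the demonstrations, which is where the distribution-shift limitation comes from.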
DAgger (Dataset Aggregation):
- Train initial policy from expert data
- Execute policy, collect states
- Query expert for actions at new states
- Retrain policy on aggregated dataset
- Advantage: Reduces distribution shift
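The DAgger loop can be sketched on a toy scalar system. The saturating expert, the linear learner, and the dynamics s' = s + dt·a below are all assumptions made to keep the example small.

```python
def expert_policy(s):
    """Hypothetical expert: proportional action saturated at +/-1."""
    return max(-1.0, min(1.0, -2.0 * s))


def fit_linear_policy(pairs):
    """Least-squares fit of a = w*s to (s, a) pairs."""
    num = sum(s * a for s, a in pairs)
    den = sum(s * s for s, _ in pairs)
    return num / den


def rollout_states(w, s0=3.0, steps=20, dt=0.1):
    """States visited when the learner (a = w*s) controls s' = s + dt*a."""
    states, s = [], s0
    for _ in range(steps):
        states.append(s)
        s = s + dt * (w * s)
    return states


# Train the initial policy from a few expert-labelled states.
dataset = [(s, expert_policy(s)) for s in (2.0, 2.5, 3.0)]
w = fit_linear_policy(dataset)
w_initial = w

for _ in range(5):
    visited = rollout_states(w)                           # execute policy, collect states
    dataset += [(s, expert_policy(s)) for s in visited]   # query expert at those states
    w = fit_linear_policy(dataset)                        # retrain on aggregated dataset
```

Each iteration labels the states the learner actually reaches, so the training distribution moves toward the one the policy will see when deployed.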
Practical Example: Manipulation:
- Expert: Human teleoperation (100 demos of picking diverse objects)
- Learned policy: Neural network (camera image → gripper action)
- Success rate: 75% (vs 30% for hand-tuned heuristic)
Hybrid: Learning + Classical Control
Architecture: Learned high-level policy + classical low-level controller.
Example: Quadrupedal Locomotion:
- High-level (RL): Learned gait pattern, footstep locations (10 Hz)
- Low-level (PID/MPC): Track joint trajectories (1 kHz)
Advantages:
- Learn complex coordination (which RL excels at)
- Leverage proven low-level control (guaranteed stability)
- Sample-efficient (RL only controls slow, high-level actions)
MIT Cheetah Robot:
- Learned policy for gait adaptation (rough terrain)
- MPC for balance and tracking
- Result: 3 m/s running on uneven terrain
Stability vs Adaptability Tradeoff
Stability Requirement
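Lyapunov Stability: Trajectories that start close to an equilibrium stay close to it; asymptotic stability additionally requires that the system returns to equilibrium after a disturbance.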
Guaranteed Stability (Classical Control):
- PID: Stable over a range of gains for a given plant; tuning rules aim for adequate gain and phase margins, which should still be verified
- LQR: Optimal controller with guaranteed stability margins
- MPC: Constraints keep predicted trajectories feasible; closed-loop stability additionally relies on design choices such as a terminal cost or terminal constraint
Challenge with Learning: No stability guarantees.
- Learned policies can diverge (unbounded outputs)
- Mitigation: Constrained RL, safety layers, residual learning
Adaptability Requirement
Adaptation: Controller adjusts to changing conditions without re-tuning.
Limitations of Classical Control:
- Fixed gains (Kp, Kd) tuned for nominal system
- Performance degrades with payload changes, wear, terrain variation
Adaptive Control:
- Gain Scheduling: Switch controller gains based on operating regime (e.g., different gains for high/low speed)
- Model Reference Adaptive Control (MRAC): Adjust controller parameters online to match desired model behavior
- Learning-Based Adaptation: Meta-learning, few-shot adaptation
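Gain scheduling, the simplest of the three approaches, can be sketched as a lookup table. The table values are hypothetical, and the sketch interpolates linearly between operating points rather than switching abruptly, which is one common variant.

```python
# Hypothetical schedule: (speed [m/s], Kp, Kd), sorted by speed.
SCHEDULE = [(0.0, 100.0, 5.0), (1.0, 80.0, 6.0), (2.0, 60.0, 8.0)]


def scheduled_gains(speed):
    """Return (Kp, Kd) for the given speed, clamped at the table ends."""
    if speed <= SCHEDULE[0][0]:
        return SCHEDULE[0][1], SCHEDULE[0][2]
    if speed >= SCHEDULE[-1][0]:
        return SCHEDULE[-1][1], SCHEDULE[-1][2]
    for (s0, kp0, kd0), (s1, kp1, kd1) in zip(SCHEDULE, SCHEDULE[1:]):
        if s0 <= speed <= s1:
            frac = (speed - s0) / (s1 - s0)   # position between the two table rows
            return kp0 + frac * (kp1 - kp0), kd0 + frac * (kd1 - kd0)


kp, kd = scheduled_gains(0.5)
```

Interpolating avoids a step change in the control output when the operating point crosses a table boundary.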
Engineering Tradeoff
Safety-Critical Systems (surgical robots, autonomous vehicles):
- Prioritize stability → Classical control (PID, MPC) with formal verification
- Limited adaptability acceptable (require manual re-tuning)
Research/Prototype Systems (legged robots, dexterous manipulation):
- Prioritize adaptability → Learning-based control
- Accept some instability during training (safe environment, human oversight)
Production Systems (warehouse robots, drones):
- Hybrid approach: Classical control baseline + learned adaptation layer
- Best of both worlds: Stability guarantees + adaptation capability
Control Loops in Humanoid Robots
Hierarchical Control Architecture
Layer                     Rate        Output
Application (Task)        1 Hz        "Pick up cup"
↓
Motion Planning           10 Hz       Whole-body trajectory
↓
Whole-Body Controller     100 Hz      Joint torques (QP optimization)
↓
Joint-Level Control       1000 Hz     Motor currents (PID/torque control)
↓
Motor Drivers             10000 Hz    PWM signals
Whole-Body Control (WBC)
Problem: Coordinate 30+ degrees of freedom while maintaining balance.
Formulation (Quadratic Program):
minimize: ||q̈ - q̈_desired||²
subject to:
- Contact forces in friction cone
- Joint torque limits
- Zero Moment Point (ZMP) inside support polygon
Output: Joint accelerations q̈, which are converted to joint torques via inverse dynamics (or integrated to joint velocities) and then commanded to the low-level controllers.
Update Rate: 100-500 Hz (sufficient for dynamic tasks like running).
Libraries: Drake (MIT), Pinocchio, RBDL.
Balance Control
Approach: Regulate Center of Mass (CoM) position to maintain ZMP inside support polygon.
Control Law:
F_ankle = Kp(CoM_desired - CoM_actual) + Kd(CoṀ_desired - CoṀ_actual)
Sensors:
- IMU: Torso orientation, angular velocity
- Joint encoders: Joint angles → forward kinematics → CoM position
- Foot force sensors: Ground reaction forces
Disturbance Rejection:
- Push from side → IMU detects tilt → ankle torque corrects
- Uneven terrain → foot force changes → adjust CoM position
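The ZMP condition can be checked numerically. The sketch below assumes the linear inverted pendulum (cart-table) approximation, ZMP = CoM - (h/g)·CoM acceleration, with constant CoM height h and a convex support polygon given by counter-clockwise vertices. The foot dimensions and accelerations are illustrative.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def zmp_from_com(com_xy, com_acc_xy, com_height):
    """ZMP under the linear inverted pendulum approximation."""
    return tuple(p - (com_height / G) * a for p, a in zip(com_xy, com_acc_xy))


def inside_convex_polygon(point, vertices):
    """True if point is inside or on a convex polygon with counter-clockwise vertices."""
    px, py = point
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # A negative cross product means the point lies to the right of this edge.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0.0:
            return False
    return True


# Hypothetical single-foot support polygon, 20 cm x 10 cm, counter-clockwise.
foot = [(-0.1, -0.05), (0.1, -0.05), (0.1, 0.05), (-0.1, 0.05)]
zmp_static = zmp_from_com((0.0, 0.0), (0.0, 0.0), com_height=0.8)
zmp_pushed = zmp_from_com((0.0, 0.0), (2.0, 0.0), com_height=0.8)
static_ok = inside_convex_polygon(zmp_static, foot)
pushed_ok = inside_convex_polygon(zmp_pushed, foot)
```

In this example a 2 m/s² horizontal CoM acceleration moves the ZMP about 16 cm, past the edge of the assumed foot, so the check fails and a recovery action such as a step would be needed.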
Compliance Control
Motivation: Rigid position control causes high forces during contact.
Impedance Control: Render virtual spring-damper at end-effector.
F = K(x_desired - x_actual) + D(ẋ_desired - ẋ_actual)
Compliance: Allows controlled deviation from desired position under external force.
Use Cases:
- Human-robot collaboration (safe contact)
- Manipulation of fragile objects (controlled force)
- Contact-rich tasks (insertion, assembly)
Example: Screwdriving:
- High stiffness in XY (lateral precision)
- Low stiffness in Z (compliant insertion)
- Result: Screw finds hole despite slight misalignment
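The impedance law above can be evaluated per axis with diagonal stiffness and damping. The numeric values are illustrative assumptions that follow the screwdriving pattern of stiff XY and compliant Z.

```python
def impedance_force(x_des, x, v_des, v, stiffness, damping):
    """F = K(x_des - x) + D(v_des - v), with diagonal K and D given per axis."""
    return [
        k * (xd - xi) + d * (vd - vi)
        for k, d, xd, xi, vd, vi in zip(stiffness, damping, x_des, x, v_des, v)
    ]


K = (2000.0, 2000.0, 200.0)   # N/m: stiff in X and Y, compliant in Z (assumed values)
D = (50.0, 50.0, 20.0)        # N·s/m (assumed values)

# The same 1 mm position error on every axis, with zero velocity error.
force = impedance_force(
    x_des=(0.0, 0.0, 0.0), x=(-0.001, -0.001, -0.001),
    v_des=(0.0, 0.0, 0.0), v=(0.0, 0.0, 0.0),
    stiffness=K, damping=D,
)
```

With these values a 1 mm error produces about 2 N laterally but only about 0.2 N along Z, which is what makes the Z axis yield under contact.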
Practical Control Examples
Example 1: Mobile Robot Navigation
Task: Follow path while avoiding obstacles.
Control Hierarchy:
1. Path Planning (1 Hz):
- Input: Goal position, map
- Output: Waypoint sequence
- Algorithm: A* or RRT
2. Local Planner (10 Hz):
- Input: Current waypoint, local obstacle scan
- Output: Velocity command (v, ω)
- Algorithm: Dynamic Window Approach (DWA)
3. Velocity Controller (100 Hz):
- Input: Desired velocity (v_des, ω_des)
- Output: Motor commands (left, right wheel velocities)
- Algorithm: Differential drive kinematics + PID
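The kinematic part of the velocity controller can be sketched as follows, assuming ω is positive counter-clockwise and the wheels are separated by track_width. The function and parameter names are hypothetical; a per-wheel PID would then track the returned speeds.

```python
def diff_drive_wheel_speeds(v, omega, track_width, wheel_radius):
    """Map body velocity (v [m/s], omega [rad/s]) to wheel angular speeds [rad/s]."""
    v_left = v - omega * track_width / 2.0    # linear speed of the left wheel
    v_right = v + omega * track_width / 2.0   # linear speed of the right wheel
    return v_left / wheel_radius, v_right / wheel_radius


straight = diff_drive_wheel_speeds(v=1.0, omega=0.0, track_width=0.5, wheel_radius=0.1)
spin = diff_drive_wheel_speeds(v=0.0, omega=1.0, track_width=0.5, wheel_radius=0.1)
```

Driving straight gives equal wheel speeds, while turning in place gives equal and opposite speeds.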
Disturbance Rejection:
- Wheel slip → odometry drifts → localization corrects using LiDAR scan matching
- Unexpected obstacle → local planner re-routes → smooth avoidance
Example 2: Robotic Arm Reaching
Task: Move end-effector to target pose.
Control Hierarchy:
1. Trajectory Planning (10 Hz):
- Input: Current pose, target pose
- Output: Joint trajectory (position, velocity, acceleration)
- Algorithm: Cubic spline or minimum jerk
2. Joint-Level Tracking (1 kHz):
- Input: Desired joint state
- Output: Joint torques
- Algorithm: PD control + feedforward gravity compensation
Feedforward Compensation:
τ = Kp(q_des - q) + Kd(q̇_des - q̇) + τ_gravity(q)
Where τ_gravity(q) compensates for gravitational torque (from dynamics model).
Result: Precise tracking (under 1mm error) with smooth motion.
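For a single link, the control law above can be written out directly. The sketch assumes a point-mass link whose angle q is measured from the horizontal, so that τ_gravity(q) = m·g·l·cos(q). The mass, length, and gains are illustrative assumptions.

```python
import math


def gravity_torque(q, mass=2.0, com_dist=0.3, g=9.81):
    """Torque needed to hold a single link at angle q (measured from horizontal)."""
    return mass * g * com_dist * math.cos(q)


def pd_gravity_torque(q_des, q, qd_des, qd, kp=100.0, kd=5.0):
    """tau = Kp(q_des - q) + Kd(qd_des - qd) + tau_gravity(q)."""
    return kp * (q_des - q) + kd * (qd_des - qd) + gravity_torque(q)


# At rest on the target, the PD terms vanish and only the feedforward term remains.
tau_hold = pd_gravity_torque(q_des=0.0, q=0.0, qd_des=0.0, qd=0.0)
```

Because the feedforward term supplies the holding torque, the PD part does not need a standing position error to counter gravity.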
Example 3: Drone Altitude Control
Task: Maintain altitude despite wind disturbances.
Cascade Control:
Outer Loop (Position) (50 Hz):
- Input: Desired altitude z_des
- Output: Desired velocity ż_des
- Controller: PD
Inner Loop (Velocity) (200 Hz):
- Input: Desired velocity ż_des
- Output: Thrust command F
- Controller: PI
Rationale: Cascade structure separates slow dynamics (position) from fast dynamics (velocity).
Disturbance Rejection:
- Wind gust → altitude drops → outer loop commands upward velocity → inner loop increases thrust → altitude recovers
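The two-rate cascade can be sketched as a small simulation. It assumes a 1 kg point mass under gravity, forward-Euler integration, and hand-picked gains; none of these values come from the example above. The inner PI loop has no hover feedforward, so its integral term has to absorb the gravity bias.

```python
MASS, G = 1.0, 9.81
INNER_DT = 1.0 / 200.0          # inner (velocity) loop period: 200 Hz
OUTER_EVERY = 4                 # outer (position) loop runs every 4th tick: 50 Hz
OUTER_DT = INNER_DT * OUTER_EVERY


def simulate(z_des=1.0, duration=10.0):
    """Simulate cascade altitude control and return the final (z, vz)."""
    z, vz = 0.0, 0.0
    vz_des, integral = 0.0, 0.0
    prev_err = z_des - z                      # avoids a derivative kick on the first tick
    for tick in range(int(duration / INNER_DT)):
        if tick % OUTER_EVERY == 0:
            # Outer loop (PD): altitude error -> desired vertical velocity.
            err = z_des - z
            vz_des = 1.0 * err + 0.2 * (err - prev_err) / OUTER_DT
            prev_err = err
        # Inner loop (PI): velocity error -> thrust command.
        v_err = vz_des - vz
        integral += v_err * INNER_DT
        thrust = 8.0 * v_err + 16.0 * integral
        # Plant: vertical point-mass dynamics.
        acc = thrust / MASS - G
        vz += acc * INNER_DT
        z += vz * INNER_DT
    return z, vz


z_final, vz_final = simulate()
```

The inner loop is tuned several times faster than the outer loop, which is the separation of time scales that the cascade structure relies on.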
Key Takeaways
- PID control is the foundation of industrial robotics: Simple, robust, real-time capable, but requires manual tuning and struggles with constraints and nonlinearities.
- Model Predictive Control (MPC) handles constraints and predictive planning by optimizing over future horizon, suitable for systems with accurate models and sufficient compute (5-50ms solve time).
- Learning-based control (RL, imitation learning) handles complex dynamics without manual modeling but requires extensive training data and lacks formal safety guarantees.
- Stability vs adaptability is the core tradeoff: Classical control guarantees stability, learning-based control provides adaptability; hybrid approaches combine both.
- Humanoid control uses hierarchical architecture: Task planning (1 Hz) → motion planning (10 Hz) → whole-body control (100 Hz) → joint control (1 kHz) → motor drivers (10 kHz).
- Whole-body control coordinates 30+ DOF through quadratic programming, enforcing contact constraints, torque limits, and balance (ZMP) constraints in real-time.
- Practical control combines feedforward (model-based compensation) and feedback (error correction) for high performance: gravity compensation + PD control, cascade control for multi-rate systems.
Next Chapter: Simulation and digital twins—building virtual environments for Physical AI development and validation.