Robotics:
Dexterous manipulation (OpenAI's Dactyl robot hand)
Legged locomotion (Boston Dynamics, Agility Robotics)
Sim-to-real transfer (train in simulation, deploy on real hardware)
Games:
OpenAI Five (Dota 2; defeated the reigning world champions, OG, in 2019)
AlphaStar (StarCraft II; reached Grandmaster level)
GT Sophy (Gran Turismo; outraced the world's top Gran Turismo drivers)
Science:
AlphaFold (protein structure)
Plasma control (DeepMind's tokamak controller for nuclear fusion)
Drug discovery (molecular optimization)
Industry:
Recommendation systems (YouTube, TikTok)
Data center cooling (Google, ~40% reduction in cooling energy)
Autonomous driving (Waymo, Tesla)
Sim-to-Real Transfer
Training RL agents in the real world is slow, costly, and sometimes dangerous. The standard workaround: train in simulation, where millions of episodes take hours rather than years, then transfer the policy to the real robot. Domain randomization (varying physics parameters, textures, and lighting across simulated episodes) forces the policy to be robust to the mismatch between simulator and reality, the so-called reality gap; a minimal sketch follows.
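A minimal domain-randomization sketch, assuming a Gymnasium-style environment. The randomized attribute names (friction, motor_strength) and the make_sim_env constructor are hypothetical placeholders, not any real simulator's API:

```python
import random

import gymnasium as gym


class DomainRandomizationWrapper(gym.Wrapper):
    """Re-samples simulator parameters at every episode reset so the
    policy never overfits to one fixed version of the physics."""

    def __init__(self, env, ranges):
        super().__init__(env)
        self.ranges = ranges  # e.g. {"friction": (0.5, 1.5), ...}

    def reset(self, **kwargs):
        # Sample a fresh value for each randomized parameter
        # (attribute names are hypothetical for this sketch).
        for name, (low, high) in self.ranges.items():
            setattr(self.env.unwrapped, name, random.uniform(low, high))
        return self.env.reset(**kwargs)


# Usage: train on randomized dynamics, deploy the same policy on hardware.
# env = DomainRandomizationWrapper(
#     make_sim_env(),  # hypothetical simulator constructor
#     ranges={"friction": (0.5, 1.5), "motor_strength": (0.8, 1.2)},
# )
```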
The sample-efficiency problem: RL still needs orders of magnitude more experience than humans do. A human picks up an Atari game in minutes; DQN needs tens of millions of frames. Model-based RL (learn a world model, then plan "in imagination" inside it) and foundation models for RL are active research directions aimed at closing this gap; the sketch below illustrates the model-based idea.
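A toy sketch of planning in imagination: fit a one-step dynamics model from logged transitions, then choose actions by rolling out candidate action sequences inside the model (random-shooting MPC). The linear model, the reward function, and all dimensions are illustrative assumptions, not any particular published method:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 10, 256


def fit_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ [s, a] @ W: a deliberately simple world
    model; real systems use neural networks or latent dynamics models."""
    inputs = np.hstack([states, actions])            # (N, S+A)
    W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return W                                         # (S+A, S)


def reward(state):
    # Hypothetical reward for this sketch: stay close to the origin.
    return -np.sum(state ** 2)


def plan(W, state):
    """Random-shooting MPC: imagine N candidate action sequences inside
    the learned model, return the first action of the best sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(N_CANDIDATES):
        seq = rng.uniform(-1, 1, size=(HORIZON, ACTION_DIM))
        s, total = state, 0.0
        for a in seq:                                # rollout in imagination
            s = np.hstack([s, a]) @ W                # predicted next state
            total += reward(s)
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action


# Usage (hypothetical logged data S, A, S_next and current state s0):
# W = fit_dynamics(S, A, S_next); action = plan(W, s0)
```

Because every imagined rollout is free, the agent can evaluate hundreds of candidate plans per real-world step, which is exactly where the sample-efficiency gain comes from.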