From Code to Steel: NVIDIA ENPIRE Lets AI Agents Conduct Autonomous Research in the Physical World

Friday, June 19, 2026

8 AI Coding Agents × 8 Real Robots = First Closed-Loop AutoResearch in the Physical World
On June 17-18, 2026, NVIDIA’s GEAR Lab, in collaboration with CMU and UC Berkeley, unveiled the ENPIRE project — a groundbreaking system where AI coding agents step out of the digital sandbox to autonomously control robotic arms for high-precision tasks like pin insertion, GPU installation, and zip-tie cutting, achieving a 99% final success rate.

1. Introduction: When AI Does More Than Write Code

In 2024, Andrej Karpathy open-sourced the autoresearch project, enabling AI to automatically manage model training and experiments. By 2025, AI Scientist systems could autonomously generate research proposals, run experiments, and write papers.

But these systems share a common limitation: they operate exclusively in digital environments. Code execution yields instant results, simulator physics are deterministic, and failures cost nothing to restart.

The real world is different.

Friction changes when robots collide. Object positions can’t be perfectly restored. Lighting conditions and sensor noise constantly fluctuate. A stark example from the ENPIRE paper: in simulation, all three tested Coding Agents successfully completed the Push-T task. But when the same methods were deployed on real robots, two of the three agents failed outright.

This is precisely why ENPIRE (Agentic Robot Policy Self-Improvement in the Real World) exists — to bring AI research into the uncertain, non-deterministic physical world for the first time.

2. Hardware Architecture: 8 Independent Research Stations

ENPIRE’s physical setup is impressively scaled:

8 experiment units, each operating independently
Each unit equipped with:
- 2× 6-DOF YAM robotic arms (for collaborative operations)
- 1× Intel RealSense depth camera (visual perception)
- 1× RTX 5090 workstation (32GB VRAM) (local training and inference)
All computation runs locally, no shared cluster dependency
Safety mechanisms: Hardware-level — motion limit cutoffs + torque-limited grippers; Software-level — frozen reward functions to prevent agent cheating

# Experiment Unit Configuration (Python)
class ExperimentUnit:
    """ENPIRE individual experiment unit configuration"""
    def __init__(self, unit_id: int, ip: str):
        self.unit_id = unit_id
        self.robot_arms = [
            YAMArm(f"{ip}:50051"),  # Arm A
            YAMArm(f"{ip}:50052"),  # Arm B
        ]
        self.camera = RealSenseCamera(f"{ip}:50053")
        self.workstation = GPUWorkstation(
            gpu_model="RTX 5090",
            vram_gb=32,
            local_mode=True
        )
        self.safety = SafetyController(
            joint_limit_deg=270,
            torque_limit_nm=5.0,
            reward_frozen=True
        )
    
    def reset_scene(self) -> bool:
        """Automatic scene reset"""
        self.robot_arms[0].move_to_home()
        self.robot_arms[1].move_to_home()
        return self.camera.verify_scene_ready()

3. The Four Core Modules: The EN-PI-R-E Closed Loop

ENPIRE’s name is its architecture — the four modules spell “ENPIRE”, forming a complete physical research closed loop:

EN - Environment Module: Automatic Scene Reset and Verification

Key insight: For many robot tasks, resetting the environment is easier than completing the task itself.

The EN module resets the experimental scene after each trial and validates it via a visual classifier. Agents use Code-as-Policy to write reset programs — in many cases, resetting is essentially a complex Pick-and-Place task.

# Environment Auto-Reset Example (Python)
class EnvironmentModule:
    """ENPIRE - Environment Module"""
    def __init__(self, unit: ExperimentUnit):
        self.unit = unit
        self.verifier = VisualVerifier(
            model_path="models/scene_classifier.onnx",
            confidence_threshold=0.95
        )
    
    def auto_reset(self, task_config: dict) -> bool:
        """Automatic scene reset with retry logic"""
        max_retries = 3
        for attempt in range(max_retries):
            # 1. Move all objects to initial positions
            for obj_pose in task_config["initial_poses"]:
                success = self.unit.robot_arms[0].pick_and_place(
                    target_pos=obj_pose["current"],
                    goal_pos=obj_pose["initial"],
                    gripper_force=2.0
                )
                if not success:
                    self._log_error(f"Object {obj_pose['id']} reset failed")
                    continue
            
            # 2. Home the arms
            self.unit.robot_arms[0].move_to_home()
            self.unit.robot_arms[1].move_to_home()
            
            # 3. Visual verification
            frame = self.unit.camera.capture_frame()
            verified, confidence = self.verifier.verify(frame)
            
            if verified and confidence > 0.95:
                return True
            
            self._log_warning(f"Reset attempt {attempt+1} failed, confidence={confidence:.3f}")
        
        return False

The PI module is ENPIRE’s “brain.” Agents experiment with different algorithmic paradigms here — from heuristic rules to behavior cloning to reinforcement learning. Key innovation: Agents don’t just tune hyperparameters; they fundamentally rewrite algorithms.

During the peg insertion task, one agent autonomously wrote a contact force safety controller that outperformed human-tuned RL parameters.

# Policy Improvement Example (Python)
class PolicyImprovementModule:
    """ENPIRE - Policy Improvement Module"""
    
    def __init__(self, agent: CodingAgent):
        self.agent = agent
        self.best_policy = None
        self.best_score = -float('inf')
        self.experiment_log = []
    
    def search_and_improve(self, 
                          task_name: str,
                          search_space: dict,
                          max_iterations: int = 10) -> dict:
        """Core policy search and refinement loop"""
        for iteration in range(max_iterations):
            # Agent reads papers, proposes hypothesis
            hypothesis = self.agent.brainstorm(
                task=task_name,
                context=self.experiment_log[-3:],
                search_prompt="Read related papers and propose a new approach"
            )
            
            # Agent autonomously modifies training code
            new_policy_code = self.agent.generate_policy(
                hypothesis=hypothesis,
                base_policy=self.best_policy
            )
            
            compiled = self._compile_and_deploy(new_policy_code)
            
            if compiled:
                rollout_results = self._run_rollout(new_policy_code)
                score = rollout_results["success_rate"]
                
                if score > self.best_score:
                    self.best_policy = new_policy_code
                    self.best_score = score
                    self._git_commit(f"Iter {iteration}: Improved to {score:.2%}")
                
                self.experiment_log.append({
                    "iteration": iteration,
                    "hypothesis": hypothesis,
                    "score": score,
                    "code_hash": hash(new_policy_code)
                })
        
        return {"best_policy": self.best_policy, "best_score": self.best_score}
    
    def _git_commit(self, message: str):
        """Git auto-commit for agent collaboration"""
        import subprocess
        subprocess.run(["git", "add", "."], capture_output=True)
        subprocess.run(["git", "commit", "-m", message], capture_output=True)

R - Rollout Module: Parallel Multi-Robot Evaluation

The R module deploys policies on real robots in parallel, collecting statistical data. The pass@8 standard means: success in just one out of eight attempts counts as a pass — reflecting the system’s pragmatic “good enough” philosophy.

# Multi-Robot Parallel Rollout (Python)
import asyncio
from dataclasses import dataclass
from typing import List

@dataclass
class RolloutResult:
    unit_id: int
    success: bool
    execution_time_s: float
    failure_reason: str = ""

class RolloutModule:
    """ENPIRE - Rollout Module"""
    
    def __init__(self, units: List[ExperimentUnit]):
        self.units = units
        self.pass_at_k = 8
    
    async def parallel_rollout(self, 
                             policy_code: str,
                             num_episodes: int = 10) -> dict:
        """Execute parallel policy evaluation across all robots"""
        tasks = []
        for unit in self.units:
            for episode in range(num_episodes // len(self.units)):
                tasks.append(
                    self._run_single_episode(unit, policy_code, episode)
                )
        
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        successes = [r for r in results if isinstance(r, RolloutResult) and r.success]
        failures = [r for r in results if isinstance(r, RolloutResult) and not r.success]
        
        return {
            "total_episodes": len(results),
            "success_count": len(successes),
            "failure_count": len(failures),
            "success_rate": len(successes) / len(results) * 100,
            "pass_at_8": len(successes) >= self.pass_at_k,
            "avg_execution_time_s": sum(
                r.execution_time_s for r in results 
                if isinstance(r, RolloutResult)
            ) / len(results),
            "failure_analysis": self._analyze_failures(failures)
        }
    
    async def _run_single_episode(self, 
                                 unit: ExperimentUnit,
                                 policy_code: str,
                                 episode_id: int) -> RolloutResult:
        try:
            await asyncio.get_event_loop().run_in_executor(
                None, unit.reset_scene
            )
            start_time = time.time()
            exec_result = await self._execute_policy(unit, policy_code)
            elapsed = time.time() - start_time
            verified = unit.camera.verify_task_complete()
            
            return RolloutResult(
                unit_id=unit.unit_id,
                success=verified,
                execution_time_s=elapsed,
                failure_reason=exec_result.get("error", "") if not verified else ""
            )
        except Exception as e:
            return RolloutResult(
                unit_id=unit.unit_id,
                success=False,
                execution_time_s=0,
                failure_reason=str(e)
            )

E - Evolution Module: Failure Analysis Driving Code Iteration

The E module is the most “human-like” part of ENPIRE. Agents analyze experiment logs, search for relevant papers online, propose new algorithmic hypotheses, modify code, and commit to Git. Key constraint: Agents cannot modify the reward function (frozen); they can only improve the policy itself.

// Evolution Module Core - Go Implementation
package evolution

import (
	"context"
	"fmt"
	"log"
	"sync"
	"time"
)

type ExperimentLog struct {
	UnitID      int       `json:"unit_id"`
	Timestamp   time.Time `json:"timestamp"`
	Success     bool      `json:"success"`
	ErrorMsg    string    `json:"error_msg,omitempty"`
	RewardValue float64   `json:"reward_value"`
	DurationMs  int64     `json:"duration_ms"`
}

type FailureAnalyzer struct {
	agent       *CodingAgent
	paperSearch *PaperSearcher
	logStore    []ExperimentLog
	mu          sync.Mutex
}

func (fa *FailureAnalyzer) AnalyzeAndIterate(ctx context.Context, logs []ExperimentLog) (*ImprovementPlan, error) {
	failureCh := make(chan FailurePattern, len(logs))
	var wg sync.WaitGroup
	
	for _, log := range logs {
		if !log.Success {
			wg.Add(1)
			go func(l ExperimentLog) {
				defer wg.Done()
				pattern := fa.analyzeSingleFailure(l)
				failureCh <- pattern
			}(log)
		}
	}
	
	go func() {
		wg.Wait()
		close(failureCh)
	}()
	
	patterns := make([]FailurePattern, 0)
	for p := range failureCh {
		patterns = append(patterns, p)
	}
	
	paperCh := make(chan PaperResult, len(patterns))
	var searchWg sync.WaitGroup
	
	for _, pattern := range patterns {
		searchWg.Add(1)
		go func(p FailurePattern) {
			defer searchWg.Done()
			papers, err := fa.paperSearch.Search(ctx, p.Keyword, 3)
			if err != nil {
				log.Printf("Paper search failed for %s: %v", p.Keyword, err)
				return
			}
			paperCh <- PaperResult{Pattern: p, Papers: papers}
		}(pattern)
	}
	
	go func() {
		searchWg.Wait()
		close(paperCh)
	}()
	
	plan := &ImprovementPlan{
		Timestamp:  time.Now(),
		Changes:    make([]CodeChange, 0),
		Hypothesis: "",
	}
	
	for pr := range paperCh {
		if len(pr.Papers) > 0 {
			change := fa.agent.ProposeChange(pr.Pattern, pr.Papers[0])
			plan.Changes = append(plan.Changes, change)
		}
	}
	
	plan.Hypothesis = fa.agent.SynthesizeHypothesis(patterns, plan.Changes)
	return plan, nil
}

type ImprovementPlan struct {
	Timestamp  time.Time    `json:"timestamp"`
	Changes    []CodeChange `json:"changes"`
	Hypothesis string       `json:"hypothesis"`
}

type CodeChange struct {
	File        string `json:"file"`
	Description string `json:"description"`
	Code        string `json:"code"`
}

func (plan *ImprovementPlan) ApplyToRepository(repo *GitRepository) error {
	for _, change := range plan.Changes {
		if err := repo.ApplyChange(change); err != nil {
			return fmt.Errorf("apply change failed: %w", err)
		}
	}
	return repo.Commit(plan.Hypothesis)
}

4. Three Coding Agents: A Comparative Analysis

ENPIRE tested three mainstream Coding Agents simultaneously:

Agent	Base Model	Simulator Performance	Real Robot Performance	Characteristics
OpenAI Codex	GPT-5.5	✅ 100% success	✅ Best overall	Fastest time to target success rate
Anthropic Claude Code	Opus 4.7	✅ 100% success	✅ Good	Higher code quality, slightly slower
Kimi Code	Kimi K2.6	✅ 100% success	⚠️ Needs more iterations	Strong architecture compatibility

Key discovery: On the Push-T benchmark, all three agents used no neural networks, no training data, no behavior cloning — they solved this task (which typically requires extensive human demonstration data) in under 2 hours using only heuristic rule-based methods they wrote themselves. Agents “think faster than they learn.”

// Agent Scheduling & Load Balancing - Go Concurrency Model
package scheduler

import (
	"context"
	"log"
	"sync"
	"time"
)

type AgentTask struct {
	ID          string                 `json:"id"`
	AgentType   string                 `json:"agent_type"`
	UnitID      int                    `json:"unit_id"`
	TaskType    string                 `json:"task_type"`
	Config      map[string]interface{} `json:"config"`
	CreatedAt   time.Time              `json:"created_at"`
	MaxDuration time.Duration          `json:"max_duration"`
}

type Scheduler struct {
	agents    map[string]*AgentHandle
	units     []*ExperimentUnit
	workQueue chan AgentTask
	results   sync.Map
}

func NewScheduler(agentConfigs map[string]AgentConfig, units []*ExperimentUnit) *Scheduler {
	s := &Scheduler{
		agents:    make(map[string]*AgentHandle),
		units:     units,
		workQueue: make(chan AgentTask, 100),
	}
	for agentType, config := range agentConfigs {
		for i := 0; i < config.Instances; i++ {
			handle := &AgentHandle{
				ID:        fmt.Sprintf("%s-%d", agentType, i),
				AgentType: agentType,
				Config:    config,
				Busy:      false,
			}
			s.agents[handle.ID] = handle
		}
	}
	return s
}

func (s *Scheduler) Start(ctx context.Context) {
	var wg sync.WaitGroup
	for i := 0; i < len(s.agents); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case task := <-s.workQueue:
					s.executeTask(ctx, task)
				case <-ctx.Done():
					return
				}
			}
		}()
	}
	wg.Wait()
}

func (s *Scheduler) executeTask(ctx context.Context, task AgentTask) {
	agent := s.findAvailableAgent(task.AgentType)
	if agent == nil {
		log.Printf("No available agent for task %s", task.ID)
		return
	}
	agent.Busy = true
	defer func() { agent.Busy = false }()
	
	unit := s.units[task.UnitID]
	startTime := time.Now()
	result := agent.Run(ctx, unit, task)
	
	s.results.Store(task.ID, TaskResult{
		TaskID:     task.ID,
		AgentID:    agent.ID,
		UnitID:     task.UnitID,
		Success:    result.Success,
		Duration:   time.Since(startTime),
		TokenUsage: result.TokenUsage,
	})
	
	if result.Success {
		gitShare(agent.ID, task.ID, result.PolicyCode)
	}
}

5. Physical Scaling Law: Robot Count as a New Scaling Resource

One of ENPIRE’s most remarkable findings is the Physical Scaling Law — demonstrating that increasing parallel robot units accelerates research progress.

Experimental Results

Peg insertion task performance across configurations:

Robot Count	Completion Time	Speedup	Parallel Mechanism
1 robot	>1.5 hours	1.0x (baseline)	Single path
4 robots	~50 min	~1.8x	Multi-path exploration
8 robots	~40 min	~2.25x	Full parallel + best selection

Core Mechanism

Multiple Coding Agents explore different approaches simultaneously:

Some try new RL algorithms
Others modify reward functions
Others optimize the training infrastructure

When one direction proves effective, other agents automatically clone, merge, and reuse those results via Git; ineffective approaches are quickly discarded.

The Bottleneck

ENPIRE also revealed a counter-intuitive limitation: MRU (Mean Robot Utilization) remained below 50% — robots spent half their time idle, waiting for agents to figure out what to do next.

The real bottleneck isn’t robot hardware — it’s the agent’s thinking speed.

// Physical Scaling Law Simulation - Go Concurrency Model
package scaling

import (
	"fmt"
	"math"
	"sync"
	"time"
)

type ScalingExperiment struct {
	NumRobots      int
	NumAgents      int
	AgentThinkTime time.Duration
	RobotExecTime  time.Duration
	CommOverhead   time.Duration
}

type SimulateResult struct {
	NumRobots        int
	TotalTime        time.Duration
	Speedup          float64
	MRU              float64
	TokenConsumption int64
}

func SimulatePhysicalScaling(configs []ScalingExperiment) []SimulateResult {
	results := make([]SimulateResult, len(configs))
	var wg sync.WaitGroup
	for i, config := range configs {
		wg.Add(1)
		go func(idx int, cfg ScalingExperiment) {
			defer wg.Done()
			results[idx] = runSimulation(cfg)
		}(i, config)
	}
	wg.Wait()
	return results
}

func runSimulation(cfg ScalingExperiment) SimulateResult {
	baseTime := 90.0 * float64(time.Minute)
	alpha := 0.15
	N := float64(cfg.NumRobots)
	actualSpeedup := N / (1 + alpha*(N-1))
	totalTime := baseTime / actualSpeedup
	
	totalCycleTime := float64(cfg.RobotExecTime) + 
		float64(cfg.AgentThinkTime) + 
		float64(cfg.CommOverhead)*(N-1)
	mru := float64(cfg.RobotExecTime) / totalCycleTime
	
	tokenBase := int64(100000)
	tokenConsumption := tokenBase * int64(N) * int64(1+int(0.3*(N-1)))
	
	return SimulateResult{
		NumRobots:        cfg.NumRobots,
		TotalTime:        time.Duration(totalTime),
		Speedup:          actualSpeedup,
		MRU:              math.Min(mru, 1.0),
		TokenConsumption: tokenConsumption,
	}
}

func RunBenchmark() {
	configs := []ScalingExperiment{
		{NumRobots: 1, NumAgents: 1, AgentThinkTime: 30 * time.Second, 
		 RobotExecTime: 25 * time.Second, CommOverhead: 5 * time.Second},
		{NumRobots: 4, NumAgents: 4, AgentThinkTime: 35 * time.Second, 
		 RobotExecTime: 25 * time.Second, CommOverhead: 8 * time.Second},
		{NumRobots: 8, NumAgents: 8, AgentThinkTime: 40 * time.Second, 
		 RobotExecTime: 25 * time.Second, CommOverhead: 12 * time.Second},
	}
	
	results := SimulatePhysicalScaling(configs)
	fmt.Println("=== Physical Scaling Law Benchmark ===")
	fmt.Printf("%-12s %-16s %-12s %-12s %-16s\n", 
		"Robots", "Total Time", "Speedup", "MRU", "Token Cons.")
	for _, r := range results {
		fmt.Printf("%-12d %-16s %-12.2f %-12.2f %-16d\n",
			r.NumRobots, r.TotalTime.Round(time.Second), r.Speedup, r.MRU, r.TokenConsumption)
	}
}

6. Real-World Robot Tasks and Results

Four Benchmark Tasks

Task	Description	Precision	Agent’s Autonomous Strategy
Push-T	Push T-shaped block to target	~5mm	Heuristic force control
Peg Insertion	Insert pin into 4mm hole	4mm	Contact force safety controller
GPU Insertion	Insert GPU into motherboard slot	High precision	Impedance control + visual guidance
Zip-Tie Cut	Cut plastic zip-tie with scissors	Dexterous	Dual-arm collaboration

Experimental Results

Final success rate: 99% across all tasks (pass@8 standard)
Push-T: 8-agent team reached 99% in just 2 hours; single agent took nearly 5 hours
Peg Insertion: Agent’s self-written contact force controller outperformed human-tuned parameters
Knowledge transfer: Agents summarized peg insertion experience as Markdown docs; appending these to GPU insertion task descriptions significantly accelerated learning

# Robot Control Interface Example (Python)
import numpy as np
from typing import Tuple

class RobotControlInterface:
    """ENPIRE robot control API - directly callable by agents"""
    
    def __init__(self, grpc_endpoint: str):
        self.endpoint = grpc_endpoint
        self._connect()
    
    def move_to_pose(self, x, y, z, roll, pitch, yaw, speed=0.1) -> bool:
        return self._call_grpc("MoveToPose", {
            "pose": [x, y, z, roll, pitch, yaw], "speed": speed
        })
    
    def grasp(self, force: float = 2.0) -> bool:
        return self._call_grpc("Grasp", {"force": force})
    
    def release(self) -> bool:
        return self._call_grpc("Release", {})
    
    def apply_force(self, force_vector: Tuple[float, float, float], duration: float = 1.0) -> bool:
        return self._call_grpc("ApplyForce", {
            "force": list(force_vector), "duration": duration
        })
    
    def get_joint_states(self) -> np.ndarray:
        response = self._call_grpc("GetJointStates", {})
        return np.array(response["joint_angles"])
    
    def _call_grpc(self, method: str, params: dict) -> dict:
        return {"status": "ok", "data": {}}

7. Safety: The Bottom Line for Physical World Research

Automated research in the physical world carries real risks — one failure can damage hardware or even cause injury. ENPIRE implements a two-layer hardware failsafe + one software freeze:

Hardware Safety Layer

Hard motion limit cutoff: Power cut when joints exceed safe range
Torque-limited grippers: Mechanical force capped at safe thresholds

Software Safety Layer

Frozen reward functions: Fixed offline via visual classifiers, preventing agents from “grading their own homework”

class SafetyController:
    """ENPIRE dual-layer safety system"""
    def __init__(self, joint_soft_limit_deg=270.0, joint_hard_limit_deg=300.0,
                 torque_limit_nm=5.0, force_limit_n=20.0):
        self.soft_limit = joint_soft_limit_deg
        self.hard_limit = joint_hard_limit_deg
        self.emergency_stopped = False
    
    def check_motion_safety(self, target_joints: np.ndarray) -> bool:
        if self.emergency_stopped:
            return False
        for i, angle in enumerate(np.degrees(target_joints)):
            if abs(angle) > self.soft_limit:
                if abs(angle) > self.hard_limit:
                    self.emergency_stop(f"Joint {i} exceeded hard limit: {angle:.1f}deg")
                    return False
        return True
    
    def emergency_stop(self, reason: str):
        self.emergency_stopped = True
        self._cut_power()
        print(f"[SAFETY] EMERGENCY STOP: {reason}")
    
    def _cut_power(self):
        pass

8. Open Source Ecosystem & Future Outlook

Open Source Plans

All ENPIRE code and systems will be open-sourced. A “consumer edition” based on the LeRobot SO-101 kit + NVIDIA Jetson Thor is also in development, enabling developers to build similar autonomous robot research systems at home.

Jim Fan’s Perspective

“AutoResearch enters the physical world. All we did was provide Codex an API to the atomic world — everything else is emergence.”
“The goal is for the entire team to go on vacation, and even Jensen won’t notice the lab is still running.”

Future Directions

Agent thinking acceleration: Current robot utilization <50%; improving agent inference speed is the immediate bottleneck
Larger-scale validation: Where is the scaling boundary at 16, 32, or more robots?
Multi-task transfer: Knowledge transfer validated from peg insertion to GPU insertion — can it extend to completely different task domains?
Human role evolution: From “operator” to “report reviewer” — human researchers’ work becomes designing environments where agents can conduct research autonomously

9. Summary & Technical Insights

ENPIRE’s core contribution isn’t a single algorithmic breakthrough — it’s an infrastructure for AI agents to conduct autonomous research in the physical world:

Architectural innovation: EN-PI-R-E four-module closed loop, transforming real-world robot learning into an agent-managed, controlled optimization process
Physical Scaling Law: First verification in robotics that increasing parallel experiment units accelerates research
Agent autonomy: Full-chain autonomy from reading papers, proposing hypotheses, writing code, running experiments, to analyzing results
Code as policy: Agents drive experiments through programming languages, vastly expanding the exploration space
Two-layer safety baseline: Establishing safety standards for physical-world AutoResearch

In the second half of robotics research, humans step back from “operators” to “report reviewers” — ENPIRE, with 8 AI Agents, 8 robots, and dual-layer safety locks, has moved “automated research” from the code sandbox into the physical world.

References: NVIDIA GEAR Lab, “ENPIRE: Agentic Robot Policy Self-Improvement in the Real World”, 2026. Project page: https://research.nvidia.com/labs/gear/enpire/

From Code to Steel: NVIDIA ENPIRE Lets AI Agents Conduct Autonomous Research in the Physical World

1. Introduction: When AI Does More Than Write Code

2. Hardware Architecture: 8 Independent Research Stations

3. The Four Core Modules: The EN-PI-R-E Closed Loop

EN - Environment Module: Automatic Scene Reset and Verification

PI - Policy Improvement Module: Continuous Strategy Refinement

R - Rollout Module: Parallel Multi-Robot Evaluation

E - Evolution Module: Failure Analysis Driving Code Iteration

4. Three Coding Agents: A Comparative Analysis

5. Physical Scaling Law: Robot Count as a New Scaling Resource

Experimental Results

Core Mechanism

The Bottleneck

6. Real-World Robot Tasks and Results

Four Benchmark Tasks

Experimental Results

7. Safety: The Bottom Line for Physical World Research

Hardware Safety Layer

Software Safety Layer

8. Open Source Ecosystem & Future Outlook

Open Source Plans

Jim Fan’s Perspective

Future Directions

9. Summary & Technical Insights