Pixels to Proofs: Safe Latent World-Model Control

SLS-squared plans motion directly from camera images using a learned latent world model, then makes the plan provably safe by combining conformal prediction with GPU-accelerated system level synthesis robust MPC.

Two powerful ideas in robotics have been on a collision course. The first is learned world models: train a neural network to predict how the world evolves in a compact latent space, and you can plan actions by imagining their consequences, optimizing a trajectory through the model's learned dynamics rather than through hand-derived physics. World models have driven striking results in control from raw pixels, because they let a robot plan in a representation it learned itself rather than one an engineer specified. The second idea is safety certification: control theory has mature tools for guaranteeing that a system respects constraints despite disturbances, robust model predictive control chief among them. The trouble is that these tools assume you have a trustworthy dynamics model with bounded error, and a learned latent world model is precisely the opposite, a black box whose prediction errors are unknown and potentially large.

A new paper, Pixels to Proofs: Probabilistically-Safe Latent World Model Control via Parallel Conformal Robust MPC (arXiv:2606.15594), by Devesh Nath, Anutam Srinivasan, Haoran Yin, Ruitong Jiang, Jeffrey Fang, and Glen Chou, tackles this mismatch directly. Their framework, called SLS-squared, plans from pixels using a learned latent world model but refuses to trust that model naively. Instead, it surrounds the model with machinery that quantifies how wrong it might be and plans robustly against that uncertainty. The title captures the ambition: go from pixels (raw image observations) all the way to proofs (probabilistic safety guarantees on the real system).

“We present SLS^2, a framework for safe feedback motion planning from pixels using robust model predictive control (MPC) in learned latent world models.” (arXiv:2606.15594)

The world model and why it is built to be optimized through

The foundation is an action-conditioned joint-embedding world model with compact Markovian latent states. Two design choices matter here. Action-conditioned means the model predicts how the latent state changes as a function of the action taken, which is exactly what you need to plan, since planning is choosing actions to steer future states. Compact and Markovian means the latent state is small and self-sufficient: the next state depends only on the current latent state and action, not on a long history. That Markovian property is what makes the model amenable to efficient gradient-based trajectory optimization: you can differentiate through the learned latent dynamics to compute how a whole trajectory of actions shapes the future, then descend toward trajectories that reach the goal. In short, the world model is deliberately shaped to be something a planner can optimize through smoothly.
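To make "optimizing through the model" concrete, here is a minimal sketch of gradient-based trajectory optimization over a Markovian latent transition. Everything in it is an assumption for illustration: the scalar `step` function is a hand-written stand-in for a learned network, the cost and step size are arbitrary, and the gradient is computed by a hand-coded backward pass rather than an autodiff library. It is not the paper's model or planner.

```python
import math

def step(z, u):
    """Hypothetical stand-in for a learned latent transition z' = f(z, u)."""
    return 0.9 * z + 0.5 * math.tanh(u)

def step_grads(z, u):
    """Partial derivatives (df/dz, df/du) of the toy transition above."""
    return 0.9, 0.5 * (1.0 - math.tanh(u) ** 2)

def rollout(z0, actions):
    """Imagine the latent trajectory produced by a sequence of actions."""
    zs = [z0]
    for u in actions:
        zs.append(step(zs[-1], u))
    return zs

def cost_and_grad(z0, actions, goal, reg=1e-3):
    """Terminal goal cost plus a small action penalty, with its gradient."""
    zs = rollout(z0, actions)
    cost = (zs[-1] - goal) ** 2 + reg * sum(u * u for u in actions)
    grad = [0.0] * len(actions)
    adjoint = 2.0 * (zs[-1] - goal)  # d cost / d z_T
    # Backward pass: the Markov property means each step only needs the
    # adjoint of the next latent state, not the whole history.
    for t in reversed(range(len(actions))):
        dz, du = step_grads(zs[t], actions[t])
        grad[t] = adjoint * du + 2.0 * reg * actions[t]
        adjoint *= dz
    return cost, grad

def plan(z0, goal, horizon=10, iters=500, lr=0.2):
    """Plain gradient descent on the action sequence."""
    actions = [0.0] * horizon
    for _ in range(iters):
        _, grad = cost_and_grad(z0, actions, goal)
        actions = [u - lr * g for u, g in zip(actions, grad)]
    return actions
```

Calling `plan(0.0, 1.0)` returns ten actions whose imagined rollout ends near the goal latent value. With a real learned model, the two hand-written derivative functions would be replaced by automatic differentiation through the network.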

The hard part: enforcing safety for the true system, not the model

Here is the crux. The planner optimizes inside the latent world model, but the robot lives in the real world, and the model's predictions are imperfect. A trajectory that looks safe in the latent imagination may be unsafe in reality if the model's error is large in the wrong place. Bridging that gap is the paper's central technical move, and it has two pillars.

The first pillar is conformal prediction. Conformal prediction is a statistical method that turns a model's raw outputs into calibrated error bounds at a chosen confidence level, using held-out calibration data and making only mild assumptions, chiefly that the calibration and deployment data are exchangeable. SLS-squared uses it to obtain calibrated latent error bounds (a principled, data-driven quantification of how far the latent prediction can deviate from reality), and from those bounds it constructs robust latent-space constraint sets. This is the key translation: the world model's untrustworthiness becomes a concrete, calibrated uncertainty set the planner can reason about, rather than an unbounded unknown.
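The calibration step described above can be sketched with standard split conformal prediction. This is a generic textbook construction, not necessarily the exact variant the paper uses; the function names and the choice of Euclidean distance as the nonconformity score are assumptions.

```python
import math

def latent_errors(predicted, actual):
    """Nonconformity scores: distance between predicted and true next latents."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]

def conformal_radius(scores, alpha):
    """Split-conformal bound on the latent prediction error.

    If a fresh score is exchangeable with the calibration scores, it falls
    at or below the returned radius with probability at least 1 - alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]
```

The `(n + 1)` correction is what gives the finite-sample guarantee: with too few calibration pairs for the requested confidence level, the only honest bound is infinite, which the sketch returns rather than silently under-covering.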

The second pillar is robust control via system level synthesis (SLS). SLS is a modern framework for designing controllers that are robust to bounded disturbances, and the paper uses a GPU-accelerated SLS robust MPC scheme informed by those conformal error bounds. The robust MPC plans not just a nominal trajectory but one that stays safe across the entire calibrated uncertainty set, so that even when the latent prediction is off by as much as the conformal bound allows, the constraint is still respected. The GPU acceleration, and the parallel structure flagged in the title, are what make this computationally feasible: robustifying against an uncertainty set is far more expensive than nominal planning, and parallel computation on the GPU is what keeps it tractable for closed-loop use.
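A simplified way to see what "planning against the uncertainty set" means is constraint tightening: shrink each constraint by the worst-case effect of the accumulated error, then require the nominal plan to satisfy the shrunken version. The sketch below is a tube-style simplification, not SLS itself (SLS optimizes the feedback policy jointly with the plan). It assumes a per-step error bound `eps`, such as a conformal radius, a fixed closed-loop contraction factor `rho`, and a single half-space constraint a·z <= b; all names are hypothetical.

```python
import math

def tube_radii(eps, rho, horizon):
    """Worst-case error radius per step, assuming r[t+1] = rho * r[t] + eps."""
    radii = [0.0]
    for _ in range(horizon):
        radii.append(rho * radii[-1] + eps)
    return radii

def tightened_bounds(a, b, radii):
    """Shrink a.z <= b so it holds for every z within radius r of the nominal.

    By Cauchy-Schwarz, a.z <= a.z_nom + ||a|| * r, so requiring
    a.z_nom <= b - ||a|| * r is sufficient.
    """
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    return [b - norm_a * r for r in radii]

def nominal_is_robustly_safe(nominal, a, b, radii):
    """Check a nominal latent trajectory against the tightened constraints."""
    if len(nominal) != len(radii):
        raise ValueError("need one error radius per nominal state")
    bounds = tightened_bounds(a, b, radii)
    return all(
        sum(ai * zi for ai, zi in zip(a, z)) <= bound
        for z, bound in zip(nominal, bounds)
    )
```

The radii grow with the horizon, so later states must keep a larger margin from the constraint boundary. That extra margin at every step, for every candidate plan, is one reason robust planning costs more than nominal planning and benefits from parallel hardware.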

On top of these, the authors learn and conformalize a latent constraint checker: a learned function that judges whether a latent state satisfies the task's constraints, which is then calibrated so that its judgments come with probabilistic guarantees. This lets the SLS planner impose probabilistic safety constraints during closed-loop execution, checking safety in latent space with calibrated confidence rather than hoping the raw learned checker is correct. The result is a pipeline in which every learned component (the dynamics and the constraint checker) has its uncertainty quantified and accounted for before it is trusted to keep the robot safe.
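One generic way to conformalize a learned checker is to calibrate its decision threshold on held-out states known to be unsafe. The sketch below assumes the checker outputs a scalar score where higher means more likely unsafe, and that calibration and deployment scores are exchangeable. It is an illustrative construction with hypothetical names, not the paper's specific procedure.

```python
import math

def calibrate_safe_threshold(unsafe_scores, alpha):
    """Pick tau so a new unsafe state scores below tau with probability <= alpha.

    unsafe_scores are checker outputs on held-out states known to violate
    the constraint.
    """
    n = len(unsafe_scores)
    k = math.floor(alpha * (n + 1))
    if k < 1:
        return -math.inf  # not enough data: never certify a state as safe
    return sorted(unsafe_scores)[k - 1]  # k-th smallest unsafe score

def certified_safe(score, tau):
    """Declare a latent state safe only if it scores strictly below tau."""
    return score < tau
```

The rule is deliberately asymmetric: it bounds the chance of calling an unsafe state safe and accepts that some safe states will be rejected, which is the error a safety-critical planner can afford.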

What the results show

The method is evaluated on vision-based control tasks, the regime it is designed for, where the robot must plan from images and respect safety constraints. The reported outcome is that SLS-squared improves both goal-reaching performance and safety over two relevant baseline families: latent world-model methods, which plan from pixels but lack the safety machinery, and safe-planning methods. Improving on both axes simultaneously is the meaningful claim, because the usual tension is that adding safety conservatism degrades performance: the robot becomes timid and reaches goals less reliably. Beating world-model baselines on goal-reaching while also beating safe-planning baselines on safety suggests the conformal-robust combination buys safety without forcing the planner into excessive caution, plausibly because the conformal bounds are calibrated to the model's actual error rather than assuming a loose worst case.

Why it matters

The significance is methodological and lands on one of the genuine open problems in learning-based control: how to get the expressive power of learned, pixel-driven world models without surrendering the safety guarantees that classical control provides. Pure world-model planning is powerful but unaccountable; pure robust control is trustworthy but needs a model it rarely has from raw images. SLS-squared is a concrete recipe for having both. Its conceptual backbone is to use conformal prediction to convert a learned model's uncertainty into calibrated bounds, then feed those bounds into a robust MPC that plans against them, and that idea is general enough to outlast the specific architecture. The pattern is a reusable bridge between the statistical-learning and control-theory cultures, which have long talked past each other.

The appropriate caution is to read the guarantee precisely. These are probabilistic safety guarantees, valid at a chosen confidence level and resting on conformal prediction's assumptions, most importantly that the deployment conditions resemble the calibration data; a sufficiently novel situation can move the true error outside the calibrated set. The evaluation is on vision-based control tasks rather than fielded hardware in the wild, and as with any conformal method the guarantee is a coverage statement, not an absolute certificate. But that is the right kind of honesty for the problem. By making the learned model's fallibility explicit and planning robustly against a calibrated version of it, the work points learning-based robot control toward a future where planning from pixels and proving safety are no longer mutually exclusive.
