Anonymous submission

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

VASO closes the loop between formal verification and LLM-generated robot skills: verifier counterexamples become textual optimization feedback for reusable skill contracts, improving safety without fine-tuning model weights.

View method Watch videos Code availability

Abstract

Trustworthy Skill Evolution

Reusable robot skills are becoming the basic units through which embodied agents turn open-ended instructions into long-horizon physical behavior. While foundation models have lowered the cost of creating these skills, the cost of trusting them remains high. Existing skill-evolution loops refine skills through execution feedback, unit tests, environment reward, or LLM self-critique, but these signals only provide trace-level evidence.

VASO represents each skill as a semantic contract with a formal interface for model checking and a planner-facing interface for executable behavior generation. When verification fails, VASO translates the counterexample trace into a textual gradient that updates the reusable skill contract while keeping foundation-model weights frozen.

Method

Verification-Guided Automated Skill Optimization

Overview of the VASO skill generation and verification pipeline. — A task prompt yields a reusable skill, skill-level verification filters logically inconsistent contracts, plan-level verification checks induced behavior, and counterexamples refine the skill contract.

Generate

A foundation model proposes a reusable skill with a local rule and semantic contract.

Verify

A model checker evaluates both skill-level feasibility and plan-level safety.

Refine

Counterexample traces become textual gradients that update the skill contract.

Real-Robot Videos

Ground and Aerial Platforms

Before optimization. Execution derived from the initial skill violates the specification: "stop when a person is ahead." Download video

After VASO. The optimized execution satisfies the specification: "stop when a person is ahead." Download video

Clearpath Jackal. Skill-induced behavior is checked against temporal safety specifications. Download video

PX4 quadcopter. Velocity trajectories are refined through verifier feedback before execution. Download video

Execution Figures

Running Examples

Clearpath Jackal. Real-world executions with multiple verifiable self-evolving skills.

PX4 quadcopter. A self-evolving skill and velocity-trajectory execution example.

Results

High Safety With Lightweight Optimization

97.2% formal-specification compliance

<100 optimization samples per skill

11 temporal-logic specifications

400 generated plans evaluated

Comparison of safety score, training time, and sample usage across methods. — Safety and efficiency comparison against representative baselines.

Seen and unseen task performance comparison. — Optimized skills generalize to unseen prompts from the same task families.

System

Semantic Contracts as the Optimization Target

Resources

Paper, Code, and Citation

Paper

The anonymous paper PDF will be linked here when the submission package is ready.

Code

We will make the code public after acceptance due to the anonymity policy.

BibTeX

Citation information will be added after publication details are available.