Anonymous submission

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

VASO closes the loop between formal verification and LLM-generated robot skills: verifier counterexamples become textual optimization feedback for reusable skill contracts, improving safety without fine-tuning model weights.

Abstract

Trustworthy Skill Evolution

Reusable robot skills are becoming the basic units through which embodied agents turn open-ended instructions into long-horizon physical behavior. While foundation models have lowered the cost of creating these skills, the cost of trusting them remains high. Existing skill-evolution loops refine skills through execution feedback, unit tests, environment reward, or LLM self-critique, but these signals only provide trace-level evidence.

VASO represents each skill as a semantic contract with a formal interface for model checking and a planner-facing interface for executable behavior generation. When verification fails, VASO translates the counterexample trace into a textual gradient that updates the reusable skill contract while keeping foundation-model weights frozen.

Method

Verification-Guided Automated Skill Optimization

Overview of the VASO skill generation and verification pipeline.
A task prompt yields a reusable skill, skill-level verification filters logically inconsistent contracts, plan-level verification checks induced behavior, and counterexamples refine the skill contract.
01

Generate

A foundation model proposes a reusable skill with a local rule and semantic contract.

02

Verify

A model checker evaluates both skill-level feasibility and plan-level safety.

03

Refine

Counterexample traces become textual gradients that update the skill contract.

Real-Robot Videos

Ground and Aerial Platforms

Clearpath Jackal. Skill-induced behavior is checked against temporal safety specifications.
PX4 quadcopter. Velocity trajectories are refined through verifier feedback before execution.

Execution Figures

Running Examples

Open Jackal execution examples
Clearpath Jackal. Real-world executions with multiple verifiable self-evolving skills.
Open PX4 execution example
PX4 quadcopter. A self-evolving skill and velocity-trajectory execution example.

Results

High Safety With Lightweight Optimization

97.2% formal-specification compliance
<100 optimization samples per skill
11 temporal-logic specifications
400 generated plans evaluated
Comparison of safety score, training time, and sample usage across methods.
Safety and efficiency comparison against representative baselines.
Seen and unseen task performance comparison.
Optimized skills generalize to unseen prompts from the same task families.

System

Semantic Contracts as the Optimization Target

Detailed VASO system diagram showing plan verification and semantic contract refinement.
VASO optimizes the skill representation itself rather than a one-off plan, a flat planner prompt, or hidden model parameters.

Resources

Paper, Code, and Citation

Paper

The anonymous paper PDF will be linked here when the submission package is ready.

Code

We will make the code public after acceptance due to the anonymity policy.

BibTeX

Citation information will be added after publication details are available.