PulseVLA-LIBERO: Our First Competitive Open-Weight Robot Policy, Trained on a Home GPU
Today Verapulse is releasing its first open-weight model: PulseVLA-LIBERO-0.5B.
PulseVLA-LIBERO is a Vision-Language-Action (VLA) policy for robot manipulation. This is the kind of model that turns a camera feed and a plain-language instruction into robot motor commands.
The headline is simple:
We trained a competitive VLA policy on a single NVIDIA RTX 5090 in a home office. No cloud GPU cluster. No multi-node setup. No cloud training bill. The full run took about 7.5 hours and cost well under $10 in electricity.
PulseVLA-LIBERO reaches 81.8% average success across the four LIBERO suites under the public evaluation protocol. To our knowledge, it is the best-performing public sub-1B-parameter checkpoint under that protocol to date.
The weights, inference code, evaluation script, configuration, tokenizer, and normalization statistics are public under Apache-2.0.
This is our first public proof point for the thesis behind Verapulse: competitive models for robots should be cheaper to train, faster to adapt, and practical for vertical problems.
Why this matters
Robotics foundation models are often associated with massive infrastructure: large datasets, multi-GPU clusters, long training runs, cloud budgets, and complex distributed systems.
That infrastructure will matter for some problems. But it should not be the default starting point for every robotics company, research team, or vertical automation problem.
Many robotics companies do not need a giant generalist model that tries to solve every task in every environment. They need a policy that works reliably for their robot, their environment, and their task distribution.
That changes the question.
Instead of asking only, “How large can the model be?” we should also ask:
How quickly and cheaply can we adapt a capable robot policy to a specific domain?
PulseVLA-LIBERO is our first public answer. It shows that serious VLA training can fit inside a very different economic envelope: one consumer GPU, one afternoon-scale run, and a small electricity bill.
That matters because lower training cost changes the iteration loop. It moves the bottleneck away from raw compute access and toward the things that actually determine whether a robot policy improves:
- data quality,
- evaluation discipline,
- task-specific adaptation,
- and iteration speed.
That is where Verapulse is focused.
What we built
PulseVLA-LIBERO follows a SmolVLA-style design. The model has two main components:
- A frozen pretrained vision-language backbone based on SmolVLM2-500M-Instruct.
- A trainable flow-matching action expert that learns to produce robot action chunks.
The backbone provides visual and language understanding. The action expert learns how to act.
The key design choice is that we do not train the full vision-language model. We keep the pretrained backbone frozen and train the action expert and related projections.
That gives the released model about 557.6M total parameters, with about 97.5M trainable parameters.
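As a quick sanity check on those two figures, the split between frozen and trainable parameters can be worked out directly. This is only arithmetic on the approximate counts quoted above; the variable names are illustrative.

```python
# Parameter counts quoted in the post, in millions (both are approximate).
TOTAL_PARAMS_M = 557.6
TRAINABLE_PARAMS_M = 97.5

# Everything that is not trainable stays frozen during training.
frozen_params_m = TOTAL_PARAMS_M - TRAINABLE_PARAMS_M
trainable_fraction = TRAINABLE_PARAMS_M / TOTAL_PARAMS_M

print(f"frozen:    {frozen_params_m:.1f}M parameters")
print(f"trainable: {trainable_fraction:.1%} of all parameters")
```

In other words, roughly 460M parameters are never updated, and a bit more than one sixth of the model receives gradients.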
In plain terms:
Camera views + language instruction + robot state
↓
Frozen pretrained vision-language backbone
↓
Trainable flow-matching action expert
↓
Robot action chunk
Because the backbone is frozen, it does not need to be updated during training. That keeps memory use low enough for a single consumer GPU. In practice, the run fits in about 16 GB of VRAM at batch size 32 using bf16.
The result is a VLA policy trained for LIBERO manipulation without cloud GPUs, without a cluster, and without robotics pretraining for the action expert.
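For readers new to flow matching, here is a minimal, dependency-free sketch of the general idea behind an action expert of this kind. It is not the PulseVLA-LIBERO implementation. It assumes a straight-line path from noise (t=0) to the action chunk (t=1), which is one common convention and may differ from the one used in the released model. The learned network, which in a real VLA is conditioned on backbone features, is replaced by a hypothetical oracle that returns the exact target velocity, and the action values are made up.

```python
import random

def interpolate(noise, action, t):
    """Point on the straight path from noise (t=0) to the action chunk (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Regression target for the straight path: the time derivative of interpolate()."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predicted_velocity, noise, action):
    """Mean squared error between the predicted and the target velocity."""
    target = target_velocity(noise, action)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target)) / len(target)

def sample_actions(velocity_fn, noise, num_steps=10):
    """Euler-integrate the velocity field from t=0 to t=1, starting at noise."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy data: a flattened "action chunk" of four numbers (hypothetical values).
rng = random.Random(0)
action_chunk = [0.10, -0.25, 0.40, 0.05]
noise = [rng.gauss(0.0, 1.0) for _ in action_chunk]

def oracle_velocity(x, t):
    """Stand-in for the trained network: returns the exact target velocity."""
    return target_velocity(noise, action_chunk)

# Training view: evaluate the loss at one point on the path.
x_half = interpolate(noise, action_chunk, 0.5)
loss = flow_matching_loss(oracle_velocity(x_half, 0.5), noise, action_chunk)

# Inference view: start from noise and integrate toward an action chunk.
recovered = sample_actions(oracle_velocity, noise, num_steps=10)
print("loss with oracle velocity:", loss)
print("recovered chunk:", [round(v, 6) for v in recovered])
```

With a perfect velocity predictor the loss is zero and integration recovers the action chunk. Training pushes a real network toward that behavior, conditioned on what the backbone sees and reads.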
Results
We report success rate on all four LIBERO manipulation suites. The public table below uses the evaluation protocol documented in the model card: 10 tasks × 10 episodes per suite, 400 total rollouts, n_action_steps=10, n_envs=1, and --seed 1.
The metric is pc_success, the percentage of episodes solved.
| LIBERO suite | PulseVLA-LIBERO success |
|---|---|
| LIBERO Object | 97.0% |
| LIBERO Goal | 88.0% |
| LIBERO Spatial | 82.0% |
| LIBERO 10 | 60.0% |
| Average | 81.8% |
These are the numbers published in the public Hugging Face model card.
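The average row and the rollout count follow directly from the per-suite figures. The short check below uses only the numbers in the table and the protocol description; the variable names are illustrative.

```python
# Per-suite success rates (percent) from the table above.
suite_success = {
    "LIBERO Object": 97.0,
    "LIBERO Goal": 88.0,
    "LIBERO Spatial": 82.0,
    "LIBERO 10": 60.0,
}

TASKS_PER_SUITE = 10
EPISODES_PER_TASK = 10

episodes_per_suite = TASKS_PER_SUITE * EPISODES_PER_TASK
total_rollouts = episodes_per_suite * len(suite_success)

# Every suite has the same number of episodes, so the unweighted mean of the
# suite percentages equals the pooled success rate over all rollouts.
average = sum(suite_success.values()) / len(suite_success)
solved = sum(round(p / 100 * episodes_per_suite) for p in suite_success.values())

print("total rollouts:", total_rollouts)
print("solved episodes:", solved)
print("average success (%):", average)
```

The unrounded average is 81.75%, which the table reports as 81.8%.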
Same-class public checkpoint comparison
We also compared PulseVLA-LIBERO against the comparable public HuggingFaceVLA/smolvla_libero checkpoint on three of the four suites (Spatial, Object, and Goal), using the same evaluation setup.
| LIBERO suite | PulseVLA-LIBERO | Public smolvla_libero |
|---|---|---|
| LIBERO Spatial | 82.0% | 56.0% |
| LIBERO Object | 97.0% | 81.0% |
| LIBERO Goal | 88.0% | 76.0% |
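To make the size of the gap explicit, the per-suite differences can be computed from the table. This is plain arithmetic on the published figures, expressed in percentage points.

```python
# (PulseVLA-LIBERO, public smolvla_libero) success rates in percent, from the table above.
comparison = {
    "LIBERO Spatial": (82.0, 56.0),
    "LIBERO Object": (97.0, 81.0),
    "LIBERO Goal": (88.0, 76.0),
}

# Gap in percentage points for each suite, and the mean gap across the three suites.
gaps = {suite: ours - theirs for suite, (ours, theirs) in comparison.items()}
mean_gap = sum(gaps.values()) / len(gaps)

for suite, gap in gaps.items():
    print(f"{suite}: +{gap:.1f} points")
print(f"mean gap: +{mean_gap:.1f} points")
```

If both checkpoints were run on the same 100 episodes per suite, one percentage point corresponds to one episode, so the gaps are 26, 16, and 12 additional solved episodes respectively.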
The comparison we care about most is not a vague “state of the art” claim. It is a reproducible comparison against a public same-class checkpoint under the same evaluation setup.
The published SmolVLA work reports stronger numbers under its own recipe, including large-scale robotics pretraining before LIBERO fine-tuning. PulseVLA-LIBERO is different: its action expert is trained from scratch on LIBERO while the pretrained vision-language backbone remains frozen.
That distinction matters.
Our goal with this release is not to claim that one benchmark is solved. It is to show that Verapulse can train, package, evaluate, and release competitive VLA models with a level of cost discipline that changes the iteration loop.
The cost story
The cost claim is not a stunt. It falls directly out of the design.
Because the vision-language backbone is frozen, it stays out of the backward graph. There are no gradients and no stored training activations for its layers. Only the action expert and related projections are trained.
The practical consequences:
- The run fits in about 16 GB of VRAM at batch size 32 using bf16.
- The full training run took about 7.5 hours on one NVIDIA RTX 5090.
- The energy usage was only a few kilowatt-hours, which is well under $10 in electricity at typical home rates.
- There was no cloud GPU rental, no multi-GPU orchestration, and no cluster.
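The electricity figure is easy to check with a back-of-the-envelope calculation. Only the 7.5-hour duration comes from this post. The average power draw and the price per kilowatt-hour below are hypothetical inputs chosen for illustration, so substitute your own.

```python
TRAINING_HOURS = 7.5  # run duration quoted in the post

def electricity_cost(avg_power_kw, price_per_kwh, hours=TRAINING_HOURS):
    """Return (energy in kWh, cost in dollars) for a run at a constant average draw."""
    energy_kwh = avg_power_kw * hours
    return energy_kwh, energy_kwh * price_per_kwh

# Hypothetical scenarios: (average whole-system draw in kW, price in $/kWh).
scenarios = {
    "typical": (0.6, 0.17),
    "pessimistic": (1.0, 0.40),
}

results = {}
for name, (power_kw, price) in scenarios.items():
    results[name] = electricity_cost(power_kw, price)
    energy_kwh, cost = results[name]
    print(f"{name}: {energy_kwh:.1f} kWh, about ${cost:.2f}")
```

Even under the pessimistic assumptions the run stays at a few dollars, which is consistent with the "well under $10" figure.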
The point is not that hardware is free. The RTX 5090 still has to exist somewhere.
The point is that once a single consumer GPU is available, a VLA training run can become an afternoon experiment rather than a major infrastructure decision.
That changes the economics of robotics model development.
When training cost drops this far, iteration speed becomes a product advantage.
Try it yourself
We did not just publish a number. We published the means to check it.
The Hugging Face repo includes the weights, config, tokenizer, normalization stats, minimal inference code, and a standalone LIBERO evaluation script.
pip install -U "huggingface_hub[cli]"
hf download verapulse/pulsevla-libero-0.5b --local-dir ./pulsevla-libero
cd ./pulsevla-libero
# inference deps + the public LIBERO simulator
pip install -r requirements.txt
pip install "lerobot[libero] @ git+https://github.com/huggingface/lerobot.git@d1b1c5c8cff5e1f637495e1667a1d6c7c5258f3b"
# reproduce the benchmark, one suite at a time
MUJOCO_GL=egl python eval_libero.py --task libero_object --seed 1
MUJOCO_GL=egl python eval_libero.py --task libero_goal --seed 1
MUJOCO_GL=egl python eval_libero.py --task libero_spatial --seed 1
MUJOCO_GL=egl python eval_libero.py --task libero_10 --seed 1
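If you prefer to launch all four suites from one place, a small wrapper along these lines could work. It is a convenience sketch, not part of the released repo: the function names are hypothetical, and it relies only on the `--task` and `--seed` flags and the `MUJOCO_GL=egl` setting shown in the commands above. By default it only prints the commands; pass `dry_run=False` from inside the downloaded repo to actually run them.

```python
import os
import subprocess
import sys

SUITES = ["libero_object", "libero_goal", "libero_spatial", "libero_10"]

def build_eval_commands(seed=1, script="eval_libero.py"):
    """Return one argv list per suite, mirroring the commands above."""
    python = sys.executable or "python"
    return [[python, script, "--task", suite, "--seed", str(seed)] for suite in SUITES]

def run_all(seed=1, dry_run=True):
    """Print each command; also execute them one after another unless dry_run is True."""
    env = {**os.environ, "MUJOCO_GL": "egl"}  # same rendering backend as the commands above
    for argv in build_eval_commands(seed):
        print("MUJOCO_GL=egl", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)

# Dry run by default: nothing is executed, the commands are only listed.
run_all(seed=1, dry_run=True)
```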
The inference path is intentionally lightweight: minimal pure-PyTorch code, with no transformers dependency at runtime.
What is open
We are being deliberate about this release.
Open under Apache-2.0:
- model weights,
- config,
- tokenizer,
- normalization statistics,
- minimal inference code,
- LIBERO evaluation script,
- and reproduction instructions.
We want researchers and builders to be able to download the model, inspect it, run it, evaluate it, and build on top of it.
The full Verapulse training stack remains private: the training framework, sweep tooling, orchestration, experiment system, and internal optimization pipeline are not part of this release.
The weights are our contribution to the community. The system that makes models like this cheap to produce is what Verapulse is building.
Open weights. Private training stack. Fast VLA iteration.
Where we are going
Verapulse trained and released a competitive VLA model on consumer hardware, with public weights and reproducible evaluation instructions.
For a first release, that is the signal we wanted to send.
It is not the end goal.
Our broader thesis is that robotics needs efficient, domain-adapted models: policies that can be trained and refined for specific customers, robots, and operational environments without requiring frontier-lab infrastructure.
Vertical robotics will not be won by compute alone. It will be won by fast iteration, disciplined evaluation, and models that adapt cheaply to the task at hand.
This release is our first public step in that direction.
If you are building robots for a specific task, or researching robotics models, we would love to talk.
Weights are public here:
https://huggingface.co/verapulse/pulsevla-libero-0.5b
Honest context
We hold ourselves to the same standard our research audience will hold us to. Here is what these numbers do and do not mean.
First, this is a simulation benchmark result, not a claim of real-robot deployment readiness. PulseVLA-LIBERO is evaluated on LIBERO, and performance outside that task distribution is unknown.
Second, LIBERO evaluation is protocol-sensitive. Seeds, batching, simulator settings, and policy stochasticity can move success rates by a few points. That is why the model card publishes the exact evaluation script, seed, rollout count, n_action_steps, n_envs, and dependency instructions behind the table.
Third, fixed-seed GPU evaluation is approximately reproducible, not guaranteed bit-exact. In clean-room checks, most suites reproduced exactly, while one suite moved by a few episodes. That kind of run-to-run wobble is expected for this class of stochastic policy and simulator setup.
We would rather publish a number with the exact reproduction path than a bigger number that is hard to check.
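One way to put the "few points" of protocol sensitivity in perspective is the sampling noise you would expect from 100 episodes per suite. The sketch below computes a binomial standard error for each published success rate. It assumes episodes are independent and equally difficult, which is a simplification (tasks within a suite differ), so treat the output as a rough scale rather than a formal confidence statement.

```python
import math

def success_rate_std_error(success_pct, episodes):
    """Binomial standard error of a success rate, in percentage points."""
    p = success_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / episodes)

EPISODES_PER_SUITE = 100  # 10 tasks x 10 episodes, per the evaluation protocol

published = {
    "LIBERO Object": 97.0,
    "LIBERO Goal": 88.0,
    "LIBERO Spatial": 82.0,
    "LIBERO 10": 60.0,
}

std_errors = {
    suite: success_rate_std_error(pct, EPISODES_PER_SUITE)
    for suite, pct in published.items()
}
for suite, pct in published.items():
    print(f"{suite}: {pct:.1f}% +/- {std_errors[suite]:.1f} points (one standard error)")
```

Under these assumptions the standard error ranges from under 2 points on LIBERO Object to about 5 points on LIBERO 10, so a shift of a few episodes between runs is within what sampling alone would produce.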
Attribution
PulseVLA-LIBERO builds on SmolVLM2 and the SmolVLA architecture from Hugging Face, the LeRobot library, and the LIBERO benchmark. Our thanks to those teams. See the model card for full attribution and reproduction details.