Compact Merging of Vision-Language-Action Models: Shared Cores and Spectral Residuals

Overview

Abstract

Figure: Forgetting in baselines and the storage-performance trade-off

Vision-Language-Action (VLA) policies are increasingly adapted into task-specific experts as robots encounter new tasks, objects, and environments after deployment. Integrating these experts post hoc is challenging: retaining all checkpoints is storage-intensive, retraining a unified policy requires data access, and single compact merges can destroy specialized action behavior.

We study continual post-hoc VLA expert merging, where experts arrive sequentially and must be admitted without access to past robot data, joint retraining, or architecture redesign. We propose CARVE, which maintains a skill-preserving merge state consisting of an evolving global core and compact skill-local spectral residuals.

Concordant update directions are accumulated into the shared core, while conflicting or skill-specific directions are preserved as low-rank residuals. On OpenVLA experts, CARVE achieves a 73.1% average success rate, recovering 95.6% of the full expert bank's performance with only 51% of its storage. On MergeVLA experts, CARVE nearly matches the full expert bank (96.3% vs. 96.7%) using only 46% of its storage.

Technical Approach

Method: CARVE

CARVE separates what can be safely shared across the expert stream from what should remain skill-local, enabling storage-efficient continual admission without joint retraining or full checkpoint retention.

A · New Task Arrival

Task Vector Computation

For each incoming expert checkpoint, a task vector is computed as the parameter-wise difference between the fine-tuned expert and the pretrained base model, so it captures every change that fine-tuning made to the base.
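As a minimal sketch, assume each checkpoint is available as a dictionary mapping parameter names to NumPy arrays (a hypothetical format; real VLA checkpoints would be loaded from framework-specific files). The task vector is then a per-parameter subtraction, and adding it back to the base recovers the expert exactly. The helper name `task_vector` and the parameter names are illustrative.

```python
import numpy as np

def task_vector(expert: dict[str, np.ndarray], base: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Per-parameter difference between a fine-tuned expert and the pretrained base."""
    if expert.keys() != base.keys():
        raise ValueError("expert and base must share the same parameter names")
    return {name: expert[name] - base[name] for name in base}

# Toy checkpoints with hypothetical parameter names.
base = {"layer.weight": np.zeros((2, 2)), "layer.bias": np.zeros(2)}
expert = {
    "layer.weight": np.array([[0.5, -0.1], [0.0, 0.2]]),
    "layer.bias": np.array([0.3, 0.0]),
}
tau = task_vector(expert, base)
```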

B · Expert Admission

Core & Residual Split

Update directions that agree in sign with the accumulated shared state are admitted into the global core. Disagreeing or skill-specific directions are routed to a local residual, which is then compressed via truncated SVD into compact low-rank factors.
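The admission split can be illustrated coordinate-wise. This sketch assumes a hard mask, and it treats coordinates where the shared core is still zero as concordant. `split_by_sign` is a hypothetical helper, and CARVE's exact rule may differ. Concordant entries become the candidate update for the global core, and the remaining entries form the local residual, so the two parts always sum back to the task vector.

```python
import numpy as np

def split_by_sign(tau: np.ndarray, core: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a task vector into a core-bound part and a local-residual part."""
    # A coordinate is concordant when its sign matches the accumulated core;
    # assumption: coordinates where the core is still zero also count as concordant.
    concordant = (np.sign(tau) == np.sign(core)) | (core == 0)
    shared = np.where(concordant, tau, 0.0)  # candidate update for the global core
    local = np.where(concordant, 0.0, tau)   # routed to the skill-local residual
    return shared, local

tau = np.array([[0.3, -0.1], [0.2, 0.4]])
core = np.array([[0.5, 0.2], [0.0, -0.3]])
shared, local = split_by_sign(tau, core)
```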

C · Merge State

Compact Storage

The final merge state stores one shared global core plus a small pair of low-rank matrices per admitted skill. The original expert checkpoint is discarded after admission, keeping storage well below the full expert bank.

D · Inference

Skill Instantiation

At test time, the target skill is instantiated by combining the pretrained base, the final global core, and that skill's local residual. When no explicit task ID is available, a lightweight router selects the residual whose skill is most similar to the instruction in text-embedding space.
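The following is a minimal sketch of test-time instantiation for one weight matrix, together with a cosine-similarity router. Several details are assumptions: the names `instantiate` and `route`, the factor layout (residual ≈ `U @ V.T`), and the use of precomputed instruction embeddings. The text encoder behind the actual router is not specified here, so the toy embeddings below are placeholders.

```python
import numpy as np

def instantiate(base_w, core, U, V):
    """Weights for one skill: pretrained base + final global core + low-rank residual U @ V.T."""
    return base_w + core + U @ V.T

def route(query_emb, skill_embs):
    """Index of the skill whose embedding has the highest cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    s = skill_embs / np.linalg.norm(skill_embs, axis=1, keepdims=True)
    return int(np.argmax(s @ q))

# Hypothetical 3-d embeddings for three admitted skills and one incoming instruction.
skill_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
query = np.array([0.1, 0.9, 0.2])
k = route(query, skill_embs)  # selects skill 1

# Toy per-skill residual factors (m x r and n x r) for a single m x n weight matrix.
rng = np.random.default_rng(0)
m, n, r = 4, 3, 2
residuals = [(rng.standard_normal((m, r)), rng.standard_normal((n, r))) for _ in range(3)]
U, V = residuals[k]
W = instantiate(np.zeros((m, n)), np.zeros((m, n)), U, V)
```

Only the routed skill's factors are touched at inference. The base and the core are shared across skills.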

Global Core

The global core accumulates update directions on which the admitted experts consistently agree. Using a coordinate-wise sign-concordance rule with a decreasing update rate, it captures shared adaptation structure without overwriting skill-specific behavior. Because the final core is used at inference for every skill, later experts can contribute reusable improvements to earlier skills, enabling positive backward transfer.
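One way to realize a sign-concordant core with a decreasing update rate is a masked running mean with rate 1/t, where t counts the admitted experts. Both the 1/t schedule and the handling of empty (zero) core coordinates are assumptions made for illustration, so this may not be CARVE's exact rule. `update_core` is a hypothetical helper.

```python
import numpy as np

def update_core(core: np.ndarray, tau: np.ndarray, t: int) -> np.ndarray:
    """Masked running-mean update of the shared core (illustrative rule).

    core: current global core for one parameter tensor.
    tau:  task vector of the newly admitted expert (same shape).
    t:    1-based index of the expert being admitted, so the rate 1/t decreases.
    """
    if t < 1:
        raise ValueError("t must be >= 1")
    # Assumption: coordinates with an empty core accept any sign.
    concordant = (np.sign(tau) == np.sign(core)) | (core == 0)
    eta = 1.0 / t
    # Concordant coordinates move toward tau; discordant ones are left untouched
    # and would instead be represented by the expert's local residual.
    return np.where(concordant, core + eta * (tau - core), core)

core = np.zeros(3)
core = update_core(core, np.array([0.4, -0.2, 0.0]), t=1)
core = update_core(core, np.array([0.2, 0.3, 0.5]), t=2)
```

After the second expert, the first coordinate is averaged because the signs agree. The second coordinate keeps its earlier value because the signs conflict.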

Skill-Local Spectral Residuals

The portion of each task vector not absorbed by the global core is compressed with a truncated singular value decomposition and stored as two compact low-rank factor matrices. By the Eckart–Young–Mirsky theorem, the rank-r truncation is the best approximation of the residual among all matrices of rank at most r (in both Frobenius and spectral norm). The rank r therefore directly controls the storage-versus-fidelity trade-off, with most performance gains saturating by rank 64.
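Truncated SVD compression of a residual matrix can be sketched with NumPy's `np.linalg.svd`. Here `compress_residual` is a hypothetical helper, and folding the singular values into the left factor is an illustrative choice. The test checks the Eckart–Young–Mirsky property: the Frobenius reconstruction error equals the root sum of squares of the discarded singular values.

```python
import numpy as np

def compress_residual(residual: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Best rank-`rank` factors (A, B) with residual ≈ A @ B.T (Frobenius norm)."""
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    r = min(rank, S.size)
    A = U[:, :r] * S[:r]  # m x r, singular values folded into the left factor
    B = Vt[:r, :].T       # n x r
    return A, B

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
A, B = compress_residual(M, rank=2)
```

Storing A and B costs r·(m + n) values instead of m·n. This is the source of the per-skill term in the storage comparison.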

Why This Decomposition?

Unlike compressing each expert independently, CARVE computes residuals relative to the evolving shared core. This means skill-local corrections only need to encode what is genuinely task-specific, substantially reducing the residual energy that must be stored per skill and widening the storage advantage as more experts are admitted.

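Storage comparison (P: parameters per checkpoint; T: number of admitted experts; r: residual rank; m × n: shape of each factorized weight matrix, with the sum running over admitted skills and their factorized matrices)

Single Global Merge: O(P). ✗ No skill-specific correction.

CARVE (Ours): O(P) + O(Σ r·(m+n)). ✓ Skill-specific · Data-free · Online.

Full Expert Bank: O(T · P). ✓ Skill-specific, but grows linearly with T.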
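The sketch below makes the asymptotic storage comparison concrete by counting stored parameters for hypothetical weight-matrix shapes. P is the parameter count of one checkpoint, T the number of experts, and r the residual rank. The sketch applies the three storage expressions literally and counts only 2-D weight matrices, ignoring biases and normalization parameters. It illustrates scaling and does not reproduce the reported 51% and 46% storage figures. `storage_params` is a hypothetical helper.

```python
def storage_params(shapes, num_experts, rank):
    """Stored parameter counts for (single merge, CARVE-style state, full expert bank).

    shapes: list of (m, n) weight-matrix shapes in one checkpoint (hypothetical).
    """
    P = sum(m * n for m, n in shapes)
    # Each residual factor pair stores r * (m + n) values; rank cannot exceed min(m, n).
    per_skill = sum(min(rank, m, n) * (m + n) for m, n in shapes)
    single_merge = P
    carve = P + num_experts * per_skill
    expert_bank = num_experts * P
    return single_merge, carve, expert_bank

counts = storage_params([(4096, 4096)], num_experts=4, rank=64)
```

Consider a single 4096 × 4096 matrix with four experts and rank 64. The counts are 16,777,216, 18,874,368, and 67,108,864 parameters. In this toy setting, the CARVE-style state is about 28% of the bank, and the gap widens as T grows.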

Qualitative Results

CARVE — LIBERO Task Suites

CARVE successfully executes manipulation tasks across four LIBERO task suites after continual expert merging.

LIBERO-Spatial

✓ Task 1

"Pick up the black bowl between the plate and the ramekin..."

✓ Task 2

"Pick up the black bowl next to the ramekin..."

✓ Task 3

"Pick up the black bowl from table center..."

✓ Task 4

"Pick up the black bowl on the cookie box..."

✓ Task 5

"Pick up the black bowl in the top drawer..."

✓ Task 6

"Pick up the black bowl on the ramekin..."

✓ Task 7

"Pick up the black bowl next to the cookie box..."

✓ Task 8

"Pick up the black bowl on the stove..."

✓ Task 9

"Pick up the black bowl next to the plate..."

✓ Task 10

"Pick up the black bowl on the wooden cabinet..."

LIBERO-Object

✓ Task 1

"Pick up the alphabet soup..."

✓ Task 2

"Pick up the cream cheese..."

✓ Task 3

"Pick up the salad dressing..."

✓ Task 4

"Pick up the bbq sauce..."

✓ Task 5

"Pick up the ketchup..."

✓ Task 6

"Pick up the tomato sauce..."

✓ Task 7

"Pick up the butter..."

✓ Task 8

"Pick up the milk..."

✓ Task 9

"Pick up the chocolate pudding..."

✓ Task 10

"Pick up the orange juice..."

LIBERO-Goal

✓ Task 1

"Open the middle drawer of the cabinet"

✓ Task 2

"Put the bowl on the stove"

✓ Task 3

"Put the wine bottle on top of the cabinet"

✓ Task 4

"Open the top drawer and put the bowl inside"

✓ Task 5

"Put the bowl on top of the cabinet"

✓ Task 6

"Push the plate to the front of the stove"

✓ Task 7

"Put the cream cheese in the bowl"

✓ Task 8

"Turn on the stove"

✓ Task 9

"Put the bowl on the plate"

✓ Task 10

"Put the wine bottle on the rack"

LIBERO-Long

✓ Task 1

"Put both the alphabet soup and the tomato sauce in the basket"

✓ Task 2

"Put both the cream cheese box and the butter..."

✓ Task 3

"Turn on the stove and put the moka pot on it"

✓ Task 4

"Put the black bowl in the bottom drawer of the cabinet"

✓ Task 5

"Put the white mug on the left plate..."

✓ Task 6

"Pick up the book and place it in the back compartment"

✓ Task 7

"Put the white mug on the plate..."

✓ Task 8

"Put both the alphabet soup and the cream cheese box..."

✓ Task 9

"Put both moka pots on the stove"

✓ Task 10

"Put the yellow and white mug in the microwave..."
