ARMADA: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation

Wenye Yu1,2, Jun Lv1,3, Zixi Ying3, Yang Jin1, Chuan Wen1,†, Cewu Lu1,2,3,†
1 Shanghai Jiao Tong University, 2 Shanghai Innovation Institute, 3 Noematrix Ltd. † Equal advising

Abstract

We devise ARMADA, a multi-robot deployment and adaptation system with human-in-the-loop shared control, featuring an autonomous online failure detection method named FLOAT.

Thanks to FLOAT, ARMADA runs policy rollouts on multiple robots in parallel and requests human intervention only when necessary, significantly reducing reliance on human supervision. As a result, ARMADA acquires in-domain data efficiently, leading to more scalable deployment and faster adaptation to new scenarios.

FLOAT substantially improves detection accuracy over previous failure detection approaches across multiple real-world tasks. In addition, over multiple rounds of policy rollout and post-training, ARMADA yields a markedly larger increase in success rate and a larger reduction in reliance on human intervention than previous human-in-the-loop learning methods.

Method Overview

ARMADA method overview

The FLOAT failure detector performs real-time optimal transport (OT) matching between the policy embeddings of the current rollout and those of every expert demonstration, and defines the minimum OT cost as the FLOAT index. The FLOAT threshold is then calibrated on the FLOAT indices of all successful rollouts.
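The matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine cost, the entropic (Sinkhorn) solver, the uniform marginals, the regularization value, and the quantile-based calibration rule are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between two embedding sequences (assumed cost)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    return 1.0 - x @ y.T

def sinkhorn_cost(cost, reg=0.05, n_iters=200):
    """Entropic OT cost with uniform marginals (assumed solver and settings)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass over rollout timesteps
    b = np.full(m, 1.0 / m)          # uniform mass over demonstration timesteps
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

def float_index(rollout, demos):
    """Minimum OT cost between the rollout and any expert demonstration."""
    return min(sinkhorn_cost(cosine_cost(rollout, d)) for d in demos)

def calibrate_threshold(success_indices, quantile=0.95):
    """Hypothetical calibration: a quantile of indices from successful rollouts."""
    return float(np.quantile(success_indices, quantile))
```

In an online setting, `float_index` would be called repeatedly on the rollout prefix observed so far, so that the index can be compared against the threshold at every step.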

When the FLOAT index of a rollout trajectory exceeds the threshold, we treat the rollout as a failure and apply adaptive rewinding based on the OT computation, which returns the robot to an earlier timestep, before the scene was disturbed.
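The threshold check and the choice of a rewind point can be sketched as follows. The page does not specify how the rewind timestep is derived from the OT computation, so the rule below (go back to the latest timestep whose index was still under a fraction `margin` of the threshold) is purely a hypothetical stand-in, as are the function names.

```python
def detect_failure(index_curve, threshold):
    """Return the first timestep whose FLOAT index exceeds the threshold, else None."""
    for t, value in enumerate(index_curve):
        if value > threshold:
            return t
    return None

def rewind_target(index_curve, failure_t, threshold, margin=0.5):
    """Hypothetical rule: latest timestep before failure_t with index <= margin * threshold.

    Falls back to timestep 0 when no earlier timestep satisfies the rule.
    """
    for t in range(failure_t - 1, -1, -1):
        if index_curve[t] <= margin * threshold:
            return t
    return 0

# Example: the index grows over the rollout and crosses the threshold at t = 5.
curve = [0.1, 0.2, 0.3, 0.6, 0.9, 1.2]
failure_t = detect_failure(curve, threshold=1.0)
target_t = rewind_target(curve, failure_t, threshold=1.0)
```

Here `failure_t` is 5 and `target_t` is 2, the last timestep at which the index was still at or below half the threshold.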

Our multi-robot system then allocates an idle human operator to the failed robot for intervention, forming an efficient deployment paradigm. All the data collected are then utilized for post-training, facilitating scalable adaptation to deployment scenarios.
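The allocation of idle operators to failed robots can be pictured as a simple dispatcher. This is a sketch under assumptions: first-come-first-served queues and the class and method names are hypothetical, and the page does not describe the actual scheduling policy.

```python
from collections import deque

class InterventionDispatcher:
    """Assigns idle human operators to robots that reported a failure (FIFO, assumed)."""

    def __init__(self, operators):
        self.idle = deque(operators)   # operators currently free
        self.waiting = deque()         # failed robots awaiting an operator
        self.active = {}               # robot -> operator currently intervening

    def report_failure(self, robot):
        """Queue a failed robot and return any new (robot, operator) assignments."""
        self.waiting.append(robot)
        return self._dispatch()

    def finish(self, robot):
        """Release the operator of a finished intervention and reassign if possible."""
        self.idle.append(self.active.pop(robot))
        return self._dispatch()

    def _dispatch(self):
        assigned = []
        while self.idle and self.waiting:
            robot, operator = self.waiting.popleft(), self.idle.popleft()
            self.active[robot] = operator
            assigned.append((robot, operator))
        return assigned
```

With a single operator and two failing robots, the second robot waits until the first intervention finishes, which is the situation in which the number of operators limits throughput.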

Illustration of adaptive rewinding

We design an adaptive rewinding mechanism that returns the robot to an earlier timestep while a human operator resets the scene to its state at that time, ensuring an intact and informative demonstration of human corrective behaviour.

Real-world Task Illustrations

Pour marbles into bowl

Hang mug on holder

Grasp mango and put it into drawer

Fold towel

Failure Detection Experiments

Failure detection experiment results
FLOAT achieves nearly 95% accuracy across four tasks, an improvement of over 20% compared to state-of-the-art baseline methods. It performs comparably to a variant that additionally integrates an action inconsistency metric, showing that FLOAT is effective at detecting various failures.

FLOAT Visualization

We plot the FLOAT index as a curve that grows over the rollout, together with the FLOAT threshold. When the FLOAT index exceeds the threshold, we consider the rollout a failure.

Fold towel

Grasp mango and put it into drawer

Pour marbles into bowl

Faster Adaptation to Deployment Scenarios

Task success rate over 3 post-training rounds

ARMADA improves success rate steadily, with a more than four-fold increase compared to the previous human-in-the-loop learning approach, thanks to our adaptive rewinding mechanism.

Human intervention rate over 3 post-training rounds

ARMADA achieves a more than two-fold reduction in human intervention rate compared to Sirius, showing potential for scalable deployment and adaptation.

Generalization to unseen scenarios

Multi-robot experiment scenes

We deploy the pretrained Fold towel policy on Scenes A, B, and C (in-domain) for online data collection, and evaluate the post-trained policy on Scene D (out-of-distribution). The baseline method uses only Scene A for data collection and is likewise evaluated on Scene D.

Task success rate on unseen scenario

ARMADA boosts adaptation to unseen scenarios through parallel policy deployment on multiple robots, compared to a traditional human-in-the-loop paradigm where one human operator attends to only one robot.

Human intervention duration during deployment

ARMADA scales up the collection of human intervention trajectories by running more robots in parallel, which raises human occupancy and yields more correction data across diverse deployment scenarios, helping the policy generalize to unseen scenarios.

BibTeX

@misc{yu2025armadaautonomousonlinefailure,
      title={ARMADA: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation}, 
      author={Wenye Yu and Jun Lv and Zixi Ying and Yang Jin and Chuan Wen and Cewu Lu},
      year={2025},
      eprint={2510.02298},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2510.02298}, 
}