TY - JOUR
T1 - Swarm Imitation Learning From Observations
AU - Hussein, Aya
AU - Petraki, Eleni
AU - Abbass, Hussein A.
PY - 2025
Y1 - 2025
N2 - Learning from observation (LfO) is a process where an agent learns a task by passively observing a more competent agent perform it. LfO differs from classical Learning from demonstration (LfD) in that the former requires access to the demonstrator's states only, whereas the latter requires both the demonstrator's states and the corresponding actions. On the one hand, LfO avoids the sometimes costly or impractical burden of collecting the demonstrator's actions, and instead only requires the demonstrator's states which are more easily captured through cameras or sensors. On the other hand, LfO is more challenging than classical LfD because of the lack of detailed guidance from action labels. Despite the success of LfO in single-agent tasks, the literature falls short of assessing its feasibility in swarm systems, where multiple agents act simultaneously to enact a system-level state change. We tackle this research gap by proposing Swarm-LfO that extends single-agent LfO by leveraging the centralised training with decentralised execution framework to learn a useful agent-centric inverse dynamic model (AIDM). AIDM enables the imitator swarm to predict agent-level actions that would lead to swarm state transitions similar to those exhibited by the demonstrator swarm. Pairs of states and the corresponding estimated actions are then used for learning to imitate the demonstrated behaviour in a supervised learning manner. Evaluation experiments are conducted using four tasks that require different levels of coordination between swarm members: flocking, sheltering, dispersion, and herding. The results show that the performance of Swarm-LfO is comparable to classical LfD methods that require access to action information. Swarm-LfO is extensively evaluated and has demonstrated continued success under various experimental conditions including noise and different sizes of the demonstrator and imitator swarms. Our contribution will pave the way for imitation learning in swarm...
AB - Learning from observation (LfO) is a process where an agent learns a task by passively observing a more competent agent perform it. LfO differs from classical Learning from demonstration (LfD) in that the former requires access to the demonstrator's states only, whereas the latter requires both the demonstrator's states and the corresponding actions. On the one hand, LfO avoids the sometimes costly or impractical burden of collecting the demonstrator's actions, and instead only requires the demonstrator's states which are more easily captured through cameras or sensors. On the other hand, LfO is more challenging than classical LfD because of the lack of detailed guidance from action labels. Despite the success of LfO in single-agent tasks, the literature falls short of assessing its feasibility in swarm systems, where multiple agents act simultaneously to enact a system-level state change. We tackle this research gap by proposing Swarm-LfO that extends single-agent LfO by leveraging the centralised training with decentralised execution framework to learn a useful agent-centric inverse dynamic model (AIDM). AIDM enables the imitator swarm to predict agent-level actions that would lead to swarm state transitions similar to those exhibited by the demonstrator swarm. Pairs of states and the corresponding estimated actions are then used for learning to imitate the demonstrated behaviour in a supervised learning manner. Evaluation experiments are conducted using four tasks that require different levels of coordination between swarm members: flocking, sheltering, dispersion, and herding. The results show that the performance of Swarm-LfO is comparable to classical LfD methods that require access to action information. Swarm-LfO is extensively evaluated and has demonstrated continued success under various experimental conditions including noise and different sizes of the demonstrator and imitator swarms. Our contribution will pave the way for imitation learning in swarm...
U2 - 10.1109/TETCI.2025.3569762
DO - 10.1109/TETCI.2025.3569762
M3 - Article
SN - 2471-285X
SP - 1
EP - 14
JO - IEEE Transactions on Emerging Topics in Computational Intelligence
JF - IEEE Transactions on Emerging Topics in Computational Intelligence
ER -