TY - GEN
T1 - Non-stationarity Detection in Model-Free Reinforcement Learning via Value Function Monitoring
AU - Hussein, Maryem
AU - Keshk, Marwa
AU - Hussein, Aya
N1 - Funding Information:
Acknowledgement. This work was funded by the Department of Defence and the Office of National Intelligence under the AI for Decision Making Program, delivered in partnership with the NSW Defence Innovation Network Grant Number RG213520.
Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
N2 - The remarkable success achieved by reinforcement learning (RL) in recent years is mostly confined to stationary environments. In realistic settings, RL agents can encounter non-stationarity when the environmental dynamics change over time. Detecting when this change occurs is crucial for activating adaptation mechanisms at the right time. Existing research on change detection mostly relies on model-based techniques, which are challenging for tasks with large state and action spaces. In this paper, we propose a model-free, low-cost approach based on value functions (V or Q) for detecting non-stationarity. The proposed approach calculates the change in the value function (ΔV or ΔQ) and monitors the distribution of this change over time. Statistical hypothesis testing is used to detect whether the distribution of ΔV or ΔQ changes significantly over time, reflecting non-stationarity. We evaluate the proposed approach in three benchmark RL environments and show that it can successfully detect non-stationarity when changes in the environmental dynamics are introduced at different magnitudes and speeds. Our experiments also show that changes in ΔV or ΔQ can be used for context identification, achieving a classification accuracy of up to 88%.
AB - The remarkable success achieved by reinforcement learning (RL) in recent years is mostly confined to stationary environments. In realistic settings, RL agents can encounter non-stationarity when the environmental dynamics change over time. Detecting when this change occurs is crucial for activating adaptation mechanisms at the right time. Existing research on change detection mostly relies on model-based techniques, which are challenging for tasks with large state and action spaces. In this paper, we propose a model-free, low-cost approach based on value functions (V or Q) for detecting non-stationarity. The proposed approach calculates the change in the value function (ΔV or ΔQ) and monitors the distribution of this change over time. Statistical hypothesis testing is used to detect whether the distribution of ΔV or ΔQ changes significantly over time, reflecting non-stationarity. We evaluate the proposed approach in three benchmark RL environments and show that it can successfully detect non-stationarity when changes in the environmental dynamics are introduced at different magnitudes and speeds. Our experiments also show that changes in ΔV or ΔQ can be used for context identification, achieving a classification accuracy of up to 88%.
KW - Context Detection
KW - Deep Reinforcement Learning
KW - Non-stationarity
UR - http://www.scopus.com/inward/record.url?scp=85178609542&partnerID=8YFLogxK
UR - https://ajcai2023.org/
UR - https://link.springer.com/book/10.1007/978-981-99-8388-9
U2 - 10.1007/978-981-99-8391-9_28
DO - 10.1007/978-981-99-8391-9_28
M3 - Conference contribution
AN - SCOPUS:85178609542
SN - 9789819983872
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 350
EP - 362
BT - AI 2023
A2 - Liu, Tongliang
A2 - Webb, Geoff
A2 - Yue, Lin
A2 - Wang, Dadong
PB - Springer
CY - Singapore
T2 - 36th Australasian Joint Conference on Artificial Intelligence, AJCAI 2023
Y2 - 28 November 2023 through 1 December 2023
ER -