
Covariance analysis as a measure of policy robustness

posted May 30, 2016, 3:00 PM by Reza Ahmadzadeh   [ updated Jun 24, 2017, 10:23 AM ]
Nawid Jamali, Petar Kormushev, Seyed Reza Ahmadzadeh, Darwin G. Caldwell

Nawid Jamali, Petar Kormushev, Seyed Reza Ahmadzadeh, Darwin G. Caldwell, "Covariance analysis as a
measure of policy robustness", In Proc. MTS/IEEE Intl Conf. OCEANS 2014, Taipei, Taiwan, 7-10 Apr. 2014.
Bibtex Entry:
@INPROCEEDINGS{jamali2014covariance,
  TITLE={Covariance Analysis as a Measure of Policy Robustness},
  AUTHOR={Jamali, Nawid and Kormushev, Petar and Ahmadzadeh, Seyed Reza and Caldwell, Darwin G},
  BOOKTITLE={{MTS/IEEE OCEANS}},
  PAGES={1--5},
  YEAR={2014},
  MONTH={April},
  ORGANIZATION={IEEE},
  ADDRESS={Taipei, Taiwan},
  DOI={10.1109/OCEANS-TAIPEI.2014.6964339}
}
In this paper we propose covariance analysis as a metric for reinforcement learning to improve the
robustness of a learned policy. The local optima found during the exploration are analyzed in terms
of the total cumulative reward and the local behavior of the system in the neighborhood of the
optima. The analysis is performed in the solution space to select a policy that exhibits robustness
in uncertain and noisy environments. We demonstrate the utility of the method using our previously
developed system where an autonomous underwater vehicle (AUV) has to recover from a thruster failure
[1]. When a failure is detected, the recovery system is invoked, which uses simulations to learn a
new controller that utilizes the remaining functioning thrusters to achieve the goal of the AUV,
that is, to reach a target position. In this paper, we use covariance analysis to examine the
performance of the top n policies output by the previous algorithm. We propose a scoring metric
that uses the output of the covariance analysis, the time it takes the AUV to reach the target
position and the distance between the target position and the AUV's final position. The top policies
are simulated in a noisy environment and evaluated using the proposed scoring metric to analyze the
effect of noise on their performance. The policy that exhibits the greatest tolerance to noise is selected.
We show experimental results where covariance analysis successfully selects a more robust policy
that was ranked lower by the original algorithm.
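
The selection step described above can be illustrated with a minimal sketch. It is not the scoring metric from the paper, whose exact form is not given in this abstract. The sketch assumes that each candidate policy is rolled out repeatedly under noise, that the spread of its final positions is summarized by the trace of their sample covariance matrix, and that this spread is combined with the mean time-to-target and the mean final distance in an equally weighted sum (lower is better). The policy fields `offset`, `time`, and `sensitivity`, the toy rollout model, and the weights are all hypothetical stand-ins for a real AUV simulator.

```python
import math
import random
from statistics import mean


def covariance_trace(points):
    """Trace of the sample covariance matrix of a list of (x, y) points."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    var_x = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    return var_x + var_y


def score(final_positions, times, target, w_cov=1.0, w_time=1.0, w_dist=1.0):
    """Hypothetical score (lower is better): spread + mean time + mean distance."""
    dists = [math.dist(p, target) for p in final_positions]
    return (w_cov * covariance_trace(final_positions)
            + w_time * mean(times)
            + w_dist * mean(dists))


def noisy_rollout(policy, target, noise_std, rng):
    """Toy stand-in for a noisy simulation; not a real AUV model."""
    nx = rng.gauss(0.0, noise_std)
    ny = rng.gauss(0.0, noise_std)
    s = policy["sensitivity"]  # assumed: how strongly noise perturbs this policy
    final = (target[0] + policy["offset"] + s * nx, target[1] + s * ny)
    time = policy["time"] + s * math.hypot(nx, ny)
    return final, time


def select_robust_policy(policies, target, noise_std=1.0, rollouts=200, seed=0):
    """Score every policy over noisy rollouts and return the lowest-scoring one."""
    rng = random.Random(seed)
    scores = {}
    for name, policy in policies.items():
        results = [noisy_rollout(policy, target, noise_std, rng)
                   for _ in range(rollouts)]
        finals = [r[0] for r in results]
        times = [r[1] for r in results]
        scores[name] = score(finals, times, target)
    return min(scores, key=scores.get), scores


# Policy "A" is nominally better (no offset, faster) but sensitive to noise;
# policy "B" is nominally slightly worse but far less sensitive.
policies = {
    "A": {"offset": 0.0, "time": 10.0, "sensitivity": 3.0},
    "B": {"offset": 0.1, "time": 11.0, "sensitivity": 0.2},
}
best, scores = select_robust_policy(policies, target=(5.0, 5.0))
print(best, scores)
```

In this toy setup the nominally lower-ranked policy "B" is selected, because the wide spread of "A" under noise outweighs its small advantage in the noise-free case. This mirrors the kind of outcome reported in the abstract.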
