The applicability of deep reinforcement learning
algorithms to robotics is limited by sample inefficiency.
As in most machine learning methods, more
samples generally mean better learning. In robotics, however,
sample collection is time-consuming, and in real-world
scenarios it also raises safety concerns for both the robot
and its surrounding environment. Because of these limitations,
sample efficiency plays a vital role in robotic learning.
To deal with this,
curriculum learning offers a methodology that reduces the
sample collection burden placed on robots, keeping it to a
minimum. This study tackles the sample inefficiency of deep
reinforcement learning algorithms in robotics by designing
a curriculum. We propose
an algorithm that decides the sequence of tasks the agent
must learn so that knowledge transfers to the target task in a
sample-efficient manner. Our algorithm represents tasks in
parameter space in order to determine their difficulty. Once
the difficulty level of each task is determined, easier tasks
are learned first, before the final target task. We evaluate
the approach on a double inverted
pendulum setup. Simulation results show that transferring
knowledge via a curriculum is more sample efficient than
direct transfer.
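
The ordering step described above can be illustrated with a minimal
sketch. The abstract does not state how difficulty is computed, so
everything below is an assumption made for illustration: each task is
a vector of physical parameters, difficulty is the scaled Euclidean
distance from a hypothetical "easiest" reference task, and the task
names, parameter values, and function names are invented.

```python
import math


def difficulty(params, easiest, scale):
    """Assumed difficulty proxy (not taken from the paper): scaled
    Euclidean distance from an easiest reference task in parameter space."""
    return math.sqrt(
        sum(((p - e) / s) ** 2 for p, e, s in zip(params, easiest, scale))
    )


def build_curriculum(source_tasks, target_task, easiest, scale):
    """Order source tasks from easiest to hardest and append the target."""
    ordered = sorted(
        source_tasks,
        key=lambda name: difficulty(source_tasks[name], easiest, scale),
    )
    return ordered + [target_task]


# Hypothetical double inverted pendulum variants, each described by
# (second link length in m, second link mass in kg).
source_tasks = {
    "short_light": (0.2, 0.1),
    "long_heavy": (0.5, 0.4),
    "medium": (0.3, 0.2),
}

curriculum = build_curriculum(
    source_tasks,
    target_task="target",
    easiest=(0.1, 0.05),  # assumed easiest configuration
    scale=(0.5, 0.5),     # assumed per-parameter normalization
)
print(curriculum)
```

In this sketch the agent would train on the tasks in the printed order,
carrying its learned parameters from one task to the next, with the
target task always last.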
Funding Info:
This work has taken place in the Learning Agents Research Group (LARG) at UT Austin. LARG research is supported in part by NSF (CNS-1330072, CNS-1305287, IIS-1637736, IIS-1651089), ONR (21C184-01), and AFOSR (FA9550-14-1-0087). Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.