In many machine learning sub-fields, state-of-the-art approaches are notable for matching the performance of human experts. For instance, generative models can produce text, images, and videos in a vast range of styles. State-of-the-art deep reinforcement learning (DRL) methods, by contrast, are exciting because they can uncover novel, near-optimal behaviors. The match between AlphaGo and Lee Sedol offered an illustrative example: AlphaGo's move 37 in the second game astounded even the world's top Go players. Analogous emergent behaviors in real-world settings such as healthcare could have profound implications. Despite recent successes, however, existing real-world DRL methods rely on implicit assumptions about the problem setting, such as the ability to reset the environment (illustrated in the sketch below), which restrict their applicability beyond their intended domains (typically robotic locomotion or manipulation). In this project, we design methods that overcome several of these limitations.
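To make the reset assumption concrete, the minimal sketch below shows the standard episodic training loop that most DRL implementations are built around. The choice of the Gymnasium API, the CartPole-v1 task, and the random placeholder policy are purely illustrative and not part of the methods developed in this project; the point is the `env.reset()` call, which is exactly the operation that is unavailable in many real-world deployments.

```python
# Standard episodic DRL training loop (Gymnasium API).
# The implicit assumption is env.reset(): after every episode the environment
# is returned to an initial state "for free." In many real-world settings
# (e.g., a patient in a hospital) no such reset exists, so methods built
# around this loop do not transfer directly.
import gymnasium as gym

env = gym.make("CartPole-v1")  # stand-in task; a real deployment has no simulator

for episode in range(10):
    obs, info = env.reset()  # <-- the implicit reset assumption
    terminated = truncated = False
    episode_return = 0.0
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += reward
    print(f"episode {episode}: return = {episode_return}")

env.close()
```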