#3: The limits of (I)RL-based value learning
Here we are interested in whether complete behavioral information, or behavior plus neural states, can guarantee safe behavior in AI/AGI systems. That is, we are interested in the more empirical question of whether we can ever hope to learn a good enough representation of human reward functions. We propose at least two research directions:
- Using real or artificial behavior data to extract the reward functions of behaving agents, and studying how good or useful those functions become as we increase the data size. Is there a limit here? Perhaps we will find that what matters is not the size of the dataset but certain characteristics of it (see the sketch after this list).
- Anticipating the arrival of neural and biometric data, and evaluating whether such datasets would improve our representations of values and reward functions. In other words, is there something as informative as behavior (e.g. internal thoughts/neural states) that we could train our IRL/RL algorithms on?
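To make the first direction concrete, here is a minimal sketch, assuming a Boltzmann-rational chooser over discrete options stands in for "behavior data": we recover a crude reward estimate from choice frequencies and track how its correlation with the agent's true reward changes as the dataset grows. The log-frequency recovery rule and all names are illustrative assumptions, not an existing pipeline.

```python
# Minimal sketch: does reward recovery keep improving as behavior data grows?
# The Boltzmann chooser and log-frequency "IRL" proxy are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_options = 20
true_reward = rng.normal(size=n_options)                        # the agent's hidden values
choice_probs = np.exp(true_reward) / np.exp(true_reward).sum()  # softmax (Boltzmann) policy

def recover_reward(choices):
    """Crude reward recovery: log of smoothed empirical choice frequencies."""
    counts = np.bincount(choices, minlength=n_options) + 1.0    # Laplace smoothing
    return np.log(counts / counts.sum())                        # reward up to an additive constant

for n in (100, 1_000, 10_000, 100_000):
    choices = rng.choice(n_options, size=n, p=choice_probs)
    corr = np.corrcoef(true_reward, recover_reward(choices))[0, 1]
    print(f"{n:>7} samples: correlation with true reward = {corr:.3f}")
```

Even in this toy case, the interesting question is whether the curve saturates before the correlation reaches 1, and which properties of the data (coverage, stationarity, noise) control where it saturates.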
IRL from massively large behavior datasets
What if we could capture behavior data for an organism for hundreds of hours (or more) - could we use IRL to reliably extract reward functions and policies that are reasonable or plausible? (Note: We are in the process of curating a large dataset with millions of time points of rodent behavior, to be released on eLife in August and used at the hackathon. More updates soon.)
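As a starting point, here is a simplified MaxEnt-style IRL sketch on a toy gridworld: soft value iteration gives a Boltzmann policy under the current reward estimate, and the reward is updated to match expert and expected state visitation counts. The gridworld, the scripted "expert", the horizon, and the learning rate are all illustrative assumptions; nothing here reflects the actual rodent dataset or its format.

```python
# Simplified MaxEnt-style IRL on a toy gridworld (illustrative stand-in for
# trajectory data; not the rodent dataset mentioned above).
import numpy as np
from scipy.special import logsumexp

SIZE = 5                                   # 5x5 gridworld, states indexed 0..24
N_STATES, N_ACTIONS = SIZE * SIZE, 4       # actions: 0=up, 1=down, 2=left, 3=right

def step(s, a):
    """Deterministic move with walls."""
    r, c = divmod(s, SIZE)
    if a == 0:   r = max(r - 1, 0)
    elif a == 1: r = min(r + 1, SIZE - 1)
    elif a == 2: c = max(c - 1, 0)
    else:        c = min(c + 1, SIZE - 1)
    return r * SIZE + c

TRANS = np.array([[step(s, a) for a in range(N_ACTIONS)] for s in range(N_STATES)])

def soft_policy(reward, gamma=0.95, iters=100):
    """Soft value iteration under the current reward -> Boltzmann policy pi(a|s)."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = reward[:, None] + gamma * V[TRANS]
        V = logsumexp(Q, axis=1)
    Q = reward[:, None] + gamma * V[TRANS]
    return np.exp(Q - logsumexp(Q, axis=1, keepdims=True))

def expected_svf(policy, start, horizon):
    """Expected state visitation frequencies when following the policy."""
    d = np.zeros(N_STATES); d[start] = 1.0
    svf = d.copy()
    for _ in range(horizon - 1):
        d_next = np.zeros(N_STATES)
        for a in range(N_ACTIONS):
            np.add.at(d_next, TRANS[:, a], d * policy[:, a])
        d = d_next
        svf += d
    return svf

def scripted_demo(start=0, horizon=12):
    """Scripted 'expert' that walks to the far corner and stays there."""
    traj, s = [], start
    for _ in range(horizon):
        traj.append(s)
        r, _ = divmod(s, SIZE)
        s = step(s, 1 if r < SIZE - 1 else 3)   # go down, then right
    return traj

demos = [scripted_demo() for _ in range(50)]
expert_svf = np.zeros(N_STATES)
for traj in demos:
    for s in traj:
        expert_svf[s] += 1.0
expert_svf /= len(demos)

# Gradient ascent on a tabular reward: match expert vs expected visitation counts.
reward = np.zeros(N_STATES)
for _ in range(100):
    policy = soft_policy(reward)
    reward += 0.1 * (expert_svf - expected_svf(policy, start=0, horizon=12))

print("states with highest learned reward:", np.argsort(reward)[-5:][::-1])
```

With one-hot state features the gradient reduces to the gap between expert and expected visitation counts; swapping in richer behavioral features or a neural reward network is the natural next step for hundreds of hours of real behavior.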
RL from internal states
What if, in addition to behavior data, we also had real-time access to the internal states (e.g. brain states) of an organism - would this improve the accuracy of our reward functions? We may not have access to the neural states of biological organisms, but another option may be to train deep RL networks on generative models of the behavior of artificial organisms.
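One way to probe this without biological recordings is a synthetic comparison: simulate an artificial agent whose hidden internal state (e.g. a motivation variable) modulates its reward, then check how well the per-timestep reward can be predicted from behavior alone versus behavior plus the internal state. Everything below (the reward rule, the features, the regression) is an illustrative assumption, not a claim about any real neural dataset.

```python
# Synthetic comparison: reward prediction from behavior only vs behavior +
# internal state. The "organism" and its reward rule are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
T = 5_000

stimulus = rng.normal(size=T)              # observable stimulus on each trial
mode = rng.integers(0, 2, size=T)          # hidden internal state: 0=seek, 1=avoid
action = (stimulus > 0).astype(float)      # observable behavior (approach if positive)
# True reward: approaching a positive stimulus is good in "seek" mode, bad in "avoid" mode.
reward = np.where(mode == 0, stimulus * action, -stimulus * action)

def r2(X, y):
    """Ordinary least squares fit; return R^2 on the same data."""
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

X_behavior = np.column_stack([stimulus, action, stimulus * action])
X_with_internal = np.column_stack([X_behavior, mode, mode * stimulus * action])

print("behavior only       R^2 =", round(r2(X_behavior, reward), 3))
print("behavior + internal R^2 =", round(r2(X_with_internal, reward), 3))
```

The toy result is deliberately extreme (behavior alone is uninformative by construction), but the same train-two-models-and-compare protocol would apply to the hidden activations of deep RL agents or, eventually, to neural recordings.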