Agency Foundations Challenge (Sep 8-24, $10,000 prize pool)
Join us September 8 to 24 for an "agency foundations" research challenge hosted by Alignment Jams and Apart Research. We will explore - and award prizes in - several areas of research, all focused on preserving human agency in A(G)I-human interactions. Below we provide brief primers on each of these areas, but we hope participants will develop more creative and novel ideas beyond these starting points.
Agency-preserving RL & game-theory AGI gyms ($2,500 prize pool)
Can we conceive of algorithms where AI systems evaluate and optimize for the preservation or increase of human agency rather than - or in addition to - truth, interpretability, corrigibility, sandboxing, value learning (ML/formal), and so on? Are safe-AI paradigms even possible in game theory when one agent is omnipotent relative to the others? See here for more.
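One way to make this concrete is a reward term that scores how many options the human retains. The sketch below is a toy illustration, not a proposed solution: all names are hypothetical, and "agency" is crudely proxied by the number of states the human can still reach within a short horizon (a stand-in for empowerment-style measures).

```python
# Toy sketch (hypothetical names): add an agency-preservation bonus,
# proxied by the human's reachable-state count, to a task reward.

def reachable_states(state, transitions, horizon):
    """States the human can reach from `state` in <= `horizon` moves."""
    frontier = {state}
    seen = {state}
    for _ in range(horizon):
        frontier = {nxt for s in frontier
                    for nxt in transitions.get(s, [])} - seen
        seen |= frontier
    return seen

def agency_preserving_reward(task_reward, state, transitions,
                             horizon=3, weight=0.1):
    """Task reward plus a bonus proportional to the human's option count."""
    return task_reward + weight * len(reachable_states(state, transitions,
                                                       horizon))

# A tiny chain world: the AI can leave doors open (many options)
# or lock the human into state "c" (no options).
open_world = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
locked_world = {"a": [], "b": [], "c": [], "d": []}

r_open = agency_preserving_reward(1.0, "c", open_world)      # 1.0 + 0.1 * 4
r_locked = agency_preserving_reward(1.0, "c", locked_world)  # 1.0 + 0.1 * 1
assert r_open > r_locked  # preserving the human's options scores higher
```

Real proposals would need a principled agency measure and a way to trade it off against task performance; this only shows the shape of the objective.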
Agency (interpretability) in neural networks
How do LLMs or deep-RL neural networks represent other agents, or themselves? Can we carry out mechanistic interpretability, focusing on both behavior and network weights, to characterize - and (re)direct - the emergence of agent and agency representations? See here for more.
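A common starting point for this kind of question is a linear probe: train a simple classifier on a layer's activations to test whether a property such as "another agent is present" is linearly decodable. The sketch below uses synthetic activations in place of real model internals, so everything here is an assumed setup for illustration.

```python
# Hypothetical sketch: a linear probe for an "agent present" feature.
# Synthetic activations stand in for internals extracted from a real
# LLM or deep-RL network layer.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                  # pretend hidden size
agent_direction = rng.normal(size=d)    # pretend feature direction

def fake_activations(n, has_agent):
    """Noise activations, shifted along the agent direction when present."""
    x = rng.normal(size=(n, d))
    if has_agent:
        x += 2.0 * agent_direction
    return x

X = np.vstack([fake_activations(200, True), fake_activations(200, False)])
y = np.array([1] * 200 + [0] * 200)

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean(((X @ w + b) > 0) == y)
# High probe accuracy suggests a linearly decodable agent feature;
# on real networks, causal interventions would be needed to go further.
```

On real models, a high-accuracy probe is only correlational evidence; "(re)directing" the representation would additionally require interventions such as activation patching.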
The limits of (I)RL value learning from empirical data
What are the limits on extracting the true reward function or policy from finite empirical behavior data? Does access to the internal states (e.g. neural states) of an agent - in addition to its observable behavior - improve accuracy, and could such processes themselves undermine the agency of the observed agent? See here for more.
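A minimal example of this limitation, under a standard Boltzmann-rationality assumption: from binary choice data alone, only the reward *difference* between options is identifiable, and its maximum-likelihood estimate is noisy at small sample sizes. The setup below is a toy illustration, not any specific (I)RL algorithm.

```python
# Toy sketch of a known IRL limitation: finite choice data pins down
# only reward differences, and small samples give noisy estimates.
import math
import random

random.seed(0)
true_diff = 1.0   # r(A) - r(B): the only identifiable quantity here

def sample_choices(n, diff):
    """Count how often a Boltzmann-rational agent picks option A."""
    p_a = 1.0 / (1.0 + math.exp(-diff))   # P(choose A) = sigmoid(diff)
    return sum(random.random() < p_a for _ in range(n))

def mle_diff(n_a, n):
    """Closed-form MLE of the reward difference: the empirical logit."""
    p = min(max(n_a / n, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))

small = mle_diff(sample_choices(20, true_diff), 20)        # noisy
large = mle_diff(sample_choices(20000, true_diff), 20000)  # near true_diff
# Note: adding any constant to both rewards leaves the data distribution
# unchanged, so absolute rewards are unidentifiable from behavior alone.
```

Access to internal states could, in principle, break such identifiability gaps, which is exactly where the agency question in the prompt above becomes pressing.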
Conceptual and governance proposals centred on agency preservation ($2,500 prize pool)
In the presence of superhumanly intelligent AI systems - and in light of the complexity of human knowledge and value creation - is it sufficient to seek AI systems aligned to truth, interpretability, and optimal value representation? Can we red-team existing technical approaches from an agency perspective? See here for more.