Agency Foundations Challenge (Sep 8-24, $10,000 prize pool)
Join us September 8 to 24 for an "agency foundations" research challenge hosted by Alignment Jams and Apart Research. We will explore - and award prizes in - several areas of research, all focused on preserving human agency in A(G)I-human interactions. Below we provide brief primers on each of these areas, but we hope participants will develop more creative and novel ideas beyond these starting points.
Agency-preserving RL & game-theory AGI gyms ($2,500 prize pool)
Can we conceive of algorithms where AI systems evaluate and optimize for the preservation or increase of human agency rather than - or in addition to - truth, interpretability, corrigibility, sandboxing, value learning (ML/formal), and so on? Are safe-AI paradigms even possible in game theory when one agent is omnipotent relative to the others? See here for more.
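One way to make this concrete is a reward term that scores how many options the human retains. The sketch below is a toy illustration, not a proposed solution: all names are hypothetical, and "agency" is crudely proxied by the number of states the human can still reach within a short horizon (a stand-in for empowerment-style measures).

```python
# Toy sketch (hypothetical names): add an agency-preservation bonus,
# proxied by the human's reachable-state count, to a task reward.

def reachable_states(state, transitions, horizon):
    """States the human can reach from `state` in <= `horizon` moves."""
    frontier = {state}
    seen = {state}
    for _ in range(horizon):
        frontier = {nxt for s in frontier
                    for nxt in transitions.get(s, [])} - seen
        seen |= frontier
    return seen

def agency_preserving_reward(task_reward, state, transitions,
                             horizon=3, weight=0.1):
    """Task reward plus a bonus proportional to the human's option count."""
    return task_reward + weight * len(reachable_states(state, transitions,
                                                       horizon))

# A tiny chain world: the AI can leave doors open (many options)
# or lock the human into state "c" (no options).
open_world = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
locked_world = {"a": [], "b": [], "c": [], "d": []}

r_open = agency_preserving_reward(1.0, "c", open_world)      # 1.0 + 0.1 * 4
r_locked = agency_preserving_reward(1.0, "c", locked_world)  # 1.0 + 0.1 * 1
assert r_open > r_locked  # preserving the human's options scores higher
```

Real proposals would need a principled agency measure and a way to trade it off against task performance; this only shows the shape of the objective.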
Agency (interpretability) in neural networks
How do LLMs or deep-RL neural networks represent other agents, or themselves? Can we carry out mechanistic interpretability, focusing on both behavior and network weights, to characterize - and (re)direct - the emergence of agent and agency representations? See here for more.
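A common starting point for this kind of question is a linear probe: train a simple classifier on a layer's activations to test whether a property such as "another agent is present" is linearly decodable. The sketch below uses synthetic activations in place of real model internals, so everything here is an assumed setup for illustration.

```python
# Hypothetical sketch: a linear probe for an "agent present" feature.
# Synthetic activations stand in for internals extracted from a real
# LLM or deep-RL network layer.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                  # pretend hidden size
agent_direction = rng.normal(size=d)    # pretend feature direction

def fake_activations(n, has_agent):
    """Noise activations, shifted along the agent direction when present."""
    x = rng.normal(size=(n, d))
    if has_agent:
        x += 2.0 * agent_direction
    return x

X = np.vstack([fake_activations(200, True), fake_activations(200, False)])
y = np.array([1] * 200 + [0] * 200)

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean(((X @ w + b) > 0) == y)
# High probe accuracy suggests a linearly decodable agent feature;
# on real networks, causal interventions would be needed to go further.
```

On real models, a high-accuracy probe is only correlational evidence; "(re)directing" the representation would additionally require interventions such as activation patching.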
The limits of (I)RL value learning from empirical data
What are the limits on extracting the true reward function or policy from finite empirical behavior data? Does access to the internal states (e.g. neural states) of an agent - in addition to its observable behavior - improve accuracy, and could such processes themselves undermine the agency of the observed agent? See here for more.
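A minimal example of this limitation, under a standard Boltzmann-rationality assumption: from binary choice data alone, only the reward *difference* between options is identifiable, and its maximum-likelihood estimate is noisy at small sample sizes. The setup below is a toy illustration, not any specific (I)RL algorithm.

```python
# Toy sketch of a known IRL limitation: finite choice data pins down
# only reward differences, and small samples give noisy estimates.
import math
import random

random.seed(0)
true_diff = 1.0   # r(A) - r(B): the only identifiable quantity here

def sample_choices(n, diff):
    """Count how often a Boltzmann-rational agent picks option A."""
    p_a = 1.0 / (1.0 + math.exp(-diff))   # P(choose A) = sigmoid(diff)
    return sum(random.random() < p_a for _ in range(n))

def mle_diff(n_a, n):
    """Closed-form MLE of the reward difference: the empirical logit."""
    p = min(max(n_a / n, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))

small = mle_diff(sample_choices(20, true_diff), 20)        # noisy
large = mle_diff(sample_choices(20000, true_diff), 20000)  # near true_diff
# Note: adding any constant to both rewards leaves the data distribution
# unchanged, so absolute rewards are unidentifiable from behavior alone.
```

Access to internal states could, in principle, break such identifiability gaps, which is exactly where the agency question in the prompt above becomes pressing.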
Conceptual and governance proposals centred on agency preservation ($2,500 prize pool)
In the presence of superhumanly intelligent AI systems - and in light of the complexity of human knowledge and value creation - is it sufficient to seek AI systems aligned to truth, interpretability, and optimal value representation? Can we red-team existing technical approaches from an agency perspective? See here for more.