How can we align powerful AI systems with human values?

Category: Computer Science

Status: Queued

As AI systems become more capable, ensuring that their objectives remain aligned with human intentions becomes a research problem with no settled solution. The 'alignment problem' covers reward specification, deceptive optimisation, scalable oversight, and interpretability.

Active areas include reinforcement learning from human feedback (RLHF) and its successors, mechanistic interpretability of neural networks, formal verification of learned policies, and constitutional methods. No proposal has yet been shown to scale to systems significantly more capable than current large language models (LLMs).

Sources

Runs

No runs yet — this question is queued.