Intro to AI Alignment

Yaodong Yang

A talk on AI alignment given at RL China. AI alignment is a vast field, encompassing not only mature basic methods but also scalable oversight and mechanistic interpretability. The macro goal of AI alignment can be summarized as the RICE principles: Robustness, Interpretability, Controllability, and Ethicality. The talk also noted that Learning from Feedback, Addressing Distributional Shift, and Assurance are three core subfields of AI alignment today; together they form a continuously updated, iteratively improved alignment loop.

Value Alignment

Tianyi Qiu

This enlightening talk delves into the issue of value alignment of AI systems, exploring its history, theoretical frameworks, and pivotal role in contemporary AI research. The talk begins with a review of the origins of machine ethics and early theoretical studies on how AI systems could align with human values, discussing how these foundational theories have evolved and how they intersect with mainstream AI alignment research. It then explores the cutting-edge areas of value alignment, analyzing how computational social choice is applied to AI systems to incorporate diverse values and democratic inputs, providing a novel approach to ethical AI development. Finally, the talk addresses the socio-technical evaluations needed to assess value alignment in the real world.
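The talk mentions computational social choice as a way to aggregate diverse values and democratic inputs. As a minimal illustration (not taken from the talk itself), a classic aggregation rule from that field is the Borda count, sketched here with hypothetical stakeholder rankings over candidate value priorities:

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate individual preference rankings into a collective ranking.

    Each ranking is an ordered list of alternatives, most preferred first.
    An alternative in position i of an n-item ranking earns n - 1 - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += n - 1 - position
    # Sort alternatives by total Borda score, highest first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: three stakeholders rank value priorities for an AI system
rankings = [
    ["safety", "helpfulness", "transparency"],
    ["helpfulness", "safety", "transparency"],
    ["safety", "transparency", "helpfulness"],
]
print(borda_count(rankings))  # safety scores 5, helpfulness 3, transparency 1
```

Real democratic-input pipelines are far more involved (preference elicitation, strategyproofness, fairness constraints), but the same core idea applies: turn many individual value orderings into one collective ordering.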