RL China Talk

Yaodong Yang

Talk on AI alignment at RL China. AI alignment is a broad field that covers not only mature basic methods but also directions such as scalable oversight and mechanistic interpretability. The macro goal of AI alignment can be summarized by the RICE principles: Robustness, Interpretability, Controllability, and Ethicality. The talk also identifies Learning from Feedback, Addressing Distributional Shift, and Assurance as the three core subfields of AI alignment today. Together they form an alignment loop that is continuously updated and iteratively improved.