Alignment Scope

In this section, we illustrate the scope of AI alignment: we frame the alignment process as an alignment cycle and decompose it into the Forward Alignment process and the Backward Alignment process. We also discuss the role of human values in alignment and analyze AI safety problems that lie beyond alignment.

The Forward and Backward Processes

We decompose alignment into Forward Alignment (alignment training) and Backward Alignment (alignment refinement). These two phases form a cycle in which each phase produces or updates the input of the next.

Figure: The Alignment Cycle. Summarized by the Alignment Survey Team; for more details, please refer to our paper.

This cycle, which we call the alignment cycle, is repeated to produce increasingly aligned AI systems. We see alignment as a dynamic process in which all standards and practices should be continually assessed and updated.

Forward Alignment

Forward alignment aims to produce trained systems that follow alignment requirements. We decompose this task into Learning from Feedback and Learning under Distribution Shift.

Learning from Feedback

Learning from Feedback concerns how to provide feedback on the outcomes or behaviors of the trained AI system for any given input, and how to use that feedback to train the system.

Figure: Learning from Feedback Framework.
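
As a concrete illustration, one common form of learning from feedback is to fit a reward (preference) model on pairwise human comparisons, as in RLHF. The following is a minimal sketch of a Bradley-Terry-style preference loss in PyTorch; the RewardModel class, the embedding inputs, and the shapes are illustrative assumptions, not the survey's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size response embedding to a scalar reward.
    (A hypothetical stand-in for a language-model-based reward head.)"""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)  # one scalar reward per example

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style loss: push the reward of the human-preferred response
    above that of the dispreferred one via -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()

# Usage sketch: one gradient step on a batch of (preferred, dispreferred) embedding pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
optimizer.zero_grad()
loss = preference_loss(model, chosen, rejected)
loss.backward()
optimizer.step()
```

A reward model learned this way can then serve as the feedback signal for policy learning methods such as RLHF.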

Learning under Distribution Shift

Learning under Distribution Shift focuses specifically on cases where the input distribution changes, i.e., where distribution shift occurs.

Figure: Learning under Distribution Shift Framework.
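
To make this concrete, one representative algorithmic intervention is distributionally robust optimization (DRO), which trains against the worst-case loss over a set of groups or environments rather than the average loss over the training distribution. The sketch below is a minimal group-DRO-style objective in PyTorch; the simple linear classifier and the pre-grouped batches are illustrative assumptions.

```python
import torch
import torch.nn as nn

def worst_group_loss(model: nn.Module,
                     group_batches: list[tuple[torch.Tensor, torch.Tensor]]) -> torch.Tensor:
    """Group-DRO-style objective: compute the loss separately within each group
    (e.g., each environment or sub-population) and optimize the worst one,
    rather than the loss pooled over the whole training distribution."""
    criterion = nn.CrossEntropyLoss()
    group_losses = [criterion(model(x), y) for x, y in group_batches]
    return torch.stack(group_losses).max()

# Usage sketch: two hypothetical "environments" with 3-way classification labels.
model = nn.Linear(16, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
groups = [
    (torch.randn(64, 16), torch.randint(0, 3, (64,))),
    (torch.randn(64, 16), torch.randint(0, 3, (64,))),
]
optimizer.zero_grad()
loss = worst_group_loss(model, groups)
loss.backward()
optimizer.step()
```

Methods such as IRM and REx pursue the related goal of learning predictors whose behavior stays stable across environments.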

Backward Alignment

Backward Alignment (alignment refinement) ensures that trained systems remain aligned in practice and revises alignment requirements accordingly.

Assurance

Once an AI system has undergone forward alignment, we still need to gain confidence in its alignment before deploying it. This is the role of Assurance: assessing the alignment of trained AI systems.

Figure: Our Organization of Research Directions, Techniques, and Applications in Assurance.
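
As a rough illustration, safety evaluations and red teaming can be framed as running the trained system against a suite of concern-probing or adversarial prompts and recording how often its responses are flagged. In the sketch below, generate_response and flags_unsafe_content are hypothetical placeholders for the system under test and for a safety classifier or human rater; it shows the shape of an evaluation harness, not any specific benchmark.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    prompt: str
    response: str
    unsafe: bool

def generate_response(prompt: str) -> str:
    """Hypothetical placeholder for querying the AI system under evaluation."""
    return f"<model response to: {prompt}>"

def flags_unsafe_content(response: str) -> bool:
    """Hypothetical placeholder for a safety classifier or a human rater."""
    return False

def run_red_team_suite(prompts: list[str]) -> list[EvalResult]:
    """Query the system with every red-team prompt and record whether each
    response is flagged, yielding an empirical failure-rate estimate."""
    results = []
    for prompt in prompts:
        response = generate_response(prompt)
        results.append(EvalResult(prompt, response, flags_unsafe_content(response)))
    return results

# Usage sketch: report the fraction of prompts whose responses were flagged.
suite = ["adversarial prompt 1", "adversarial prompt 2"]
results = run_red_team_suite(suite)
failure_rate = sum(r.unsafe for r in results) / len(results)
print(f"flagged {failure_rate:.1%} of {len(results)} prompts")
```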

Governance

The role of AI Governance is to mitigate the diverse array of risks posed by AI systems. This requires governance efforts that focus on the alignment and safety of AI systems and cover their entire lifecycle.

Figure: Our Framework for Analyzing AI Governance at Present.


Table: Alignment research directions, methods, and techniques covered in the survey, organized by category. In the paper, each direction is additionally mapped to the RICE principles (Robustness, Interpretability, Controllability, Ethicality), with filled circles denoting primary objectives and unfilled circles denoting secondary objectives.

| Category | Research Direction | Method |
| --- | --- | --- |
| Learning from Feedback | Preference Modeling | |
| Learning from Feedback | Policy Learning | RL/PbRL/IRL/Imitation Learning; RLHF |
| Learning from Feedback | Scalable Oversight | RLxF; IDA; RRM; Debate; CIRL |
| Learning under Distribution Shift | Algorithmic Interventions | DRO; IRM/REx; CBFT |
| Learning under Distribution Shift | Data Distribution Interventions | Adversarial Training; Cooperative Training |
| Assurance | Safety Evaluations | Social Concern Evaluations; Extreme Risk Evaluations; Red Teaming |
| Assurance | Interpretability | |
| Assurance | Human Values Verification | Learning/Evaluating Moral Values; Game Theory for Cooperative AI |
| Governance | Multi-Stakeholder Approach | Government; Industry; Third Parties |
| Governance | International Governance | |
| Governance | Open-source Governance | |
