Distribution Shift
This section first explains why learning under distribution shift is one of the primary challenges in alignment. More specifically, the challenge is preserving alignment properties (i.e., adherence to human intentions and values) when the distribution shifts.
Algorithmic Interventions
This section outlines two classes of methods that steer optimization during training to mitigate distribution shift, namely Cross-Distribution Aggregation and Navigation via Mode Connectivity.
Data Distribution Interventions
This section introduces two methods that address distribution shift by expanding the training distribution in a targeted way, which can improve the robustness and trustworthiness of AI systems.