Feedback Types
Feedback is a crucial conduit linking AI behaviors to human intentions, which AI systems leverage to refine their objectives and align more closely with human values. In this section, we introduce three types of feedback commonly employed to align AI systems: reward, demonstration, and comparison.
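To fix intuition, the sketch below represents the three feedback types as simple data containers; the field names and structures are illustrative assumptions rather than a standard interface.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RewardFeedback:
    """A scalar score assigned to a single trajectory or response."""
    trajectory: List[str]
    reward: float

@dataclass
class DemonstrationFeedback:
    """An expert-provided behavior for the system to imitate."""
    prompt: str
    expert_response: str

@dataclass
class ComparisonFeedback:
    """A relative judgment: the human prefers one response over another."""
    prompt: str
    chosen: str
    rejected: str
```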
Preference Modeling
Preference modeling, which centers on comparison feedback, has emerged as a promising approach to aligning powerful AI systems.
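As a concrete illustration, the sketch below fits a reward model to comparison feedback with a Bradley-Terry style loss, a common choice in this setting; the `RewardModel` architecture, feature dimensions, and random inputs are our own simplifying assumptions, not a specific system's implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (prompt, response) feature vector to a scalar reward."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood: push the chosen response's reward above the rejected one's."""
    return -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

# Toy training step on random features standing in for encoded (prompt, response) pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(model, chosen, rejected)
loss.backward()
optimizer.step()
```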
Policy Learning
In this section, we introduce areas related to policy learning, aiming to give readers without detailed domain knowledge a general understanding of alignment.
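As one minimal example of policy learning, the sketch below performs a REINFORCE-style policy-gradient update that raises the probability of actions receiving high reward; the observation encoding, action space, and reward signal (which could come from an environment or from a learned preference model as above) are placeholders assumed for exposition.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """A small categorical policy over a discrete action space."""
    def __init__(self, obs_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One REINFORCE update on a toy batch: maximize expected reward by weighting
# log-probabilities of sampled actions with the rewards they obtained.
obs = torch.randn(16, 8)       # placeholder observations
dist = policy(obs)
actions = dist.sample()
rewards = torch.randn(16)      # placeholder reward signal
loss = -(dist.log_prob(actions) * rewards).mean()
loss.backward()
optimizer.step()
```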
Scalable Oversight
Scalable oversight seeks to ensure that AI systems, even those surpassing human expertise, remain aligned with human intent.