Learning from Feedback

  • Feedback Types

    Feedback is a crucial conduit linking AI behaviors to human intentions: AI systems leverage it to refine their objectives and align more closely with human values. In this section, we introduce three types of feedback commonly employed to align AI systems: reward, demonstration, and comparison (a minimal data sketch of these three formats follows this list).

  • Preference Modeling

    Preference Modeling, which emphasizes comparison feedback, has emerged as a promising way to align powerful AI systems (a Bradley-Terry sketch follows this list).

  • Policy Learning

    In this section, we introduce areas related to Policy Learning, aiming to give readers without detailed domain knowledge a general understanding of alignment.

  • Scalable Oversight

    Scalable oversight seeks to ensure that AI systems, even those surpassing human expertise, remain aligned with human intent.
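
As a rough illustration of the three feedback types named above, the sketch below shows how each might be recorded as a data item. It is a minimal illustration in Python; every class and field name is our own assumption, not taken from the survey or any particular library.

```python
# Illustrative (assumed) record formats for the three feedback types:
# reward, demonstration, and comparison.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RewardFeedback:
    """A scalar signal attached to a whole trajectory or response."""
    trajectory: List[Tuple[str, str]]  # (state, action) pairs
    reward: float

@dataclass
class DemonstrationFeedback:
    """An expert trajectory that directly exhibits the desired behavior."""
    trajectory: List[Tuple[str, str]]  # (state, action) pairs

@dataclass
class ComparisonFeedback:
    """A human judgment that one response is preferred over another."""
    prompt: str
    chosen: str
    rejected: str
```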
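Comparison feedback is typically converted into a training signal for a reward model through a Bradley-Terry-style objective. The sketch below is a minimal PyTorch version of that standard loss, assuming a hypothetical `reward_model` that maps encoded responses to scalar scores; it is an illustration under those assumptions, not the survey's implementation.

```python
import torch.nn.functional as F

def bradley_terry_loss(reward_model, chosen_batch, rejected_batch):
    """Negative log-likelihood that the chosen response beats the rejected one.

    Under the Bradley-Terry model, P(chosen > rejected) =
    sigmoid(r(chosen) - r(rejected)), where r is the learned reward model.
    `reward_model`, `chosen_batch`, and `rejected_batch` are assumed names.
    """
    r_chosen = reward_model(chosen_batch)      # scalar score per chosen response, shape (batch,)
    r_rejected = reward_model(rejected_batch)  # scalar score per rejected response, shape (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Minimizing this loss pushes the reward model to score preferred responses above rejected ones, which is how comparison feedback grounds the reward signal later used for policy learning.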