Alignment FAQ

  • Basic Knowledge

    Basic knowledge related to alignment, including, but not limited to, the origins of alignment and current research insights.

  • Large Language Models

    Large Language Models (LLMs) represent a significant breakthrough in deep learning. Because LLMs are complex AI systems, factors such as hyperparameter settings, training procedures, and model scale all affect how well they align with humans.

  • Reinforcement Learning

    Reinforcement Learning (RL) is a crucial component of alignment research. On the one hand, RL techniques such as RLHF can steer pre-trained language models toward human preferences (a sketch of one common RLHF objective appears after this list). On the other hand, issues inherent to RL, such as reward hacking, are themselves alignment concerns.

  • Reward Design

    RL trains agents by maximizing a reward function (see the objective sketched after this list). Whether the designed reward function matches human intentions is a focal point of alignment research, categorized in earlier work as the outer alignment problem.
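
For the Reinforcement Learning entry above, the sketch below gives one common formulation of the RLHF fine-tuning objective (the KL-regularized form used in much of the RLHF literature). The symbols are standard notation assumed here for illustration rather than quantities defined elsewhere in this FAQ.

```latex
% A common KL-regularized RLHF objective (a sketch; the notation below is
% standard usage assumed for illustration, not defined in this FAQ):
%   pi_theta - language-model policy being fine-tuned
%   pi_ref   - frozen reference (pre-trained or SFT) model
%   r_phi    - reward model learned from human preference comparisons
%   beta     - coefficient of the KL penalty keeping pi_theta near pi_ref
\[
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
\left[
  r_{\phi}(x, y)
  \;-\;
  \beta \log \frac{\pi_{\theta}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\right]
\]
% Reward hacking corresponds to the policy finding outputs y that score highly
% under r_phi without actually satisfying the human preferences r_phi was
% meant to capture.
```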
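
For the Reward Design entry, the sketch below states the reward-maximization objective that the entry refers to, again using standard RL notation assumed for illustration; outer alignment concerns whether the reward function itself encodes the intended goal.

```latex
% The standard discounted-return objective maximized in RL (a sketch with
% standard notation assumed for illustration):
%   pi          - policy;  tau = (s_0, a_0, s_1, a_1, ...) - trajectory under pi
%   r(s_t, a_t) - designed reward function;  gamma in [0, 1) - discount factor
\[
J(\pi) \;=\;
\mathbb{E}_{\tau \sim \pi}
\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\]
% Outer alignment asks whether r itself faithfully encodes the human-intended
% objective, rather than a proxy that a reward-maximizing policy can exploit.
```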