- This Q&A section draws on viewpoints from the existing alignment literature to provide informative answers to common questions. The answers are not definitive; future research may offer different perspectives.
Large Language Models (LLMs) represent a significant breakthrough in deep learning. Because they are complex AI systems, factors such as hyperparameter settings, training procedures, and model size all affect how well they align with humans.
Is bigger always better?
Simply making language models larger does not fundamentally improve their ability to understand user intent.
LLMs may generate outputs that are unrealistic, harmful, or unhelpful to users. Moreover, despite their immense scale, LLMs are sometimes outperformed by smaller models: on most classical natural language understanding tasks, for example, ChatGPT/GPT-3.5 often lags behind fine-tuned baseline models.
Generality vs. Efficiency
The development of LLMs can proceed in two directions:
- Large and general: very large models with broad, general-purpose capabilities.
- Small and precise: smaller models that are highly effective in specific application scenarios.
These two extremes lead models either to learn from multiple tasks or to handle many downstream tasks.
For example, equipped with task-specific heads, a BERT-based model can learn four types of tasks: single-sentence classification, pairwise text classification, text similarity scoring, and relevance ranking.
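As a rough illustration of this idea (not taken from the original text), the sketch below shows a shared BERT encoder with one lightweight head per task type; the class and head names are hypothetical, and it assumes PyTorch and the Hugging Face `transformers` library:

```python
# Minimal sketch: one shared BERT encoder, four task-specific heads.
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskBert(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased", num_labels: int = 2):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)  # shared encoder
        hidden = self.encoder.config.hidden_size
        # One lightweight head per task type mentioned above.
        self.single_sentence_cls = nn.Linear(hidden, num_labels)  # single-sentence classification
        self.pair_cls = nn.Linear(hidden, num_labels)             # pairwise text classification
        self.similarity = nn.Linear(hidden, 1)                    # text similarity scoring
        self.relevance = nn.Linear(hidden, 1)                     # relevance ranking score

    def forward(self, input_ids, attention_mask, task: str):
        # The pooled [CLS] representation is shared by every head.
        pooled = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        if task == "single_sentence":
            return self.single_sentence_cls(pooled)
        if task == "pair":
            return self.pair_cls(pooled)
        if task == "similarity":
            return self.similarity(pooled)
        if task == "relevance":
            return self.relevance(pooled)
        raise ValueError(f"unknown task: {task}")
```

During training, batches from the different tasks would be interleaved, with each batch routed to the corresponding head while gradients update the shared encoder.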
Does the training data matter for alignment?
The data used to train LLMs is generally scraped from the Internet and often contains noise, social biases, and errors. When such data is used to maximize the probability of the next token given the previous ones, the target behavior may be misspecified, leading to models that generate toxic, inaccurate, and unhelpful content.
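For reference, this pretraining objective can be written as the negative log-likelihood of each token given its preceding context (a standard formulation, not taken from the cited source):

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
$$

Because the objective only rewards matching the training distribution, noise, biases, and errors in the data are reproduced just as readily as useful patterns.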
What are the 3H alignment standards for LLMs?
The 3H standards are Helpful, Harmless, and Honest. In *A General Language Assistant as a Laboratory for Alignment*, they are defined as follows:
Helpful.
- The AI should make a clear attempt to perform the task or answer the question posed (as long as this is not harmful).
- When more information is required, the AI should ask relevant follow-up questions and obtain necessary details.
- Ideally, the AI will also re-direct ill-informed requests, e.g., if asked, ‘How can I build a website in assembly language?’ it might suggest a different approach.
Harmless.
- The AI should not be offensive or discriminatory, directly or through subtext or bias.
- When asked to aid in a dangerous act (e.g., building a bomb), the AI should politely refuse.
- To the best of its abilities, the AI should recognize when it may provide sensitive or consequential advice and act with appropriate modesty and care.
Honest.
- At its most basic level, the AI should give accurate information. It should express its uncertainty without misleading human users.
- Crucially, the AI should be honest about its capabilities and levels of knowledge.
- Ideally, the AI would also be honest about itself and its internal state insofar as that information is available.
- Honesty is more objective than helpfulness and harmlessness, so more aspects of honesty training may be possible without human input.