Note: Some authors’ names cannot be represented in the 26-letter English alphabet. For consistency, we use the contents of the ‘author’ field as exported from Google Scholar.
Papers | Authors | Published in | Date | |
---|---|---|---|---|
Deep reinforcement learning from human preferences | Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei | Advances in Neural Information Processing Systems | 2017-06 | |
Goal Misgeneralization in Deep Reinforcement Learning | Lauro Langosco Di Langosco, Jack Koch, Lee D Sharkey, Jacob Pfau, David Krueger | International Conference on Machine Learning | 2021-05 | |
Unsolved Problems in ML Safety | Hendrycks, Dan and Carlini, Nicholas and Schulman, John and Steinhardt, Jacob | arXiv preprint arXiv:2109.13916 | 2021-09 | |
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny and others | Advances in Neural Information Processing Systems | 2022-01 | |
Training language models to follow instructions with human feedback | Ouyang, Long and Wu, Jeffrey and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others | Advances in Neural Information Processing Systems | 2022-03 | |
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | Bai, Yuntao and Jones, Andy and Ndousse, Kamal and Askell, Amanda and Chen, Anna and DasSarma, Nova and Drain, Dawn and Fort, Stanislav and Ganguli, Deep and Henighan, Tom and others | arXiv preprint arXiv:2204.05862 | 2022-04 | |
The alignment problem from a deep learning perspective | Ngo, Richard and Chan, Lawrence and Mindermann, Sören | arXiv preprint arXiv:2209.00626 | 2022-09 | |
Constitutional AI: Harmlessness from AI Feedback | Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and others | arXiv preprint arXiv:2212.08073 | 2022-12 | |
Llama 2: Open Foundation and Fine-Tuned Chat Models | Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others | arXiv preprint arXiv:2307.09288 | 2023-07 | |
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Casper, Stephen and Davies, Xander and Shi, Claudia and Gilbert, Thomas Krendl and Scheurer, Jérémy and Rando, Javier and Freedman, Rachel and Korbak, Tomasz and Lindner, David and Freire, Pedro and others | arXiv preprint arXiv:2307.15217 | 2023-07 | |