TOP-10 Papers Recommended in 2024-02

Paper | Authors | Published in | Date
--- | --- | --- | ---
Achieving Efficient Alignment through Weak-to-Strong Correction | Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang | arXiv preprint | 2024-02
A Minimaximalist Approach to Reinforcement Learning from Human Feedback | Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal | arXiv preprint | 2024-01
Selective Visual Representations Improve Convergence and Generalization for Embodied AI | Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna | International Conference on Learning Representations | 2023-11
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning | Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang | International Conference on Learning Representations | 2023-10
Confronting Reward Model Overoptimization with Constrained RLHF | Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer | International Conference on Learning Representations | 2023-10
Tool-Augmented Reward Modeling | Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu | International Conference on Learning Representations | 2023-10
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning | Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu | International Conference on Learning Representations | 2023-09
Query-Policy Misalignment in Preference-Based Reinforcement Learning | Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang | International Conference on Learning Representations | 2023-05
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models | Jiuding Sun, Chantal Shaib, Byron C. Wallace | International Conference on Learning Representations | 2023-06
Cascading Reinforcement Learning | Yihan Du, R. Srikant, Wei Chen | International Conference on Learning Representations | 2024-01

Achieving Efficient Alignment through Weak-to-Strong Correction

Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang

Published in: arXiv preprint

Date: 2024-02

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Authors: Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

Published in: arXiv preprint

Date: 2024-01

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Published in: International Conference on Learning Representations

Date: 2023-11

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

Authors: Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang

Published in: International Conference on Learning Representations

Date: 2023-10

Confronting Reward Model Overoptimization with Constrained RLHF

Authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

Published in: International Conference on Learning Representations

Date: 2023-10

Tool-Augmented Reward Modeling

Authors: Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu

Published in: International Conference on Learning Representations

Date: 2023-10

Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning

Authors: Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu

Published in: International Conference on Learning Representations

Date: 2023-09

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Authors: Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

Published in: International Conference on Learning Representations

Date: 2023-05

Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

Authors: Jiuding Sun, Chantal Shaib, Byron C. Wallace

Published in: International Conference on Learning Representations

Date: 2023-06

Cascading Reinforcement Learning

Authors: Yihan Du, R. Srikant, Wei Chen

Published in: International Conference on Learning Representations

Date: 2024-01
