Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Published in Annual Conference on Neural Information Processing Systems, 2025
Recommended citation: Xiaoyuan Liu+, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu.
NeurIPS'25: Annual Conference on Neural Information Processing Systems. https://arxiv.org/pdf/2505.13445