Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Published in the Annual Meeting of the Association for Computational Linguistics, 2025
Recommended citation: Youliang Yuan+, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu.
ACL'25: Annual Meeting of the Association for Computational Linguistics. https://aclanthology.org/2025.acl-long.158.pdf