Author : Muchao Ye
Publisher :
ISBN 13 :
Total Pages : 0 pages
Book Synopsis Evaluating and Certifying the Adversarial Robustness of Neural Language Models by : Muchao Ye
Download or read book Evaluating and Certifying the Adversarial Robustness of Neural Language Models written by Muchao Ye and published by . This book was released in 2024 with a total of 0 pages. Available in PDF, EPUB and Kindle.

Book excerpt: Language models (LMs) built on deep neural networks (DNNs) have achieved great success in many areas of artificial intelligence and play an increasingly vital role in applications such as chatbots and smart healthcare. Nonetheless, the vulnerability of DNNs to adversarial examples still threatens the application of neural LMs to safety-critical tasks. Specifically, DNNs change their correct predictions into incorrect ones when small perturbations are added to the original input texts. In this dissertation, we identify key challenges in evaluating and certifying the adversarial robustness of neural LMs and bridge those gaps through efficient hard-label text adversarial attacks and a unified certified robust training framework.

The first step in developing neural LMs with high adversarial robustness is evaluating whether they are empirically robust against perturbed texts. The key technique for this is the text adversarial attack, which aims to construct a text that fools LMs. Ideally, it should produce high-quality adversarial examples efficiently in a realistic setting. However, current evaluation pipelines proposed for the realistic hard-label setting adopt heuristic search methods and consequently suffer from inefficiency. To address this limitation, we introduce a series of hard-label text adversarial attack methods that overcome the inefficiency problem by using a pretrained word embedding space as an intermediate representation. A deeper analysis of this idea shows that exploiting an estimated decision boundary in the introduced word embedding space helps improve the quality of the crafted adversarial examples.

The ultimate goal of constructing robust neural LMs is to obtain models for which adversarial examples do not exist, which can be realized through certified robust training. The research community has proposed different types of certified robust training, defined either in the discrete input space or in the continuous latent feature space. We identify the structural gap between current pipelines and unify them in the word embedding space. By removing unnecessary bound computation modules, i.e., interval bound propagation, and harnessing a new decoupled regularization learning paradigm, our unification provides a stronger robustness guarantee. Given these contributions, we believe our findings will contribute to the development of robust neural LMs.
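To illustrate the general idea of a hard-label text attack guided by a pretrained word embedding space, the following is a minimal Python sketch. It is not the dissertation's actual algorithm: the greedy word-by-word search, the cosine-similarity candidate ranking, and the predict, embed and vocab interfaces are all illustrative assumptions. The attack only ever observes the model's predicted label, which is what "hard-label" means here.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    def hard_label_attack(predict, words, embed, vocab, max_queries=1000, k=10):
        """Greedy hard-label substitution attack (illustrative sketch only).

        predict: callable that returns only the predicted label for a word list
        words:   the original input text as a list of words
        embed:   dict mapping a word to its pretrained embedding vector
        vocab:   candidate substitute words sharing the same embedding space
        """
        original_label = predict(words)
        adversarial = list(words)
        queries = 1

        for i, word in enumerate(words):
            if word not in embed:
                continue
            # Rank candidate substitutes by similarity in the pretrained
            # embedding space so that semantically close words are tried first.
            candidates = sorted(
                (w for w in vocab if w in embed and w != word),
                key=lambda w: -cosine(embed[word], embed[w]),
            )[:k]
            for substitute in candidates:
                if queries >= max_queries:
                    return None  # query budget exhausted
                adversarial[i] = substitute
                queries += 1
                if predict(adversarial) != original_label:
                    return adversarial  # label flipped using only hard-label feedback
            adversarial[i] = word  # revert if no substitute changed the prediction
        return None

The dissertation's methods go further, for example by exploiting an estimated decision boundary in the embedding space to improve the quality of the crafted examples, which this sketch does not attempt to model.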