Author : Caio Corro
Title : Lagrangian Based Approaches for Lexicalized Tree Adjoining Grammar Parsing
Year : 2018

In linguistics and Natural Language Processing (NLP), syntax is the study of the structure of sentences in a given language. Two approaches have mainly been considered to describe this structure: dependency structures and phrase structures. A dependency links a pair of words together with its relation type, whereas a phrase structure describes a sentence by means of a hierarchy of word sets called constituents. In this thesis, we focus on phrase-structure parsing, that is, the computation of the constituency structure of a given sentence. Context-Free Grammars (CFGs) have been widely adopted by the NLP community due to their simplicity and the low complexity of their parsing algorithms. However, CFGs are too limited to describe all phenomena observed in natural language structures. Therefore, Lexicalized Tree Adjoining Grammars (LTAGs) have been widely studied as a plausible alternative, among others. They are more expressive than CFGs but can still be parsed in polynomial time. Unfortunately, the best known algorithm has an O(n^7) time complexity, where n is the length of the input sentence. Thus, in practice, most algorithms are based on greedy methods which require fairly strong independence assumptions. The main approach in the literature, called supertagging, filters the search space in a pre-processing step while ignoring long-distance relationships, one of the main motivations for LTAGs.

In recent years, combinatorial optimization techniques have been successfully applied to computationally challenging NLP tasks. We follow this line of work in the case of LTAG parsing. More precisely, in our setting, a given NLP problem is reduced to a subgraph selection problem. As such, it has a generic form which may interest other research communities.
We then formulate the generic graph problem as an Integer Linear Program (ILP). Integer Linear Programming has been widely studied, and many optimization methods exist. We focus on Lagrangian relaxation, which has previously received much attention from the NLP community. Interestingly, the proposed algorithms can be parametrized to fit a range of different data without impacting efficiency.

Our first contribution is a novel pipeline for LTAG parsing. Contrary to the supertagging approach, we propose a pre-processing step which takes into account relationships between words: well-nested dependency parsing with block degree bounded by 2. An algorithm with O(n^7) time complexity has been proposed for this problem in the literature, which is similar to the complexity of the standard LTAG parser. In order to tackle this complexity challenge, we show that the problem can be reduced to a subgraph selection problem which can be expressed via a generic ILP. With our algorithm, the well-nestedness constraint can easily be toggled off and the block degree bound can be changed. Thus, for example, it can be used for parsing problems related to other lexicalized grammars. We experiment on several problems, showing the efficiency and usefulness of our method.

Our second contribution is a novel approach for discontinuous constituent parsing. We introduce a variant of LTAG for this task. Parsing is then equivalent to the joint tagging and non-projective dependency parsing problem. We show that it can be reduced to the Generalized Maximum Spanning Arborescence (GMSA) problem, which has previously been studied in the combinatorial optimization literature. A novel resolution algorithm based on Lagrangian relaxation is proposed. We experiment on two standard discontinuous constituent datasets and obtain state-of-the-art results alongside competitive decoding speed.
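The synopsis names Lagrangian relaxation without showing how it works. As a hedged illustration of the general idea only (this is not the thesis's algorithm, which targets the full GMSA problem including the tree constraint), the Python sketch below relaxes a single linear constraint in a toy head-selection problem: each word picks one head, and exactly one word may attach to the root 0. Penalizing root arcs with a multiplier `lam` makes the relaxed problem decompose into independent per-word choices, and a subgradient step on `lam` pushes the relaxed solution toward feasibility. The function name `single_root_heads`, the arc scores and the step-size schedule are hypothetical, and the toy deliberately ignores acyclicity, so in general its output need not be a tree.

```python
from typing import Dict, Tuple

# Hypothetical arc scores: scores[(h, d)] is the score of attaching word d to head h.
Scores = Dict[Tuple[int, int], float]


def single_root_heads(
    scores: Scores, n: int, step0: float = 1.0, max_iter: int = 100
) -> Tuple[Dict[int, int], float, bool]:
    """Toy Lagrangian relaxation (illustrative sketch, not the thesis's method).

    Choose one head per word 1..n so that exactly one word attaches to the
    root 0, relaxing that single equality constraint with multiplier lam.
    Acyclicity is NOT enforced. Each word needs at least one scored arc.
    Returns (heads, lam, certified); certified is True when the relaxed
    solution satisfies the relaxed constraint, which proves it optimal.
    """
    lam = 0.0
    heads: Dict[int, int] = {}
    for t in range(max_iter):
        # Relaxed problem: every root arc pays the penalty lam, so the
        # objective splits into n independent per-word argmax problems.
        heads = {
            d: max(
                (h for h in range(n + 1) if h != d and (h, d) in scores),
                key=lambda h: scores[(h, d)] - (lam if h == 0 else 0.0),
            )
            for d in range(1, n + 1)
        }
        n_root = sum(1 for h in heads.values() if h == 0)
        # Subgradient of the dual function L(lam) with respect to lam.
        subgradient = 1 - n_root
        if subgradient == 0:
            # Feasible relaxed solution: dual bound equals primal value.
            return heads, lam, True
        # Minimize the dual with a diminishing step size.
        lam -= step0 / (t + 1) * subgradient
    return heads, lam, False


# Made-up scores for a 3-word sentence: without the constraint,
# words 1 and 2 would both attach to the root.
scores: Scores = {
    (0, 1): 5.0, (2, 1): 3.0, (3, 1): 1.0,
    (0, 2): 4.0, (1, 2): 3.5, (3, 2): 0.0,
    (0, 3): 0.0, (1, 3): 1.0, (2, 3): 2.0,
}
heads, lam, certified = single_root_heads(scores, n=3)
print(heads, lam, certified)  # {1: 0, 2: 1, 3: 2} 1.0 True
```

With these scores the first iteration attaches two words to the root, the multiplier rises to 1.0, and the second iteration returns a single-root solution with an optimality certificate. When no certificate is found within `max_iter`, the last relaxed solution is returned unverified, which mirrors the usual behaviour of subgradient-based Lagrangian methods.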