Comparable Corpora in Cross-language Information Retrieval

Book Synopsis Comparable Corpora in Cross-language Information Retrieval by : Tuomas Talvensaari

Download or read book Comparable Corpora in Cross-language Information Retrieval written by Tuomas Talvensaari. This book was released in 2008. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "Cross-language information retrieval (CLIR) enables users to express queries in a language different from the language of the documents to be retrieved. For example, a Finnish-speaking person could pose a query to a CLIR system in Finnish (the source language) to retrieve documents written in English (the target language). The language barrier is usually crossed by translating the query into the target language, after which the documents can be retrieved with the methods of monolingual information retrieval (IR). Aligned text collections (corpora) are common query translation resources in CLIR. A parallel corpus is a collection where texts in one language are aligned with their translations in another language. The aligned texts of a comparable corpus are more loosely related. They are not translations, but share topics and include common vocabulary in the two languages. Both kinds of corpora can be used to train statistical translation models, but parallel corpora are preferred because more dependable translation knowledge can be derived from them. However, parallel corpora do not exist for all language pairs and domains. Hence, it is sometimes necessary to resort to noisier comparable corpora. This thesis proposes new methods for the acquisition, alignment, and employment of comparable corpora. The acquisition method is based on language-aware focused web crawling, where web content written in specific languages and discussing specific topics of interest is obtained by employing the hyperlink structure of the web. In the alignment phase, the source language documents are used as CLIR queries to retrieve target language documents. 
The similarity of the query to the documents, and various other factors, are used as evidence to form alignments between the source and target language documents. The constructed corpora were employed in query translation as a cross-language similarity thesaurus, a structure where target language words are ranked based on their similarity with a source language word that is given as input. The highest ranking words are assumed to be either translations of the input word or related to it in some other manner. The methods were evaluated with extensive IR experiments that covered different language pairs, domains, and test data. The proposed CLIR approach was combined with approaches based on bilingual dictionaries. The combined approaches outperformed pure dictionary-based translation. In addition, the comparable corpus translation performed better in domain-specific CLIR than translation utilizing high-quality parallel corpora. This suggests that the proposed methods are particularly useful in domains where CLIR resources are scarce."
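
The cross-language similarity thesaurus described in the abstract can be illustrated with a minimal sketch. It assumes a toy aligned corpus and plain cosine similarity over alignment-occurrence vectors; the thesis's actual weighting scheme is not given in the excerpt, and the function names and sample data here are hypothetical.

```python
import math
from collections import defaultdict


def build_vectors(docs):
    """Map each word to {alignment_id: term frequency}."""
    vectors = defaultdict(dict)
    for align_id, text in enumerate(docs):
        for word in text.lower().split():
            vectors[word][align_id] = vectors[word].get(align_id, 0) + 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_translations(source_word, source_docs, target_docs, top_k=3):
    """Rank target-language words by similarity with one source word.

    source_docs[i] and target_docs[i] are assumed to form one alignment.
    """
    src = build_vectors(source_docs)
    tgt = build_vectors(target_docs)
    query = src.get(source_word.lower(), {})
    scored = [(cosine(query, vec), word) for word, vec in tgt.items()]
    scored = [(s, w) for s, w in scored if s > 0]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(w, round(s, 3)) for s, w in scored[:top_k]]


# Hypothetical toy corpus (Finnish source, English target).
finnish = ["pankki laina", "pankki korko", "auto moottori"]
english = ["bank loan", "bank interest", "car engine"]

print(rank_translations("pankki", finnish, english))
```

In this toy setting the top-ranked word is the likely translation, while the lower-ranked words are topically related terms, which mirrors the abstract's remark that high-ranking words are either translations or otherwise related to the input word.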

Cross-Language Information Retrieval

Publisher : Springer Nature
ISBN 13 : 303102138X
Total Pages : 125 pages
Book Rating : 4.0/5 (31 download)

Book Synopsis Cross-Language Information Retrieval by : Jian-Yun Nie

Download or read book Cross-Language Information Retrieval written by Jian-Yun Nie and published by Springer Nature. This book was released on 2022-05-31 with total page 125 pages. Available in PDF, EPUB and Kindle. Book excerpt: Search for information is no longer exclusively limited within the native language of the user, but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a different language to a query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from a language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness using different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. 
Advanced readers can also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR. Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography

Evaluating Information Retrieval and Access Tasks

Publisher : Springer Nature
ISBN 13 : 9811555540
Total Pages : 225 pages
Book Rating : 4.8/5 (115 download)

Book Synopsis Evaluating Information Retrieval and Access Tasks by : Tetsuya Sakai

Download or read book Evaluating Information Retrieval and Access Tasks written by Tetsuya Sakai and published by Springer Nature. This book has 225 pages in total. Available in PDF, EPUB and Kindle. Book excerpt: This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today's smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. The chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students--anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.

Building and Using Comparable Corpora

Publisher : Springer Science & Business Media
ISBN 13 : 3642201288
Total Pages : 333 pages
Book Rating : 4.6/5 (422 download)

Book Synopsis Building and Using Comparable Corpora by : Serge Sharoff

Download or read book Building and Using Comparable Corpora written by Serge Sharoff and published by Springer Science & Business Media. This book was released on 2013-12-13 with total page 333 pages. Available in PDF, EPUB and Kindle. Book excerpt: The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Building and Using Comparable Corpora for Multilingual Natural Language Processing

Publisher : Springer
ISBN 13 : 9783031313837
Book Rating : 4.3/5 (138 download)

Book Synopsis Building and Using Comparable Corpora for Multilingual Natural Language Processing by : Serge Sharoff

Download or read book Building and Using Comparable Corpora for Multilingual Natural Language Processing written by Serge Sharoff and published by Springer. This book was released on 2023-07-01. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. It then explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Cross-Language Information Retrieval

Publisher : Springer Science & Business Media
ISBN 13 : 1461556619
Total Pages : 190 pages
Book Rating : 4.4/5 (615 download)

Book Synopsis Cross-Language Information Retrieval by : Gregory Grefenstette

Download or read book Cross-Language Information Retrieval written by Gregory Grefenstette and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 190 pages. Available in PDF, EPUB and Kindle. Book excerpt: Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval that was held August 22, 1996 during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, and they have held conferences yearly since 1977. Three additional papers have been added: Chapter 4, Distributed Cross-Lingual Information Retrieval, describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated; Chapter 6, Mapping Vocabularies Using Latent Semantic Indexing, which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval; and Chapter 10, A Weighted Boolean Model for Cross Language Text Retrieval, describes a recent approach to solving the translation term weighting problem, specific to Cross-Language Information Retrieval. Gregory Grefenstette. Contributors: Lisa Ballesteros and W. Bruce Croft (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts); David Hull and Gregory Grefenstette (Xerox Research Centre Europe, Grenoble Laboratory); Thomas K. Landauer (Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder); Mark W. Davis (Computing Research Lab, New Mexico State University); Michael L. Littman; Bonnie J.

Advances in Cross-Language Information Retrieval

Publisher : Springer
ISBN 13 : 3540452370
Total Pages : 835 pages
Book Rating : 4.5/5 (44 download)

Book Synopsis Advances in Cross-Language Information Retrieval by : Martin Braschler

Download or read book Advances in Cross-Language Information Retrieval written by Martin Braschler and published by Springer. This book was released on 2003-11-17 with total page 835 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book presents the thoroughly refereed post-proceedings of a workshop by the Cross-Language Evaluation Forum Campaign, CLEF 2002, held in Rome, Italy in September 2002. The 43 revised full papers presented together with an introduction and run data in an appendix were carefully reviewed and revised upon presentation at the workshop. The papers are organized in topical sections on systems evaluation experiments, cross language and more, monolingual experiments, mainly domain-specific information retrieval, interactive issues, cross-language spoken document retrieval, and cross-language evaluation issues and initiatives.

Cross-Language Information Retrieval and Evaluation

Publisher : Springer Science & Business Media
ISBN 13 : 3540424466
Total Pages : 396 pages
Book Rating : 4.5/5 (44 download)

Book Synopsis Cross-Language Information Retrieval and Evaluation by : Cross-Language Evaluation Forum. Workshop

Download or read book Cross-Language Information Retrieval and Evaluation written by Cross-Language Evaluation Forum. Workshop and published by Springer Science & Business Media. This book was released on 2001-08-29 with total page 396 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book presents the thoroughly refereed post-proceedings of the international Cross-Language Evaluation Forum Workshop organized by the CLEF activity of the European DELOS Network of Excellence for Digital Libraries. The 25 revised papers presented together with an introduction were carefully selected based on two rounds of reviewing. All current aspects of cross-language information retrieval are addressed, ranging from foundational issues and systems evaluation to applications in a variety of fields.

EuroWordNet: A multilingual database with lexical semantic networks

Publisher : Springer Science & Business Media
ISBN 13 : 9401714916
Total Pages : 180 pages
Book Rating : 4.4/5 (17 download)

Book Synopsis EuroWordNet: A multilingual database with lexical semantic networks by : Piek Vossen

Download or read book EuroWordNet: A multilingual database with lexical semantic networks written by Piek Vossen and published by Springer Science & Business Media. This book was released on 2013-11-11 with total page 180 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book describes the main objective of EuroWordNet, which is the building of a multilingual database with lexical semantic networks or wordnets for several European languages. Each wordnet in the database represents a language-specific structure due to the unique lexicalization of concepts in languages. The concepts are inter-linked via a separate Inter-Lingual-Index, where equivalent concepts across languages should share the same index item. The flexible multilingual design of the database makes it possible to compare the lexicalizations and semantic structures, revealing answers to fundamental linguistic and philosophical questions which could never be answered before. How consistent are lexical semantic networks across languages, what are the language-specific differences of these networks, is there a language-universal ontology, how much information can be shared across languages? First attempts to answer these questions are given in the form of a set of shared or common Base Concepts that has been derived from the separate wordnets and their classification by a language-neutral top-ontology. These Base Concepts play a fundamental role in several wordnets. Nevertheless, the database may also serve many practical needs with respect to (cross-language) information retrieval, machine translation tools, language generation tools and language learning tools, which are discussed in the final chapter. The book offers an excellent introduction to the EuroWordNet project for scholars in the field and raises many issues that set the directions for further research in semantics and knowledge engineering.
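
The Inter-Lingual-Index idea in the excerpt, where equivalent concepts across languages share one index item, can be sketched as a small lookup structure. This is only an illustration under stated assumptions: the index identifiers, the sample synsets, and the function names are hypothetical and do not reflect the actual EuroWordNet database format.

```python
from collections import defaultdict

# Hypothetical toy data: (language, index item id, words in the synset).
SYNSETS = [
    ("en", "ili-001", ["dog", "hound"]),
    ("nl", "ili-001", ["hond"]),
    ("es", "ili-001", ["perro"]),
    ("en", "ili-002", ["house"]),
    ("nl", "ili-002", ["huis"]),
]


def build_index(synsets):
    """Build two maps: (lang, word) -> index items, and index item -> lang -> words."""
    word_to_ili = defaultdict(set)
    ili_to_words = defaultdict(lambda: defaultdict(list))
    for lang, ili, words in synsets:
        for word in words:
            word_to_ili[(lang, word)].add(ili)
        ili_to_words[ili][lang].extend(words)
    return word_to_ili, ili_to_words


def equivalents(word, source_lang, target_lang, index):
    """Return target-language words that share an index item with the source word."""
    word_to_ili, ili_to_words = index
    found = []
    for ili in sorted(word_to_ili.get((source_lang, word), ())):
        found.extend(ili_to_words[ili].get(target_lang, []))
    return found


index = build_index(SYNSETS)
print(equivalents("dog", "en", "nl", index))    # words linked via the shared index item
print(equivalents("house", "en", "es", index))  # empty: no Spanish synset in the toy data
```

Keeping each language's synsets separate and linking them only through the shared index item is what lets language-specific structures differ while still supporting cross-language lookup.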

Multilingual Information Retrieval

Publisher : Springer Science & Business Media
ISBN 13 : 3642230083
Total Pages : 232 pages
Book Rating : 4.6/5 (422 download)

Book Synopsis Multilingual Information Retrieval by : Carol Peters

Download or read book Multilingual Information Retrieval written by Carol Peters and published by Springer Science & Business Media. This book was released on 2012-01-05 with total page 232 pages. Available in PDF, EPUB and Kindle. Book excerpt: We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitates the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make accessible digitally stored materials regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many ‘hands-on details’ that could rapidly become obsolete. 
Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.

Accessing Multilingual Information Repositories

Publisher : Springer
ISBN 13 : 3540457003
Total Pages : 1032 pages
Book Rating : 4.5/5 (44 download)

Book Synopsis Accessing Multilingual Information Repositories by : Fredric Gey

Download or read book Accessing Multilingual Information Repositories written by Fredric Gey and published by Springer. This book was released on 2006-10-15 with total page 1032 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the thoroughly refereed postproceedings of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005. The book presents 111 revised papers together with an introduction. Topical sections include multilingual textual document retrieval, cross-language and more, monolingual experiments, domain-specific information retrieval, interactive cross-language information retrieval, multiple language question answering, cross-language retrieval in image collections, cross-language speech retrieval, multilingual Web track, cross-language geographical retrieval, and evaluation issues.

Parallel Corpora for Contrastive and Translation Studies

Publisher : John Benjamins Publishing Company
ISBN 13 : 9027262845
Total Pages : 313 pages
Book Rating : 4.0/5 (272 download)

Book Synopsis Parallel Corpora for Contrastive and Translation Studies by : Irene Doval

Download or read book Parallel Corpora for Contrastive and Translation Studies written by Irene Doval and published by John Benjamins Publishing Company. This book was released on 2019-03-20 with total page 313 pages. Available in PDF, EPUB and Kindle. Book excerpt: This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances in both recent developments of parallel corpora – with some particular references to comparable corpora as well– and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.

Comparable Corpora and Computer-assisted Translation

Publisher : John Wiley & Sons
ISBN 13 : 1119002702
Total Pages : 221 pages
Book Rating : 4.1/5 (19 download)

Book Synopsis Comparable Corpora and Computer-assisted Translation by : Estelle Maryline Delpech

Download or read book Comparable Corpora and Computer-assisted Translation written by Estelle Maryline Delpech and published by John Wiley & Sons. This book was released on 2014-07-22 with total page 221 pages. Available in PDF, EPUB and Kindle. Book excerpt: Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another. This work had two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations. The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability – methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation. The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).

Using Comparable Corpora for Under-Resourced Areas of Machine Translation

Publisher : Springer
ISBN 13 : 3319990047
Total Pages : 326 pages
Book Rating : 4.3/5 (199 download)

Book Synopsis Using Comparable Corpora for Under-Resourced Areas of Machine Translation by : Inguna Skadiņa

Download or read book Using Comparable Corpora for Under-Resourced Areas of Machine Translation written by Inguna Skadiņa and published by Springer. This book was released on 2019-02-06 with total page 326 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Advances in Cross-Language Information Retrieval

Publisher : Springer Science & Business Media
ISBN 13 : 3540408304
Total Pages : 832 pages
Book Rating : 4.5/5 (44 download)

Book Synopsis Advances in Cross-Language Information Retrieval by : Cross-Language Evaluation Forum. Workshop

Download or read book Advances in Cross-Language Information Retrieval written by Cross-Language Evaluation Forum. Workshop and published by Springer Science & Business Media. This book was released on 2003-10-10 with total page 832 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book presents the thoroughly refereed post-proceedings of a workshop by the Cross-Language Evaluation Forum Campaign, CLEF 2002, held in Rome, Italy in September 2002. The 43 revised full papers presented together with an introduction and run data in an appendix were carefully reviewed and revised upon presentation at the workshop. The papers are organized in topical sections on systems evaluation experiments, cross language and more, monolingual experiments, mainly domain-specific information retrieval, interactive issues, cross-language spoken document retrieval, and cross-language evaluation issues and initiatives.

Evaluation of Multilingual and Multi-modal Information Retrieval

Publisher : Springer
ISBN 13 : 3540749993
Total Pages : 1019 pages
Book Rating : 4.5/5 (47 download)

Book Synopsis Evaluation of Multilingual and Multi-modal Information Retrieval by : Paul Clough

Download or read book Evaluation of Multilingual and Multi-modal Information Retrieval written by Paul Clough and published by Springer. This book was released on 2007-09-04 with total page 1019 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the thoroughly refereed postproceedings of the 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, held in Alicante, Spain, September 2006. The revised papers presented together with an introduction were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections on Multilingual Textual Document Retrieval, Domain-Specific Information Retrieval, i-CLEF, QA@CLEF, ImageCLEF, CLSR, WebCLEF and GeoCLEF.
