nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements


[nl-uiuc] Machine translation tutorial workshop on Thursday, March 5th

From: "Schwartz, Lane Oscar Bingaman" <lanes AT illinois.edu>
To: "nl-uiuc AT cs.uiuc.edu" <nl-uiuc AT cs.uiuc.edu>
Subject: [nl-uiuc] Machine translation tutorial workshop on Thursday, March 5th
Date: Tue, 3 Mar 2015 17:07:46 +0000
Accept-language: en-US
List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc/>
List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>

Want to access sources in languages other than English? Want to know about the best tools for doing this? Want to know about the best ways to use these tools? Join us for a tutorial and discussion at our workshop on using Machine Translation for academic research.

Thursday, March 5th, 3:30-5:00pm

289 Undergraduate Library (computer lab)

Using Machine Translation Effectively for Academic Research

Tutorial and discussion led by Professor Lane Schwartz and Ph.D. Candidate Daniel Ross

While the majority of contemporary scholarship is written and published in English, the use of English as the language of the academy is not universal, and this is especially true of historical publications. An important component of responsible scholarship is a thorough review of the relevant literature, even when some of that literature is written in a foreign language. In such cases, not knowing the language can cause difficulty; for this reason, research in English is consulted and cited more often than research in other languages, meaning that significant contributions might be overlooked or remain inaccessible. Modern machine translation technology, when used appropriately, can enable researchers to consult sources in languages they could not otherwise read, and can make the process more efficient for languages with which they have limited familiarity. We will also emphasize the value of the domain expertise that every researcher has, which allows for successful post-editing of Machine Translation output even when the language is not familiar to the researcher.

The primary technologies we will discuss are Machine Translation (MT) and Optical Character Recognition (OCR), which allows image-to-text conversion, making paper copies of books and journal articles accessible through MT. For MT, we will give demonstrations using Google Translate and discuss alternatives. For OCR, we will give demonstrations using Adobe Acrobat and list alternatives. In fact, it is possible to scan a paper copy of a research article in, for example, Chinese, apply these methods, and end up with intelligible output that will at least let a researcher skim the article’s content. In the end, it may be necessary to consult a speaker of the language, but the use of MT will eliminate that step entirely in some cases and at the very least facilitate the process of finding sources. We will give live demonstrations of how this process works, including full demonstrations of (1) creating an interlinear gloss for difficult text; and (2) translating a full scanned article while preserving the formatting.

This tutorial will not eliminate the language barrier, but we will show attendees how to venture across it to improve their own research potential and connect with the global research community. We will show how to access material in other languages effectively and efficiently. We also hope that some attendees will share the methods presented with colleagues and students. There will be time for questions and discussion.

Instructors:

Dr. Lane Schwartz lanes AT illinois.edu
University of Illinois at Urbana-Champaign, Assistant Professor of Linguistics

http://dowobeha.github.io/Welcome.html

Lane Schwartz works at the intersection of human and machine translation. His research includes work in statistical machine translation, computer-aided translation, and cognitively motivated language models. He also teaches courses on Machine Translation and has published on the topic of effective post-editing by domain experts, showing that someone with expertise in a subject can, without knowing the source language, use machine translation to generate usable and intelligible results by improving the output with their expertise. He is one of the original developers of Joshua, an open-source toolkit for tree-based statistical machine translation, and is a frequent contributor to Moses, the de facto standard for phrase-based statistical machine translation.

Daniel Ross djross3 AT illinois.edu
University of Illinois at Urbana-Champaign, Ph.D. Candidate

http://publish.illinois.edu/djross3/

Daniel Ross studies theoretical linguistics while maintaining a strong base in cross-linguistic typology. His research centers on the relationship between Morphosyntax and Semantics, with additional research interests in Language Acquisition, Psycholinguistics and Computational Linguistics. He teaches a class on Historical Linguistics. He is a native speaker of English and is proficient in Spanish, and in total he has studied 20 languages at the university level (to varying degrees of familiarity): Spanish, Italian, German, Latin, Arabic, Japanese, Portuguese, Hindi, French, Swahili, Catalan, Swedish, Faroese, Quechua, Russian, American Sign Language, Basque, Turkish, Modern Greek and Mandarin Chinese. His work requires consulting research in many languages, and the bibliography for his dissertation includes sources in over 30 languages, which were made accessible through these methods.
