
nl-uiuc - [nl-uiuc] FW: Data Science Summer Institute Talk, Ed Hovy, July 8th at 1:30

nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements

  • From: "Roth, Dan" <danr AT uiuc.edu>
  • To: "cogcomp AT cs.uiuc.edu" <cogcomp AT cs.uiuc.edu>, "nl-uiuc AT cs.uiuc.edu" <nl-uiuc AT cs.uiuc.edu>, "dais AT cs.uiuc.edu" <dais AT cs.uiuc.edu>
  • Subject: [nl-uiuc] FW: Data Science Summer Institute Talk, Ed Hovy, July 8th at 1:30
  • Date: Thu, 3 Jul 2008 16:04:59 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc>
  • List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>

 

 


From: Schaefer, Melinda M [mailto:mschaefr AT cs.uiuc.edu]
Sent: Thursday, July 03, 2008 4:01 PM
To: ifaculty AT cs.uiuc.edu; cs-grads AT cs.uiuc.edu
Cc: mschaefr AT uiuc.edu; King, Robin Brian
Subject: Data Science Summer Institute Talk, Ed Hovy, July 8th at 1:30

 

University of Illinois at Urbana-Champaign

Department of Computer Science

The Thomas M. Siebel Center for Computer Science

201 North Goodwin Avenue

Urbana, Illinois 61801-2302  USA

 

 

Data Science Summer Institute Talk

 

 

The Promise and Problems of Annotation

 

 

Ed Hovy, Director

Center for Knowledge Integration and Discovery (CKID)

USC 

Tuesday, July 8, 2008 at 1:30 P.M.

3405 Siebel Center for Computer Science

 

 

 

Abstract:

In order to apply automated language processing technology to assist humans with analysis and other text-oriented tasks such as retrieval, summarization, question answering, and translation, the technology has to be ‘trained’ to the particulars of the domain and the analysis task(s).  Different fields of study, different tasks, different text genres, and different domains of interest all present different, and sometimes unique, challenges. 

 

The procedure of ‘training’ the technology involves preparing a selection of the representative texts to create what is called the training suite.  Typically, domain experts view the texts with suitable interfaces and in various ways and formats enter information they find useful for their task(s), in a process called coding or annotation.  Usually, annotation includes the steps of delimiting some fragment of text, selecting one or more interpretive labels to attach to that portion, and perhaps adding additional information.  Once two or more annotators have performed coding on the same texts, and have achieved a high enough degree of agreement between them, the language processing technology can be trained on a portion of the training suite, and its performance measured on the remainder.  If that is satisfactory, the technology can be applied to additional, unannotated, material of the same type, thereby assisting analysts in future tasks. 
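As a rough illustration of the agreement check and the train/held-out split described above, here is a minimal Python sketch. It is not taken from the talk: Cohen's kappa is only one of several possible agreement measures, and the function names, the 0.8 cutoff, and the 80/20 split are assumptions chosen for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def split_suite(items, train_fraction=0.8):
    """Split the annotated suite into a training part and a held-out remainder."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Hypothetical cutoff; the right value depends on the task and is an open question.
AGREEMENT_CUTOFF = 0.8

annotator_a = ["x", "x", "y", "y"]
annotator_b = ["x", "y", "y", "y"]
kappa = cohen_kappa(annotator_a, annotator_b)
if kappa >= AGREEMENT_CUTOFF:
    train, held_out = split_suite(list(zip(annotator_a, annotator_b)))
else:
    print(f"kappa = {kappa:.2f}: revise the guidelines or retrain annotators first")
```

In this toy case the annotators agree on three of four items, but half of that agreement is expected by chance, so kappa comes out well below the raw agreement rate.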

 

Annotation is not an exact science.  To help ensure clean and trustworthy annotations suitable for machine learning, the language processing community is beginning to address a set of seven issues.  Using examples from several of the author’s projects, this talk describes each issue, lists some relevant work for each, and points to what needs to be resolved.  The seven issues are:

1. How does one obtain a balanced corpus to annotate, and when is a corpus balanced (and representative)?

2. How does one decide what specifically to annotate?  How does one adequately capture the theory behind the phenomena and express it in simple annotation instructions?

3. When hiring annotators, what characteristics are important?  How does one ensure that they are adequately (and not over- or under-) trained?

4. How does one establish a simple, fast, and trustworthy annotation procedure?  What interfaces does one build?  How does one ensure that the interfaces do not influence the annotation results?

5. How does one evaluate the results?  What are the appropriate agreement measures?  At which cutoff points should one redesign or re-do the annotations?

6. How should one formulate and store the results?  How does one ensure compatibility with other existing resources?  How does one make results available for best impact?

7. How does one report the annotation effort and results?  How does one actually publish papers on this work?  What should the papers contain?

 

 

Bio:

 

Eduard Hovy directs the DHS Center for Knowledge Integration and Discovery at the University of Southern California, where he also leads the Natural Language Research Group at USC’s Information Sciences Institute and serves as Deputy Director of the Intelligent Systems Division and as research associate professor of the Computer Science Department.  He completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987.  His research focuses on information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, machine translation, question answering, and digital government.  Dr. Hovy regularly serves in an advisory capacity to funders of NLP research in the US and EU.  He is the author or co-editor of five books and over 180 technical articles.  In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL) and in 2001–03 as President of the International Association for Machine Translation (IAMT); he currently serves as President of the Digital Government Society of North America (DGSNA).  Dr. Hovy regularly co-teaches a course in the Master’s Degree Program in Computer Science at the University of Southern California, as well as occasional short courses on MT and other topics at universities and conferences.  He has served on the Ph.D. and M.S. committees for students from USC, Carnegie Mellon University, National Taiwan University, and the Universities of Toronto, Karlsruhe, Pennsylvania, Stockholm, Waterloo, Nijmegen, Pretoria, and Ho Chi Minh City.

 

URLs:

http://www.isi.edu/natural-language/nlp-at-isi.html

http://www.isi.edu/~hovy.html

 

 

Melinda Schaefer

Department of Computer Science

University of Illinois, Urbana/Champaign

201 N. Goodwin Ave

2232 Siebel Center, MC-258

Urbana, IL 61801

(217)333-6454

mschaefr AT uiuc.edu

 


