nl-uiuc - [nl-uiuc] Upcoming talk at the AIIS seminar

nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements

  • From: Ming-Wei Chang <mchang21 AT uiuc.edu>
  • To: nl-uiuc AT cs.uiuc.edu, aivr AT cs.uiuc.edu, dais AT cs.uiuc.edu, cogcomp AT cs.uiuc.edu, vision AT cs.uiuc.edu, krr-group AT cs.uiuc.edu, aiis AT cs.uiuc.edu
  • Subject: [nl-uiuc] Upcoming talk at the AIIS seminar
  • Date: Fri, 13 Feb 2009 16:04:15 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc>
  • List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>


Dear faculty and students,

Dr. Douglas Downey will give a talk (details below) for the AIIS seminar
at 4:00 pm, Feb 19th (next Thursday). The room number is 3405. If you
would like to meet with Douglas, please let me know.

Thank you,
Ming-Wei


Title: Autonomous Web-scale Information Extraction

Abstract:
Search engines are extremely useful tools for answering questions. However,
a significant number of questions users might pose -- for example, "which
nanotechnology companies are hiring on the West Coast?" -- cannot be
addressed using existing search engines, because the answers do not lie on a
single page. To answer these kinds of queries, users must extract and
synthesize information from multiple documents. Currently, this is a
tedious and error-prone manual process.

In this talk, I will describe my research aimed at automating the extraction
of this information from the Web. I will present a model of the redundancy
inherent in the Web, and show that the model can be used to identify correct
extractions autonomously, without the manually labeled examples typically
assumed in previous information extraction research. However, the model has
limited efficacy for the "long tail" of infrequently mentioned facts; I will
demonstrate how unsupervised language models can be leveraged in concert
with redundancy to overcome this limitation. Lastly, I will describe recent
theoretical and experimental results illustrating that a generalization of
the redundancy-based approach is effective for a variety of textual
classification tasks, beyond information extraction.

Bio:
Doug Downey is an assistant professor in the EECS Department of Northwestern
University, which he joined in the Fall of 2008. He obtained his PhD from
the University of Washington, where he was advised by Oren Etzioni and
supported by an NSF Fellowship and Microsoft Research Graduate Fellowship.
His research interests are in the areas of natural language processing,
machine learning, and artificial intelligence. At UW, he was part of the
KnowItAll project, which was aimed at utilizing the Web to autonomously
extract large knowledge bases. Doug's primary research results concern
probabilistic models of the redundancy inherent in large corpora, along with
associated techniques that allow systems like KnowItAll to extract data
autonomously at high precision.







