
nl-uiuc - [nl-uiuc] Talk by Jason Baldridge at 2 pm, 3405 SC.

nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements

  • From: Rajhans Samdani <rsamdan2 AT illinois.edu>
  • To: nl-uiuc AT cs.uiuc.edu, aivr AT cs.uiuc.edu, dais AT cs.uiuc.edu, cogcomp AT cs.uiuc.edu, vision AT cs.uiuc.edu, eyal AT cs.uiuc.edu, aiis AT cs.uiuc.edu, aistudents AT cs.uiuc.edu
  • Subject: [nl-uiuc] Talk by Jason Baldridge at 2 pm, 3405 SC.
  • Date: Fri, 30 Apr 2010 12:05:11 -0500 (CDT)
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc>
  • List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>

Hi All,

This is a gentle reminder for today's talk by Jason Baldridge
(http://comp.ling.utexas.edu/people/jason_baldridge). He is going to give a
talk in the AIIS seminar (April 30, 2pm, 3405 SC).

Here are the title, abstract, and bio.

Title:
Using universal grammar and integer programming to improve
weakly-supervised supertaggers.

Abstract:
The last decade has seen a great deal of work in computational
linguistics on or using categorial grammar, especially Combinatory
Categorial Grammar (CCG). These efforts include wide-coverage
grammars/parsers based on CCGbank and domain-specific grammars developed with
OpenCCG. A recurring theme in this work is that identifying the correct
lexical categories for words (or pieces of
logical forms) is highly useful, whether it is for grammar development,
supertagging to speed up parsing, hypertagging for
sentence realization, or using categories as the basis for features in
various other tasks, such as machine translation. However, building
models for labeling categories requires training material, which so
far has meant using a resource such as CCGbank, which has texts labeled with
categories and derivations. In the context of OpenCCG, there is the problem
of assigning categories to words that are outside of the grammar. This
reliance on labor-intensive resources or effort limits the cross-linguistic
applicability of categorial grammar for work in computational linguistics, so
it would naturally be of interest to find ways to bootstrap at least some of
the information.

In this talk, I'll discuss experiments on weakly supervised supertagging
using Hidden Markov Models as a means for expanding
categorial lexicons. Applied naively to supertagging for CCGbank, HMMs
perform quite poorly when given only a tag dictionary and standard EM
training; I'll discuss two complementary strategies for improving the learned
HMM that use no additional annotations or knowledge about the language or
dataset being analyzed. The first is to use knowledge about the universal
grammar of category combination to create a grammar-informed initialization
for transition probabilities before starting EM. The second is to use an
integer program that finds the smallest set of supertag bigrams that covers
the text while obeying the constraints of the tag dictionary. Both strategies
provide massive gains over standard, randomly initialized EM, for both
English and Italian supertagging. In combination, they deliver further error
reductions. The computational complexity of the integer program in the face
of supertag ambiguity is very high, so we employ a two-stage method
that---while not guaranteed to find the optimal solution---works very well
in practice.
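
As a rough illustration of the second strategy's objective (the smallest set
of supertag bigrams that covers the text under the tag dictionary), here is a
minimal sketch. It is not the integer program or the two-stage method from the
talk: it uses brute-force enumeration, which is only feasible on toy input.
The function name, the two sentences, and the tag dictionary are all
hypothetical, and sentence-boundary bigrams are ignored for simplicity.

```python
from itertools import product


def smallest_bigram_cover(sentences, tag_dict):
    """Brute-force search for the tagging that uses the fewest distinct tag bigrams.

    Illustrative only: the search space grows exponentially with the number
    of ambiguous tokens, which is why a real system would need an integer
    program or an approximation.
    """
    # Every token may take any tag its word has in the tag dictionary.
    per_token_options = [tag_dict[word] for sent in sentences for word in sent]
    best_bigrams, best_taggings = None, None
    for choice in product(*per_token_options):
        # Split the flat tag choice back into one tag sequence per sentence.
        taggings, start = [], 0
        for sent in sentences:
            taggings.append(choice[start:start + len(sent)])
            start += len(sent)
        # Collect the distinct adjacent tag pairs this tagging needs.
        bigrams = set()
        for tags in taggings:
            bigrams.update(zip(tags, tags[1:]))
        if best_bigrams is None or len(bigrams) < len(best_bigrams):
            best_bigrams, best_taggings = bigrams, taggings
    return best_bigrams, best_taggings


# Hypothetical toy data: CCG-style categories chosen only for illustration.
sentences = [["the", "dog", "barks"], ["the", "cat", "barks"]]
tag_dict = {
    "the": ["NP/N"],
    "dog": ["N", "NP"],
    "cat": ["N"],
    "barks": ["S\\NP", "N"],
}

bigrams, taggings = smallest_bigram_cover(sentences, tag_dict)
print(len(bigrams), taggings)
```

In this toy case, tagging "dog" as N lets both sentences share the same
bigrams, so the smaller bigram set pushes the ambiguous word toward the tag
that its unambiguous neighbor "cat" already requires.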

[This talk describes joint work with Sujith Ravi and Kevin Knight.]

Bio:

Jason Baldridge is an assistant professor in the Department of
Linguistics at the University of Texas at Austin. He received his
Ph.D. from the University of Edinburgh in 2002 and was then a
post-doctoral researcher there on the ROSIE project until 2005. His
main research interests include categorial grammars, active learning,
discourse structure, coreference resolution, and georeferencing. He is
one of the co-creators of OpenNLP and has been active for many years in the
creation and promotion of open source software for natural language
processing.

Regards,
Rajhans


Rajhans Samdani,
Graduate Student,
Dept. of Computer Science,
University of Illinois at Urbana-Champaign.



