nl-uiuc - [nl-uiuc] AIIS/DAIS Talk on Friday, Dec 9 by Rajeev Rastogi

nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements

  • From: "Samdani, Rajhans" <rsamdan2 AT illinois.edu>
  • To: "aiis AT cs.uiuc.edu" <aiis AT cs.uiuc.edu>, "aistudents AT cs.uiuc.edu" <aistudents AT cs.uiuc.edu>, "nl-uiuc AT cs.uiuc.edu" <nl-uiuc AT cs.uiuc.edu>, "vision AT cs.uiuc.edu" <vision AT cs.uiuc.edu>, "aivr AT cs.uiuc.edu" <aivr AT cs.uiuc.edu>, "eyal AT cs.uiuc.edu" <eyal AT cs.uiuc.edu>
  • Subject: [nl-uiuc] AIIS/DAIS Talk on Friday, Dec 9 by Rajeev Rastogi
  • Date: Wed, 7 Dec 2011 16:40:15 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc>
  • List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>

Hi all!

This week we have a special AIIS/DAIS joint seminar. We're hosting Rajeev
Rastogi from Yahoo! Research. The details follow.

When: Friday, Dec 9, 1:30 pm (**not 2 pm**)

Where: 2405 Siebel Center (**not 3405**)

Title: Building knowledge bases from the web
Abstract:
The web is a vast repository of human knowledge. Extracting structured data
from web pages can enable applications like comparison shopping, and lead to
improved ranking and rendering of search results. In this talk, I will
describe two efforts at Yahoo! Labs to extract records from pages at web
scale. The first is a wrapper induction system that handles end-to-end
extraction tasks, from clustering web pages, to learning XPath extraction
rules, to relearning rules when sites change. The system has been deployed in
production within Yahoo! to extract more than 200 million records from ~200
web sites. The second effort exploits content redundancy on the web to
automatically extract records without human supervision. Starting with a seed
database, we determine values in the pages of each new site that match
attribute values in the seed records. We devise a new notion of similarity
for matching templatized attribute content, and an Apriori-style algorithm
that exploits templatized page structure to prune spurious attribute matches.

Bio: Previously, Rajeev was a Bell Labs Fellow and the founding Director of the
Bell Labs Research Center in Bangalore, India. Rajeev worked at Bell Labs
from 1993 until 2008. During that period, he led a number of research projects
that were incorporated into Lucent products and services. These include the
Datablitz main-memory database system, the Fellini multimedia storage server,
and the NetInventory auto-discovery engine. His research interests include
database systems, data mining, and network management. His most recent
research has focused on the areas of network monitoring and security, network
graph compression and analysis, and video content dissemination.

See you!
Rajhans
