
nl-uiuc - [nl-uiuc] LDC corpora

nl-uiuc AT lists.cs.illinois.edu

Subject: Natural language research announcements

  • From: "Fleck, Margaret M" <mfleck AT cs.uiuc.edu>
  • To: "nl-uiuc AT cs.uiuc.edu" <nl-uiuc AT cs.uiuc.edu>
  • Subject: [nl-uiuc] LDC corpora
  • Date: Mon, 19 May 2008 14:20:18 -0500
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/nl-uiuc>
  • List-id: Natural language research announcements <nl-uiuc.cs.uiuc.edu>


UIUC has licenses for a lot of datasets from the Linguistic Data Consortium,
but they are scattered all over campus, largely without remote access. So
getting a copy of a corpus we own often requires tracking down who has the
disks and hiking across campus to borrow them. Much worse, it's not uncommon
to discover that the disks were sent to someone that no one has ever heard of
(e.g. maybe a grad student or an administrative assistant from the 1990s), so
you'll need to pay an extra-copy charge (often non-trivial) to get the data.

We're trying to see if we can cook up a better solution, e.g. put a copy of
all the data onto a server with web access. To do this, we need to track down
as much of this data as possible.

The people with big caches of LDC data include Dan Roth, Richard Sproat,
ChengXiang Zhai, Mark Hasegawa-Johnson, and myself. A list of the corpora
I know about is at:

http://loris.cs.uiuc.edu/ldc-corpora.html

If any of the rest of you have LDC datasets that aren't on this list, could
you please email me?

Also please tell me if you know of other groups that might have LDC data
but aren't on this mailing list.

Many thanks,

Margaret







