
illinois-ml-nlp-users - [Illinois-ml-nlp-users] NER tagger not preserving line breaks

illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20

  • From: Greg Durrett <gdurrett AT eecs.berkeley.edu>
  • To: illinois-ml-nlp-users AT cs.uiuc.edu
  • Subject: [Illinois-ml-nlp-users] NER tagger not preserving line breaks
  • Date: Sat, 12 Apr 2014 16:10:41 -0700
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi,

I'm trying to use the NER tagger:
http://cogcomp.cs.illinois.edu/page/software_view/4

I want to pass it data that has already been tokenized and sentence-split, but I can't figure out how to get it to respect line breaks. I've attached the input (.sentences; the same thing happens with just one line break between sentences) and the output (.tagged) from a run with the attached config. The config is basically the default ontonotes file with forceNewSentenceOnLineBreaks set to true and pathToTokenNormalizationData set to false (since I thought this might cause retokenization or re-splitting of sentences).

In the log, the tool states that one of the parameters is
keepOriginalFileTokenizationAndSentenceSplitting=false
which seems bad, but the system doesn't seem to accept this as an argument.
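
For anyone trying to reproduce this, a minimal sketch of one way to check whether sentence boundaries survived tagging: compare the number of non-empty lines in the input and output files. It assumes one sentence per line in both files, which may not match the tagger's actual output layout. The function names are made up for illustration and are not part of the NER package.

```python
import sys


def nonempty_lines(text):
    """Return the lines of text that contain something other than whitespace."""
    return [line for line in text.splitlines() if line.strip()]


def sentence_count_matches(input_text, tagged_text):
    """True if both texts have the same number of non-empty lines.

    Assumption: one sentence per line in both the input and the tagged output.
    """
    return len(nonempty_lines(input_text)) == len(nonempty_lines(tagged_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_lines.py input.sentences output.tagged
    with open(sys.argv[1], encoding="utf-8") as f:
        input_text = f.read()
    with open(sys.argv[2], encoding="utf-8") as f:
        tagged_text = f.read()
    print("input sentences: ", len(nonempty_lines(input_text)))
    print("tagged sentences:", len(nonempty_lines(tagged_text)))
    print("match:", sentence_count_matches(input_text, tagged_text))
```

Blank lines are ignored, so inputs with one or two line breaks between sentences are counted the same way.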

Any advice?

Thanks!

Greg

Attachment: conll-2012-dev-short.sentences
Description: Binary data

Attachment: conll-2012-dev-short.tagged
Description: Binary data

Attachment: greg.config
Description: Binary data


