
illinois-ml-nlp-users - [Illinois-ml-nlp-users] output encoding for NER

illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20




  • From: Xuchen Yao <xuchen AT cs.jhu.edu>
  • To: illinois-ml-nlp-users AT cs.uiuc.edu
  • Cc: ratinov2 AT uiuc.dot.edu
  • Subject: [Illinois-ml-nlp-users] output encoding for NER
  • Date: Tue, 13 Nov 2012 17:08:19 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi,

Thanks for providing the Named Entity Tagger tool. I've been using both the server (NerServer) and batch (NerTagger) versions on my local machine, and I am quite confused by their output encoding. Suppose I need to tag the following sentence:

The studio changed hands several times after 1953 and was home to such early television series as '' Superman '' and '' The Red Skelton Show .

When running NerServer with taggingEncodingScheme set to BILOU, I get the following (printing the String[] predictions array in tagData() in NETagPlain.java):

O O O O O O O B-DATE O O O O O O O O O O O O O O O I-WORK_OF_ART I-WORK_OF_ART I-WORK_OF_ART O
The studio changed hands several times after [DATE 1953 ] and was home to such early television series as '' Superman '' and '' The [WORK_OF_ART Red Skelton Show ] .
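The bracketed line is simply the tag line rendered as spans. Below is a minimal Python sketch of that rendering. It assumes whitespace-separated tokens and tags, handles only the B-/I-/O prefixes, and decodes leniently: an I- tag that does not continue an open entity of the same type opens a new one. The function name `bracket` is hypothetical, and this is not the tagger's own code. Applied to the tokens and tags above, it reproduces the bracketed line, including the WORK_OF_ART span that starts with an I- tag.

```python
def bracket(tokens, tags):
    """Render BIO-tagged tokens as '[TYPE tok ... ]' spans.

    Lenient decoding (an assumption, not the tagger's documented rule):
    an I-X tag that does not continue an open X entity opens a new one.
    """
    out = []
    open_type = None  # entity type of the currently open span, if any
    for tok, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == open_type
        if open_type is not None and not continues:
            out.append("]")  # close the span that this tag does not extend
            open_type = None
        if prefix in ("B", "I") and not continues:
            out.append("[" + etype)  # start a new span
            open_type = etype
        out.append(tok)
    if open_type is not None:
        out.append("]")  # close a span that runs to the end of the sentence
    return " ".join(out)
```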

When running NerServer with taggingEncodingScheme set to BIO, I get this instead:

O O O O O O O U-DATE O O O O O O O O O O O O O O O I-WORK_OF_ART I-WORK_OF_ART L-WORK_OF_ART O
The studio changed hands several times after 1953 and was home to such early television series as '' Superman '' and '' The [WORK_OF_ART Red Skelton Show ] .
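A quick way to check which scheme a tag line actually uses is to look at its prefixes, because L- and U- occur only in BILOU. The Python sketch below applies that heuristic and also maps BILOU back to BIO (U-X becomes B-X, L-X becomes I-X). The function names are hypothetical and are not part of the tagger. On the two NerServer lines above, it labels the BIO-setting line as BILOU and the BILOU-setting line as BIO. Converting the BIO-setting line yields exactly the BILOU-setting line, so the two runs carry the same entities and differ only in encoding.

```python
def detect_scheme(tags):
    """Guess a tag line's encoding from its prefixes (a heuristic)."""
    prefixes = {tag.partition("-")[0] for tag in tags}
    if prefixes & {"L", "U"}:
        return "BILOU"  # L- and U- exist only in BILOU
    if prefixes & {"B", "I"}:
        return "BIO"
    return "unknown"  # all "O": no entities, so the schemes look identical


def bilou_to_bio(tags):
    """Map BILOU tags to BIO: U-X -> B-X, L-X -> I-X, others unchanged."""
    mapping = {"U": "B", "L": "I"}
    out = []
    for tag in tags:
        prefix, sep, etype = tag.partition("-")
        out.append(mapping.get(prefix, prefix) + sep + etype)
    return out
```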

Three things confuse me here. Could you please clarify them?

1. The BIO setting actually outputs the BILOU encoding scheme, and vice versa. I double-checked my settings and am fairly sure I did not swap them by mistake.

2. In both outputs, I-WORK_OF_ART is not preceded by B-WORK_OF_ART. If you run the batch version (NerTagger) on this sentence, you do get B-WORK_OF_ART before I-WORK_OF_ART, but on other sentences the batch version also outputs an "I-*" tag right after an "O" tag.

3. Here is the NerTagger output for the same input, with taggingEncodingScheme set to BILOU:

O O O O O O O B-DATE O O O O O O O O O O O O O O B-WORK_OF_ART I-WORK_OF_ART I-WORK_OF_ART I-WORK_OF_ART O

It correctly outputs "B" before "I". However, my NerTagger and NerServer runs used the same config file (IllinoisNerExtended-v2.1/DemoConfig/ner.ontonotes.config), so why do they differ? Do you have any suggestions for configuring the batch versus server settings?
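The BIO rule behind point 2 is that every I-X must follow a B-X or I-X of the same type, and this can be checked mechanically. The Python sketch below lists the offending positions. As one possible client-side workaround, it then rewrites each orphan I-X as B-X. The function names are hypothetical and this is not tagger code. The repair only makes the output well-formed and does not explain the server/batch difference. On the NerServer line above, it flags index 23 ("Red"), and after repair "The" is still untagged, unlike in the NerTagger output.

```python
def orphan_inside_tags(tags):
    """Return indices of I-X tags that do not continue an X entity (BIO rule)."""
    bad = []
    prev = "O"
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "I":
            prev_prefix, _, prev_type = prev.partition("-")
            # Valid only after B-X or I-X of the same type X.
            if prev_prefix not in ("B", "I") or prev_type != etype:
                bad.append(i)
        prev = tag
    return bad


def repair_bio(tags):
    """Rewrite each orphan I-X as B-X so the line becomes valid BIO."""
    fixed = list(tags)
    for i in orphan_inside_tags(tags):
        fixed[i] = "B-" + tags[i].partition("-")[2]
    return fixed
```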

Thanks.

Xuchen Yao




