illinois-ml-nlp-users - Re: [Illinois-ml-nlp-users] Running the LBJ NER Tagger

illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20

  • From: Lev-Arie Ratinov <ratinov2 AT uiuc.edu>
  • To: Jeff Dalton <jdalton AT cs.umass.edu>
  • Cc: illinois-ml-nlp-users AT cs.uiuc.edu
  • Subject: Re: [Illinois-ml-nlp-users] Running the LBJ NER Tagger
  • Date: Fri, 18 Mar 2011 16:12:40 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi Jeff. The first step would be to double-check that the output of
the system as you're running it is identical to the intended output.
Can you send me the output the system produces on your machine?

The web data is attached. I may have re-annotated the data, and I've
also discovered a bug in my evaluation on the Web portion, so you'll
see a different result on the Web data. However, the result on the
CoNLL data should be around 90 F1 at the phrase level.



On Fri, Mar 18, 2011 at 3:30 PM, Jeff Dalton <jdalton AT cs.umass.edu> wrote:
> Thanks for getting back to me.  ... My responses are inline below.  With a
> bit of hacking I managed to get the NER classifier working.  Here are the
> results for the eng.testb set from CoNLL 2003.  The evaluation
> is done using conlleval.pl and reports phrase-level measures:
> processed 46435 tokens with 5645 phrases...
> Config                          accuracy  precision  recall  FB1
> Config/baselineFeatures.config  93.91%    75.78%     67.12%  71.19
> Config/allLayer1.config         97.28%    88.34%     85.79%  87.05
> Config/AllFeatures.config       97.09%    88.39%     83.77%  86.02
> These numbers look different from the ones reported in the paper.  In
> particular, the performance of the baseline and AllFeatures models is very
> different.  Is there anything else I could be missing that needs to be
> done?  It would be good to understand what could be going wrong.
> I would also be very interested in running the tagger over the web page data
> you report on.  Is it possible to make that dataset available?
> Thanks again for the help.
> - Jeff
>
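
The FB1 values reported in the message above can be sanity-checked from the precision and recall figures, since conlleval.pl reports FB1 as their harmonic mean. The following is a minimal sketch for that check only; the class name F1Check is made up for illustration and is not part of the tagger or of conlleval.pl.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes FB1 from precision and recall. */
final class F1Check {
    /** Harmonic mean of precision and recall, both given in percent. */
    static double f1(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid dividing by zero when both measures are zero
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision/recall pairs copied from the results quoted in the thread.
        double[][] rows = { {75.78, 67.12}, {88.34, 85.79}, {88.39, 83.77} };
        for (double[] row : rows) {
            // Prints 71.19, 87.05 and 86.02, matching the reported FB1 column.
            System.out.println(String.format(Locale.ROOT, "%.2f", f1(row[0], row[1])));
        }
    }
}
```

The recomputed values agree with the reported ones, so the gap to the numbers in the paper is not an arithmetic artifact of the evaluation output.
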
> On Thu, Mar 10, 2011 at 2:13 PM, Lev-Arie Ratinov <ratinov2 AT uiuc.edu> wrote:
>>
>> Hi Jeff.
>>
>> I've seen this error:
>> "ERROR: Can't locate NETaggerLevel1.lc in the class path."
>> before. It's an elusive error. The file NETaggerLevel1.java
>> is generated automatically, so I believe that trying to understand
>> it is the wrong way to go.
>>
>> Are you using Unix/Linux or Windows? The error you're reporting
>> is typical on Windows systems. One trick there is that your
>> paths should be absolute, e.g.:
>>
>>
>> D:\temp\TFGTagger\dist\TFGTagger.jar;D:\temp\TFGTagger\lib\LBJ2.jar;D:\temp\TFGTagger\lib\LBJ2Library.jar
>> EndToEndSystemTFG.TFGTagger annotate
>> D:\temp\TFGTagger\Data\SampleEmails\SampleRawMails
>> D:\temp\TFGTagger\Data\SampleEmails\SampleTaggedMails
>> D:\temp\TFGTagger\Config\TFG_ForClient_X.config
>
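
One way to avoid relative-path problems of the kind described above is to build the class path from absolute paths programmatically. This is only a sketch under assumptions: the class ClassPathBuilder is hypothetical, and the relative jar locations are guessed from the jar names in the example command rather than taken from the distribution.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns class path entries into absolute paths. */
final class ClassPathBuilder {
    /** Joins the entries with the platform separator (";" on Windows, ":" on Unix). */
    static String absoluteClassPath(String... entries) {
        List<String> absolute = new ArrayList<>();
        for (String entry : entries) {
            // Resolve against the current working directory and drop "." / ".." segments.
            absolute.add(Paths.get(entry).toAbsolutePath().normalize().toString());
        }
        return String.join(File.pathSeparator, absolute);
    }

    public static void main(String[] args) {
        // The directory layout here is an assumption for illustration.
        System.out.println(absoluteClassPath(
                "dist/TFGTagger.jar", "lib/LBJ2.jar", "lib/LBJ2Library.jar"));
    }
}
```

The printed string could then be passed to the JVM as its class path argument.
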
> I am using 64-bit Ubuntu Linux 9.10 with a 64-bit Sun JDK 1.6.xxx.
> The issue is that the classifier classes have static initializer blocks
> that set the lcFilePath member variable, such as the one below:
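>   static
>   {
>     // Look up the saved classifier next to this class on the class path.
>     lcFilePath = NETypeTagger.class.getResource("NETypeTagger.lc");
>     if (lcFilePath == null)
>     {
>       // getResource returns null when the file is not on the class path.
>       System.err.println("ERROR: Can't locate NETypeTagger.lc in the class path.");
>       System.exit(1);
>     }
>   }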
> If the lc file isn't there, then the classifier exits.  The lc file isn't in
> the distribution, so I don't see how the file in the path could ever be
> present...
> To get around this problem I removed the static initializers and set the
> lcFilePath in the constructor to the saved classifier locations.  This
> allowed them to read the saved models included in the distribution.
>
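
The workaround described above, pointing lcFilePath at a saved model on disk when the .lc file is not on the class path, could look roughly like the following. This is a sketch, not the tagger's or LBJ's actual code: ModelLocator, its locate method, and the fallback path are all hypothetical names introduced for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper; not part of LBJ or the tagger distribution. */
final class ModelLocator {
    /**
     * Looks for resourceName relative to owner on the class path and, if it
     * is absent, falls back to a file on disk. Returns null when neither exists.
     */
    static URL locate(Class<?> owner, String resourceName, File fallback)
            throws MalformedURLException {
        URL onClassPath = owner.getResource(resourceName);
        if (onClassPath != null) {
            return onClassPath;
        }
        return fallback.isFile() ? fallback.toURI().toURL() : null;
    }

    public static void main(String[] args) throws Exception {
        // The fallback location below is made up for illustration.
        URL url = locate(ModelLocator.class, "NETypeTagger.lc",
                new File("models/NETypeTagger.lc"));
        System.out.println(url == null ? "model not found" : "model at " + url);
    }
}
```

Returning null instead of calling System.exit leaves the decision about a missing model to the caller.
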
>>
>> Also, the column format I'm using is a little different from the CoNLL03
>> annotation format. Below is an example. Note that there is shallow
>> parse and POS info there, but I don't use it, so you can replace these
>> columns with dummy values. Sorry, I don't have a script for that. The
>> column format matters because sentence boundaries are
>> marked. I also support a "brackets format", but then you'll rely on
>> my own sentence splitting, and you won't be able to reproduce the
>> results. Here is the sample data. Please let me know if it solves your
>> problems:
>>
>>
>> O       0       0       O       -X-     -DOCSTART-      x       x       0
>>
>> O       0       0       I-NP    NNP     CRICKET x       x       0
>> O       0       1       O       :       -       x       x       0
>> B-ORG   0       2       I-NP    NNP     LEICESTERSHIRE  x       x       0
>> O       0       3       I-NP    NNP     TAKE    x       x       0
>> O       0       4       I-PP    IN      OVER    x       x       0
>> O       0       5       I-NP    NNP     AT      x       x       0
>> O       0       6       I-NP    NNP     TOP     x       x       0
>> O       0       7       I-NP    NNP     AFTER   x       x       0
>> O       0       8       I-NP    NNP     INNINGS x       x       0
>> O       0       9       I-NP    NN      VICTORY x       x       0
>> O       0       10      O       .       .       x       x       0
>>
>> B-LOC   0       0       I-NP    NNP     LONDON  x       x       0
>> O       0       1       I-NP    CD      1996-08-30      x       x       0
>>
>> B-MISC  0       0       I-NP    NNP     West    x       x       0
>> I-MISC  0       1       I-NP    NNP     Indian  x       x       0
>> O       0       2       I-NP    NN      all-rounder     x       x       0
>> B-PER   0       3       I-NP    NNP     Phil    x       x       0
>> I-PER   0       4       I-NP    NNP     Simmons x       x       0
>> O       0       5       I-VP    VBD     took    x       x       0
>> O       0       6       I-NP    CD      four    x       x       0
>> O       0       7       I-PP    IN      for     x       x       0
>> O       0       8       I-NP    CD      38      x       x       0
>> O       0       9       I-PP    IN      on      x       x       0
>> O       0       10      I-NP    NNP     Friday  x       x       0
>> O       0       11      I-PP    IN      as      x       x       0
>> B-ORG   0       12      I-NP    NNP     Leicestershire  x       x       0
>> O       0       13      I-VP    VBD     beat    x       x       0
>> B-ORG   0       14      I-NP    NNP     Somerset        x       x       0
>
> Thanks.  This helps.  I modified the reader code to read the CoNLL format
> and pick out the necessary column values.
>
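
As an alternative to modifying the reader, CoNLL-2003 lines could be rewritten into the nine-column layout of the sample above. The sketch below is hypothetical: the column order (tag, 0, token index, chunk, POS, word, x, x, 0) is inferred from the sample alone, the constant columns are copied as dummy values, and the class name ColumnFormat is made up. Tags are passed through unchanged; the sample appears to open each entity with a B- tag, so input in a different tagging scheme may need a separate conversion that is not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical converter; column meanings are inferred from the sample in the thread. */
final class ColumnFormat {
    /**
     * Rewrites one sentence of CoNLL-2003 lines ("word POS chunk tag") as
     * tab-separated lines: tag, 0, token index, chunk, POS, word, x, x, 0.
     * Callers pass one sentence at a time, without the blank separator lines.
     */
    static List<String> convertSentence(List<String> conllLines) {
        List<String> out = new ArrayList<>();
        int index = 0; // token position within the sentence, starting at 0
        for (String line : conllLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) {
                throw new IllegalArgumentException("expected 4 columns: " + line);
            }
            out.add(String.join("\t",
                    f[3], "0", String.valueOf(index), f[2], f[1], f[0], "x", "x", "0"));
            index++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList(
                "LONDON NNP I-NP B-LOC", "1996-08-30 CD I-NP O");
        for (String line : convertSentence(sentence)) {
            System.out.println(line);
        }
    }
}
```

Writing a blank line between converted sentences would preserve the sentence boundaries that the column format is meant to carry.
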
>>
>>
>>
>> On Thu, Mar 10, 2011 at 5:44 AM, Jeff Dalton <jdalton AT cs.umass.edu> wrote:
>> > I'm a PhD student at UMass Amherst in the CIIR.  I am trying to run the UIUC
>> > NER tagger for a project I am working on.  I downloaded the distribution
>> > from the website.  However, when I try to run it, I get the error: "ERROR:
>> > Can't locate NETaggerLevel1.lc in the class path."  I cannot locate the
>> > specified file in the distribution.  It looks like the output of the saved
>> > classifier instance.  From the code in NETaggerLevel1.java, it is not clear
>> > what the appropriate setting for lcFilePath is, or how I should create it.
>> > I assume it is created as part of the training process.  I tried to run the
>> > training command, but it fails in the same location.  Could you perhaps shed
>> > some light on this mystery?
>> > Also, the parser appears to be loading data in Reuters format.  I have the
>> > CoNLL data and the data format appears to differ.  Are there scripts to
>> > convert between the formats?  Perhaps I am missing a bit of documentation on
>> > training.  I'd like to try to reproduce the CoNLL results.
>> > I would appreciate any help you could give.
>> > Cheers,
>> > - Jeff
>> --
>> Peace&Love
>
>



--
Peace&Love

Attachment: 1.columns.gold
Description: Binary data

Attachment: 2.columns.gold
Description: Binary data

Attachment: 3.columns.gold
Description: Binary data

Attachment: 4.columns.gold
Description: Binary data

Attachment: 5.columns.gold
Description: Binary data

Attachment: 6.columns.gold
Description: Binary data

Attachment: 7.columns.gold
Description: Binary data

Attachment: 8.columns.gold
Description: Binary data

Attachment: 9.columns.gold
Description: Binary data

Attachment: 10.columns.gold
Description: Binary data

Attachment: 11.columns.gold
Description: Binary data

Attachment: 12.columns.gold
Description: Binary data

Attachment: 13.columns.gold
Description: Binary data

Attachment: 14.columns.gold
Description: Binary data

Attachment: 15.columns.gold
Description: Binary data

Attachment: 16.columns.gold
Description: Binary data

Attachment: 17.columns.gold
Description: Binary data

Attachment: 18.columns.gold
Description: Binary data

Attachment: 19.columns.gold
Description: Binary data

Attachment: 20.columns.gold
Description: Binary data



