illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20

Re: [Illinois-ml-nlp-users] Question on LBJNER: Generating Wiki gazetteers

From: Lev-Arie Ratinov <arie.ratinov AT gmail.com>
To: DongHyun Choi <cdh4696 AT gmail.com>
Cc: illinois-ml-nlp-users AT cs.uiuc.edu
Subject: Re: [Illinois-ml-nlp-users] Question on LBJNER: Generating Wiki gazetteers
Date: Wed, 3 Aug 2011 22:40:21 -0500
List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users>
List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi DongHyun

The order in which the gazetteers are read should not affect
performance. I have some random number generators in my training
code, and perhaps different versions of Java generate random numbers
differently.

In any case, differences of 0.05% are not statistically significant,
and I don't want to bother with this.
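
For anyone who wants to rule out both suspected sources of variation (gazetteer reading order and random numbers), here is a minimal Java sketch. It assumes the gazetteers are plain files in a single directory and that the training code uses java.util.Random; neither is confirmed in this thread. The class name, method names, and seed value are hypothetical and not part of LBJNER.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of LBJNER.
class DeterministicGazetteerOrder {
    // Arbitrary fixed seed (an assumption); any constant works.
    static final long SEED = 42L;

    // Returns a sorted copy of the names, so the result does not depend
    // on the order in which the names were supplied.
    static List<String> sortedNames(String[] names) {
        List<String> sorted = new ArrayList<>(Arrays.asList(names));
        Collections.sort(sorted);
        return sorted;
    }

    // File.list() gives no ordering guarantee, so sort before reading.
    static List<String> listGazetteers(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new ArrayList<>(); // not a directory, or an I/O error
        }
        return sortedNames(names);
    }

    // Every Random created here yields the same sequence of numbers.
    static Random newTrainingRandom() {
        return new Random(SEED);
    }
}
```

With a fixed seed, java.util.Random is specified to produce the same sequence on every conforming Java implementation. If results still differ between machines after both changes, the cause would have to be something else, for example the iteration order of hash-based collections, which is not guaranteed across Java versions.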

Peace&Love

On Mon, Aug 1, 2011 at 8:51 PM, DongHyun Choi <cdh4696 AT gmail.com> wrote:
> Thanks Ratinov!
>
> Now I can understand how they are created, and reimplement the module.
>
> By the way, just for your information, I have observed changes in
> performance with the system configuration baseline + Gazetteer Match when
> the toolkit was run on different machines. The F1-score varies within
> +-0.05% of the result reported in the paper. This seems to be related to
> the order in which the gazetteer files are read: when I modified the
> program to read the gazetteers in alphabetical order of their names, the
> performance was 87.212%, 0.01% lower than the result reported in the
> paper. In some alternative random order, the F1-score goes up to 87.27%,
> which is 0.05% higher.
>
> Maybe it would be better if the system always read the gazetteers in the
> same order?
>
> Thank you.
>
> Sincerely,
> DongHyun Choi
>
>
> 2011/8/2 Lev-Arie Ratinov <arie.ratinov AT gmail.com>
>>
>> Hi Choi.
>>
>> Indeed, I've used partial string matching.
>> For example, if the article has a category containing the word
>> "mountains", it would be added to the locations gazetteer. Same for
>> "births".
>>
>> I hope it helps.
>>
>>
>> On 7/31/11, DongHyun Choi <cdh4696 AT gmail.com> wrote:
>> > Hi,
>> >
>> > First of all, thank you for providing the nice NER toolkit.
>> >
>> > I am trying to use LBJNER in my research and to update the Wikipedia
>> > gazetteers, since Wikipedia itself has been updated over time.
>> >
>> > The problem is that I was not able to figure out the details of how you
>> > obtained those Wikipedia gazetteers. The paper says that category tags
>> > are used to extract the titles, but how? For example, only 4 articles
>> > have the category tag "people", and the tags "births" and "deaths" do
>> > not even exist.
>> >
>> > My question is: did you apply partial string matching to the tags? Or is
>> > there something else I don't know about, or that has not been published
>> > yet? Could you please explain the method used?
>> >
>> > Thanks in advance.
>> >
>> > Sincerely,
>> > DongHyun Choi
>> >
>>
>>
>> --
>> Peace&Love
>
>
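
The partial string matching described in the quoted reply (a category containing the word "mountains" puts the article into the locations gazetteer) could be sketched in Java roughly as follows. This is an illustration only, not the code used to build the released gazetteers. The class and method names are hypothetical, the keyword table is a two-entry stand-in, and mapping "births" to a people gazetteer is an assumption, since the thread only says "same for births".

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of LBJNER.
class CategoryGazetteerSketch {
    // Keyword -> gazetteer name. Only "mountains" -> "locations" is
    // stated in the thread; "births" -> "people" is an assumption.
    static final Map<String, String> KEYWORD_TO_GAZETTEER = new LinkedHashMap<>();
    static {
        KEYWORD_TO_GAZETTEER.put("mountains", "locations");
        KEYWORD_TO_GAZETTEER.put("births", "people");
    }

    // Returns every gazetteer whose keyword occurs as a substring of at
    // least one of the article's category names (case-insensitive).
    static Set<String> gazetteersFor(Iterable<String> categories) {
        Set<String> result = new TreeSet<>();
        for (String category : categories) {
            String lower = category.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> e : KEYWORD_TO_GAZETTEER.entrySet()) {
                if (lower.contains(e.getKey())) {
                    result.add(e.getValue());
                }
            }
        }
        return result;
    }
}
```

Under this scheme an article whose only category is exactly "people" is rare, but an article with a category such as "1950 births" (an illustrative name) still matches the "births" keyword, which is consistent with the observation in the original question that exact tags match almost nothing.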
