illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20


[Illinois-ml-nlp-users] WikificationACL2011Data evaluation data for wikifier

From: Samy Ateia <samyateia AT hotmail.de>
To: <illinois-ml-nlp-users AT cs.uiuc.edu>
Subject: [Illinois-ml-nlp-users] WikificationACL2011Data evaluation data for wikifier
Date: Wed, 7 Dec 2011 15:02:32 +0100
Importance: Normal
List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users>
List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi everyone,

I want to evaluate a wikification system against the "WikificationACL2011Data" dataset mentioned in "Local and Global Algorithms for Disambiguation to Wikipedia".
As the gold standard, I extract the mentions and targets from the problem files in each problems folder.
I treat the text between <SurfaceForm> tags as a mention and the text between <ChosenAnnotation> tags as its target.
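
To make the extraction concrete, here is a minimal sketch of what I have in mind. It assumes that each <SurfaceForm> element is followed by exactly one <ChosenAnnotation> element before the next mention starts; the function name extract_gold, the sample text, and the filename tag in it are made up for illustration and are not taken from the dataset.

```python
import re

# Assumption: every <SurfaceForm> is followed by one <ChosenAnnotation>
# before the next <SurfaceForm>. If a mention had no annotation, this
# pattern would pair it with the next mention's annotation instead.
PAIR_RE = re.compile(
    r"<SurfaceForm>(.*?)</SurfaceForm>.*?"
    r"<ChosenAnnotation>(.*?)</ChosenAnnotation>",
    re.DOTALL,
)


def extract_gold(problem_text):
    """Return (mention, target) pairs found in the text of one problem file."""
    return [(m.strip(), t.strip()) for m, t in PAIR_RE.findall(problem_text)]


# Hypothetical problem file content; the <ReferenceFileName> tag name is a guess.
sample = """<ReferenceFileName>
doc1
</ReferenceFileName>
<SurfaceForm>
Chicago
</SurfaceForm>
<ChosenAnnotation>
http://en.wikipedia.org/wiki/Chicago
</ChosenAnnotation>
"""

# Hypothetical file that only names itself and carries no mentions.
empty_sample = "<ReferenceFileName>\ndoc2\n</ReferenceFileName>\n"

gold = extract_gold(sample)
empty_gold = extract_gold(empty_sample)
```

A file with no mentions simply yields an empty list here, so it contributes nothing to the gold standard.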

The ACE2004_Coref_Turking dataset contains some problem files that hold only a tag referencing their own filename, with no mentions or targets. Are they supposed to be like that, or is the information for those files stored somewhere else?

Basically, I want to be sure that if I extract the gold standard this way, my results will be comparable to those published in the paper.

Thanks for your help,

Samy Ateia
