
illinois-ml-nlp-users - Re: [Illinois-ml-nlp-users] How to work use Curator for large files?

illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20

  • From: "Sammons, Mark" <mssammon AT illinois.edu>
  • To: farhaneh farahani <farhane_farahani AT yahoo.com>, "illinois-ml-nlp-users AT cs.uiuc.edu" <illinois-ml-nlp-users AT cs.uiuc.edu>
  • Subject: Re: [Illinois-ml-nlp-users] How to work use Curator for large files?
  • Date: Fri, 7 Mar 2014 15:52:37 +0000
  • Accept-language: en-US
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi, Farhane.

The Curator is designed to handle many small documents, so you will need to split any large file into smaller chunks. How large these chunks can be depends on which annotations you are requesting: SRL and Coreference are slower, so they time out more easily on long texts.

We have a library for chunking documents packaged as part of "curator-utils", available on our software page (http://cogcomp.cs.illinois.edu/page/software_view/curator-utils). It lets you set a maximum chunk size in tokens; long documents are broken at that size, the chunks are sent to the Curator, and the results are reassembled into a single data structure.
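To make the idea concrete, here is a minimal Python sketch of the split / annotate / reassemble pattern described above. It is not the curator-utils API: `chunk_tokens`, `annotate_long_document`, and the `annotate` callback are hypothetical names chosen for illustration, and the annotator is assumed to return (start, end, label) spans over token indices within the chunk it was given.

```python
from typing import Callable, List, Tuple

# (start_token, end_token, label); end is exclusive
Span = Tuple[int, int, str]


def chunk_tokens(tokens: List[str], max_tokens: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into consecutive chunks of at most max_tokens tokens.

    Returns (offset, chunk) pairs, where offset is the index of the
    chunk's first token in the original document.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [(i, tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]


def annotate_long_document(
    tokens: List[str],
    max_tokens: int,
    annotate: Callable[[List[str]], List[Span]],
) -> List[Span]:
    """Annotate each chunk separately, then merge the spans.

    `annotate` is a hypothetical stand-in for a call to an annotation
    service. Chunk-local token indices are shifted by the chunk offset
    so the merged spans refer to positions in the full document.
    """
    merged: List[Span] = []
    for offset, chunk in chunk_tokens(tokens, max_tokens):
        for start, end, label in annotate(chunk):
            merged.append((start + offset, end + offset, label))
    return merged


def fake_annotate(chunk: List[str]) -> List[Span]:
    """Toy annotator (assumption for the demo): tag capitalized tokens."""
    return [(i, i + 1, "CAP") for i, tok in enumerate(chunk) if tok[:1].isupper()]


tokens = "Mark sent the file to Farhane on Friday".split()
spans = annotate_long_document(tokens, max_tokens=3, annotate=fake_annotate)
print(spans)
```

This sketch cuts at a fixed token count, so a chunk boundary can fall in the middle of a sentence. For sentence-level annotations such as SRL, a practical splitter would move each cut to the nearest sentence boundary below the limit.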

I am concerned, though, that you got a timeout on such a short file.  Please send me:

* a copy of the file you are processing
* a list of the annotations you are requesting from curator
* the log files from the curator and the server components corresponding to the requested annotations

Thanks,

Mark


From: illinois-ml-nlp-users-bounces AT cs.uiuc.edu [illinois-ml-nlp-users-bounces AT cs.uiuc.edu] on behalf of farhaneh farahani [farhane_farahani AT yahoo.com]
Sent: Friday, March 07, 2014 9:31 AM
To: illinois-ml-nlp-users AT cs.uiuc.edu
Subject: [Illinois-ml-nlp-users] How to work use Curator for large files?


Hello All,

I have recently started working with Illinois Curator. I have installed it successfully, and when I follow the testing section of the INSTALL file, everything works fine. But when I try to run the Curator on my real dataset, which includes a very big file (more than 13MB of text!), I see an exception.

I know that it is mentioned as a known issue in the INSTALL file, but I still don't understand how I should handle it. I mean, when the connection times out, everything stops and the result that I expect is not saved to a file.

On the other hand, I tried to make smaller files, but even for a file with 10 sentences I get the same exception. The only way to avoid this exception was to use files with fewer than 3 sentences, which is far too few for my work!
Is there any way to run the Curator on a single big file?

Best Regards,
Farhane


