
illinois-ml-nlp-users - [Illinois-ml-nlp-users] LBJ 2.7.0 released!

illinois-ml-nlp-users AT lists.cs.illinois.edu

Subject: Support for users of CCG software closed 7-27-20


[Illinois-ml-nlp-users] LBJ 2.7.0 released!


  • From: Nicholas Rizzolo <rizzolo AT gmail.com>
  • To: illinois-ml-nlp-users <illinois-ml-nlp-users AT cs.uiuc.edu>
  • Subject: [Illinois-ml-nlp-users] LBJ 2.7.0 released!
  • Date: Mon, 1 Nov 2010 17:57:21 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Hi everyone,

Learning Based Java 2.7.0 is on our website and ready for download:
http://cogcomp.cs.illinois.edu/page/software_view/11

This release fixes several bugs and performance issues, and it also (1)
adds new functionality (namely feature pruning), (2) improves the
efficiency of the user's LBJ development cycle, and (3) makes all
training-related functionality more accessible via an improved
run-time API. It isn't backwards compatible, however, so any learned
classifiers you already have will need to be retrained to work with it.
The Illinois POS Tagger, Chunker, and Coreference Resolution Engine
have already been retrained and are available for download on the
website as well.

New Functionality
----------
LBJ's minor version number has increased primarily because of the
addition of feature pruning to the language. It is enabled by adding
the "prune" clause to your learning classifier expression, e.g.:

learn Labeler
using Features
from new TrainingDataParser()
prune "global" "count" 3
end

In the example above, before training the learned classifier, LBJ will
count all features returned by Features and discard those that appear
fewer than 3 times in the training data. Instead of "global",
"perClass" can also be specified (if the learned classifier is
discrete). In this case, any feature that appears fewer than 3 times
in examples labeled by any given label is discarded from all examples
labeled by that label. Regardless of whether we count features
globally or per class, we can also set the count threshold relative to
the data instead of providing an absolute count such as 3. In the
example above, if we replace ("count" 3) with ("percent" .1), then all
features whose counts are less than 10% of the most frequently
occurring feature's count are discarded from the training data.
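To make the three pruning modes concrete, here is a minimal Java sketch of the counting described above. It is not LBJ's implementation or API: the class PruningSketch, the Example record, and the method names are hypothetical. It also assumes that features are plain strings, that every occurrence of a feature counts toward its total, and that for "perClass" with "percent" the threshold is relative to the most frequent feature within that label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of count-based feature pruning; not LBJ's implementation.
public class PruningSketch {

    // A labeled example and the (assumed string) features extracted from it.
    public record Example(String label, List<String> features) {}

    // Counts every occurrence of every feature in the given examples.
    static Map<String, Integer> countFeatures(List<Example> examples) {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : examples)
            for (String f : e.features())
                counts.merge(f, 1, Integer::sum);
        return counts;
    }

    // "count" uses the value as an absolute threshold; "percent" treats it as a
    // fraction of the most frequent feature's count (e.g. .1 means 10%).
    static double threshold(Map<String, Integer> counts, String type, double value) {
        if (type.equals("count")) return value;
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        return value * max;
    }

    // Keeps only the features of e whose count meets the threshold t.
    static Example keepFrequent(Example e, Map<String, Integer> counts, double t) {
        List<String> kept = new ArrayList<>();
        for (String f : e.features())
            if (counts.get(f) >= t) kept.add(f);
        return new Example(e.label(), kept);
    }

    // "global": a single set of counts over all of the training data.
    public static List<Example> pruneGlobal(List<Example> data, String type, double value) {
        Map<String, Integer> counts = countFeatures(data);
        double t = threshold(counts, type, value);
        List<Example> out = new ArrayList<>();
        for (Example e : data) out.add(keepFrequent(e, counts, t));
        return out;
    }

    // "perClass": features are counted separately within each label's examples,
    // and each example is pruned using the counts for its own label.
    public static List<Example> prunePerClass(List<Example> data, String type, double value) {
        Map<String, List<Example>> byLabel = new HashMap<>();
        for (Example e : data)
            byLabel.computeIfAbsent(e.label(), k -> new ArrayList<>()).add(e);

        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Double> thresholds = new HashMap<>();
        for (Map.Entry<String, List<Example>> entry : byLabel.entrySet()) {
            Map<String, Integer> c = countFeatures(entry.getValue());
            counts.put(entry.getKey(), c);
            thresholds.put(entry.getKey(), threshold(c, type, value));
        }

        List<Example> out = new ArrayList<>();
        for (Example e : data)  // original example order is preserved
            out.add(keepFrequent(e, counts.get(e.label()), thresholds.get(e.label())));
        return out;
    }

    public static void main(String[] args) {
        List<Example> data = List.of(
            new Example("pos", List.of("movie", "good")),
            new Example("pos", List.of("movie", "good")),
            new Example("neg", List.of("movie", "bad")));
        // Roughly mirrors: prune "global" "count" 3 (only "movie" survives).
        System.out.println(pruneGlobal(data, "count", 3));
        // Roughly mirrors: prune "perClass" "count" 2 ("neg" loses both features).
        System.out.println(prunePerClass(data, "count", 2));
    }
}
```

In this toy data, "movie" occurs 3 times overall, "good" twice (both under "pos"), and "bad" once, which is why the global and per-class settings keep different features.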
-----

Improved Development Cycle
----------
The LBJ compiler now makes a more concerted effort to re-use
information that has already been computed in previous runs. This
should speed up the development cycle when making small, incremental
changes to a classifier. For example, consider again the learning
classifier expression above. If we run the compiler on it, example
vectors will be extracted and written to disk, features will be pruned
from this data, and finally the classifier will be trained over it for
a single round of training. If we then add the modifier "3 rounds"
onto the "from" clause, there is no need to extract features or prune,
so the compiler will skip straight to training. It will read the
existing model and pre-extracted feature vectors, train for 2 more
rounds, and finish. In general, whatever changes we make to the LBJ
source file, the compiler will work out what needs to be redone and
reuse any existing data it can to get there faster.
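The reuse decision described above can be sketched as a small planner. Loud assumptions: this is not how the LBJ compiler actually detects changes; RebuildPlanSketch, Settings, and the action strings are hypothetical; and each part of the learn expression is reduced to a string that can be compared with the previous run's. Going from 1 round to "3 rounds" yields a single action that continues training for 3 - 1 = 2 rounds, matching the example above.

```java
import java.util.List;

// Hypothetical sketch of deciding which stages of a learn expression to redo.
// Not how the LBJ compiler actually tracks changes.
public class RebuildPlanSketch {

    // What was (or is now) specified for one learned classifier. Each field is a
    // stand-in for the corresponding part of the source; names are hypothetical.
    public record Settings(String features, String prune, String learner, int rounds) {}

    // Returns the actions needed to go from the last completed run to the
    // requested one. Earlier stages feed later ones, so a change to a stage
    // forces every stage after it to run again.
    public static List<String> plan(Settings previous, Settings requested) {
        int rounds = requested.rounds();
        if (previous == null || !previous.features().equals(requested.features()))
            return List.of("extract", "prune", "train " + rounds);
        // Assumes the unpruned example vectors from the last run are still on disk.
        if (!previous.prune().equals(requested.prune()))
            return List.of("prune", "train " + rounds);
        // A different learner, or fewer rounds than already done, means training
        // again from scratch on the cached, pruned vectors (assumption).
        if (!previous.learner().equals(requested.learner()) || rounds < previous.rounds())
            return List.of("train " + rounds);
        // Otherwise load the existing model and continue for the extra rounds only.
        int extra = rounds - previous.rounds();
        if (extra > 0) return List.of("continue " + extra);
        return List.of();
    }

    public static void main(String[] args) {
        Settings first = new Settings("Features", "global count 3", "learnerA", 1);
        Settings more = new Settings("Features", "global count 3", "learnerA", 3);
        System.out.println(plan(null, first)); // [extract, prune, train 1]
        System.out.println(plan(first, more)); // [continue 2]
    }
}
```

Ordering the checks from the earliest stage to the latest keeps the rule simple: the first stage whose inputs changed, and everything downstream of it, is rerun, while everything upstream is read back from disk.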
-----

Improved Run-time API
----------
This release also features a massive reorganization of the LBJ
compiler's training-related code, which will be of interest to those who
like to train their classifiers in their own Java programs.
LBJ2.learn.BatchTrainer exposes methods for pre-extraction, pruning,
training, cross validation, and parameter tuning. The LBJ Runtime
Reference (link below) hasn't been updated to cover BatchTrainer yet,
so stay tuned there for more information on how to use it.

http://cogcomp.cs.illinois.edu/page/LBJ-runtime-reference

Of course, for the adventurous, there's always the javadoc:
http://cogcomp.cs.illinois.edu/software/doc/LBJ2/library/LBJ2/learn/BatchTrainer.html


- Nick




