Re: do any of the CrossValidators work at all?

Walrus theCat Mon, 25 Nov 2013 14:11:31 -0800

Hi Jörn, William, and the rest of you OpenNLPers,

This problem is resurfacing.  I found out that my input didn't match the
format specified in the docs, which requires one sentence per line.  After
properly sentence-breaking my input, a very similar error is cropping up:
it works with a TokenNameFinderEvaluator but not with a CrossValidator.
I'm constructing the stream with the FileChannel constructor.
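
For context on why the stream type matters: the "Stream not marked" message
in the older trace quoted below is what java.io.BufferedReader.reset() throws
when mark() was never called on the reader.  A minimal sketch using only the
JDK, with no OpenNLP classes involved; the names ResetDemo and tryReset are
made up for illustration:

```scala
import java.io.{BufferedReader, IOException, StringReader}

object ResetDemo {
  // Returns None if reset() succeeds, or the exception message if it fails.
  def tryReset(reader: BufferedReader): Option[String] =
    try {
      reader.reset()
      None
    } catch {
      case e: IOException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    val data = "first sentence .\nsecond sentence ."

    // No mark() was ever set, so reset() has nowhere to rewind to.
    val unmarked = new BufferedReader(new StringReader(data))
    unmarked.readLine()
    println(tryReset(unmarked))

    // With a mark set before reading, reset() rewinds to the marked position.
    val marked = new BufferedReader(new StringReader(data))
    marked.mark(data.length + 1)
    marked.readLine()
    println(tryReset(marked))
    println(marked.readLine())
  }
}
```

In the "Stream not marked" trace quoted below, PlainTextByLineStream.reset
ends up in BufferedReader.reset, which is why a stream built on an unmarked
reader fails as soon as the cross validator resets it.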


I've been stepping through the source, but to no avail.  The stack trace is
as follows:

Exception in thread "main" java.lang.NullPointerException
    at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
    at opennlp.maxent.GIS.trainModel(GIS.java:256)
    at opennlp.model.TrainUtil.train(TrainUtil.java:184)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
    at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:275)
    at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:153)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:58)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:53)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:53)
    at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
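
Jörn's earlier reply (quoted below) ties the "Model not compatible" exception
to training data that is too small or contains no names.  Whether this
NullPointerException has the same cause is not established in this thread,
but cross-validation trains on only part of the data for each fold, so a
quick count is a cheap check.  A minimal sketch, assuming the usual
<START:type> ... <END> name annotation format; TrainingDataCheck, countNames
and summarize are hypothetical helper names, not OpenNLP API:

```scala
object TrainingDataCheck {
  // Matches an opening tag such as <START> or <START:person>.
  private val StartTag = """<START(:[^>\s]+)?>""".r

  // Number of name annotations in one line of training data.
  def countNames(line: String): Int = StartTag.findAllIn(line).size

  // Returns (total lines, lines with at least one name, total names).
  def summarize(lines: Seq[String]): (Int, Int, Int) = {
    val counts = lines.map(countNames)
    (lines.size, counts.count(_ > 0), counts.sum)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, one tokenized sentence per line.
    val lines = Seq(
      "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
      "Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. .",
      "The board meets next week ."
    )
    val (total, withNames, names) = summarize(lines)
    println(s"lines=$total linesWithNames=$withNames names=$names")
  }
}
```

Running summarize over the same lines that are passed to linesToStream shows
how many annotated names exist to be split across the folds.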


On Fri, Nov 22, 2013 at 2:57 AM, Jörn Kottmann <[email protected]> wrote:

> The first exception usually indicates that you don't have enough training
> data, or it contains
> no names. Try to create more training data.
>
> The second exception indicates that the stream you are using can't be
> reset, and therefore doesn't work
> with the cross validator. We should definitely make this clearer.
>
> Jörn
>
>
> On 11/21/2013 06:46 PM, Walrus theCat wrote:
>
>> Jörn,
>>
>> Thanks for your interest.
>>
>> Here's the exception when I use the BufferedReader.  This exception is
>> thrown during training.  It does a couple "log likelihood" statements
>> first, before throwing this:
>>
>> Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
>>      at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
>>      at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
>>      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
>>      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>>      at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
>>      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>      at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
>>      at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>> And here it is when I use the ByteArrayInputStream.  This exception is
>> thrown when cross-validating, but not when evaluating the training data
>> stream:
>>
>> Exception in thread "main" java.io.IOException: Stream not marked
>>      at java.io.BufferedReader.reset(BufferedReader.java:505)
>>      at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>>      at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>      at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>      at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
>>      at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
>>      at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
>>      at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
>>      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>      at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
>>      at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>>
>> On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]>
>> wrote:
>>
>>> Please post the exception with stack trace here.
>>>
>>> Jörn
>>>
>>>
>>>
>>> On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>>
>>>> To update, when I create the stream as above
>>>> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
>>>> marked" error when attempting to cross validate (but not when just
>>>> evaluating on the training data).  When I, instead, create the
>>>> PlainTextByLineStream on a BufferedReader (see below), I get the error
>>>> "Model not compatible with name finder!" during training.  The result
>>>> is I can't cross validate, something I really need to do.
>>>>
>>>>
>>>>     def linesToStream(lines:Array[String]) = {
>>>>       val charset = Charset.forName(CHARSET)
>>>>       val reader = new BufferedReader(new InputStreamReader(
>>>>           new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
>>>>       new NameSampleDataStream(
>>>>           new PlainTextByLineStream(
>>>>               reader))
>>>>     }
>>>>
>>>>
>>>> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the reply, even though I was kind of rude.  I'm using the
>>>>> API.  The evaluator gives me suspiciously high metrics, and the cross
>>>>> validator fails out as mentioned.
>>>>>
>>>>> The code is in Scala:
>>>>>
>>>>>     def linesToStream(lines:Array[String]) = {
>>>>>       val charset = Charset.forName(CHARSET)
>>>>>       new NameSampleDataStream(
>>>>>           new PlainTextByLineStream(
>>>>>               new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>>>>               charset))
>>>>>     }
>>>>>
>>>>> I train the model with the above:
>>>>>         NameFinderME.train("en", entityName, linesToStream(lines),
>>>>>               TrainingParameters.defaultParams(),
>>>>>               null:Array[Byte], Collections.emptyMap[String, Object]());
>>>>> When it comes time to evaluate, I recreate the stream to try to
>>>>> circumvent these kinds of problems ("resetting" it also throws the
>>>>> same error):
>>>>>
>>>>>       val crossValidator = new TokenNameFinderCrossValidator("en",
>>>>>               entityName, TrainingParameters.defaultParams(),
>>>>>               null:Array[Byte], Collections.emptyMap[String, Object](),
>>>>>               listener)
>>>>>       crossValidator.evaluate(sampleStream, 10)
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Are you using the API or the command line tools? Can you send a code
>>>>>> snippet showing how you load the ObjectStream?
>>>>>>
>>>>>>
>>>>>> 2013/11/20 Walrus theCat <[email protected]>
>>>>>>
>>>>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>>>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
>>>>>>> This works when I use a TokenNameFinderEvaluator instead.  I'm led to
>>>>>>> believe that .reset isn't called on the stream in the CrossValidator.
>>>>>>>
>>>>>>>
>>>>>>>
>
