Hi Jörn, William, and the rest of you OpenNLPers,
This problem has resurfaced. I found out that my input didn't match the
format specified in the docs, which call for one sentence per line. After
properly sentence-breaking my input, a very similar error is cropping up:
it works with a TokenNameFinderEvaluator but not with a
TokenNameFinderCrossValidator. I'm using the FileChannel constructor for
the stream. I've been stepping through the source, but to no avail. The
stack trace is as follows:
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:275)
at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:153)
at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:58)
at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:53)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:53)
at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
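
For reference, the earlier "Stream not marked" failure can be reproduced with the JDK alone, with no OpenNLP involved. This is only a minimal sketch, assuming nothing beyond java.io; ResetDemo and canReset are names made up for illustration:

```scala
import java.io.{BufferedReader, IOException, StringReader}

object ResetDemo {
  // Returns true if reset() succeeds, false if it throws an IOException.
  def canReset(reader: BufferedReader): Boolean =
    try { reader.reset(); true }
    catch { case _: IOException => false }

  def main(args: Array[String]): Unit = {
    val text = "first sentence\nsecond sentence\n"

    // Never marked: reset() throws IOException("Stream not marked").
    val unmarked = new BufferedReader(new StringReader(text))
    unmarked.readLine()
    println(s"unmarked reader resets: ${canReset(unmarked)}")

    // Marked at the start, with a read-ahead limit that covers the whole input.
    val marked = new BufferedReader(new StringReader(text))
    marked.mark(text.length + 1)
    marked.readLine()
    println(s"marked reader resets: ${canReset(marked)}")
    println(s"first line after reset: ${marked.readLine()}")
  }
}
```

This lines up with the top frames of the earlier trace, where BufferedReader.reset is called from PlainTextByLineStream.reset. It says nothing about the NullPointerException above, which still happens with the FileChannel-based stream.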
On Fri, Nov 22, 2013 at 2:57 AM, Jörn Kottmann <[email protected]> wrote:
> The first exception usually indicates that you don't have enough training
> data, or it contains
> no names. Try to create more training data.
>
> The second exception indicates that the stream you are using can't be
> reset, and therefore doesn't work
> with the cross validator. We should definitely make this clearer.
>
> Jörn
>
>
> On 11/21/2013 06:46 PM, Walrus theCat wrote:
>
>> Jörn,
>>
>> Thanks for your interest.
>>
>> Here's the exception when I use the BufferedReader. This exception is
>> thrown during training. It does a couple "log likelihood" statements
>> first, before throwing this:
>>
>> Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
>> at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
>> at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>> at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
>> at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
>> at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
>> at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>> And here it is when I use the ByteArrayInputStream. This exception is
>> thrown when cross-validating, but not when evaluating the training data
>> stream:
>>
>> Exception in thread "main" java.io.IOException: Stream not marked
>> at java.io.BufferedReader.reset(BufferedReader.java:505)
>> at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>> at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>> at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>> at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
>> at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
>> at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
>> at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
>> at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
>> at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
>> at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>>
>> On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]> wrote:
>>
>>> Please post the exception with stack trace here.
>>>
>>> Jörn
>>>
>>>
>>>
>>> On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>>
>>>> To update, when I create the stream as above
>>>> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not marked"
>>>> error when attempting to cross validate (but not when just evaluating on
>>>> the training data). When I, instead, create the PlainTextByLineStream on a
>>>> BufferedReader (see below), I get the error "Model not compatible with
>>>> name finder!" during training. The result is I can't cross validate,
>>>> something I really need to do.
>>>>
>>>>
>>>> def linesToStream(lines:Array[String]) = {
>>>>   val charset = Charset.forName(CHARSET)
>>>>   val reader = new BufferedReader(new InputStreamReader(
>>>>     new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
>>>>   new NameSampleDataStream(
>>>>     new PlainTextByLineStream(reader))
>>>> }
>>>>
>>>>
>>>> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>>>
>>>>> Thanks for the reply, even though I was kind of rude. I'm using the API.
>>>>> The evaluator gives me suspiciously high metrics, and the cross validator
>>>>> fails out as mentioned.
>>>>>
>>>>> The code is in Scala:
>>>>>
>>>>> def linesToStream(lines:Array[String]) = {
>>>>>   val charset = Charset.forName(CHARSET)
>>>>>   new NameSampleDataStream(
>>>>>     new PlainTextByLineStream(
>>>>>       new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>>>>       charset))
>>>>> }
>>>>>
>>>>> I train the model with the above:
>>>>>
>>>>> NameFinderME.train("en", entityName, linesToStream(lines),
>>>>>   TrainingParameters.defaultParams(),
>>>>>   null:Array[Byte], Collections.emptyMap[String, Object]());
>>>>>
>>>>> When it comes time to evaluate, I recreate the stream to try to circumvent
>>>>> these kinds of problems ("resetting" it also throws the same error):
>>>>>
>>>>> val crossValidator = new TokenNameFinderCrossValidator("en",
>>>>>   entityName, TrainingParameters.defaultParams(),
>>>>>   null:Array[Byte], Collections.emptyMap[String, Object](),
>>>>>   listener)
>>>>> crossValidator.evaluate(sampleStream, 10)
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>>>>
>>>>>> Are you using the API or the command line tools? Can you send a code
>>>>>> snippet showing how you load the ObjectStream?
>>>>>>
>>>>>>
>>>>>> 2013/11/20 Walrus theCat <[email protected]>
>>>>>>
>>>>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>>>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream. This
>>>>>>> works when I use a TokenNameFinderEvaluator instead. I'm led to believe
>>>>>>> that .reset isn't called on the stream in the CrossValidator.
>>>>>>>
>>>>>>>
>>>>>>>
>