Also, MapReduce will allow you to write the annotated sentences to HDFS as
part files, but at some point those files will have to be merged and the
model created from them. In MapReduce you may find that all your part
files end up on the same reducer node, so you end up with the same problem
on a random data node.
It seems like this would only work if you could append one MODEL to another
without recalculation.
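
For the part-file merge step, one option (a sketch on my part, not something the discussion above prescribes) is to collapse the reducer output into a single local file with Hadoop's standard `getmerge` command, then feed that to the normal single-node OpenNLP trainer. The paths and the choice of trainer below are placeholders for illustration:

```shell
# Merge all part-* files from the job's output directory into one local file.
# Both paths are hypothetical examples.
hadoop fs -getmerge /user/me/annotated-sentences/output merged-sentences.txt

# Train on a single node with the regular OpenNLP command-line tools;
# the name-finder trainer is just one example of a model type.
opennlp TokenNameFinderTrainer -lang en -data merged-sentences.txt -model en-ner.bin
```

This sidesteps the model-appending question entirely, at the cost of doing the training itself on one machine.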


On Mon, Oct 7, 2013 at 8:23 AM, Jörn Kottmann <[email protected]> wrote:

> On 10/07/2013 02:05 PM, Jeffrey Zemerick wrote:
>
>> Thanks. I used MapReduce to build the training input. I didn't realize
>> that
>> the training can also be performed on Hadoop. Can I simply combine the
>> generated models at the completion of the job?
>>
>
> That will not be an out-of-the-box experience: you need to modify OpenNLP
> to write the training events to a file and then use a trainer which can
> run on Hadoop, e.g. Mahout. We now almost have support for integrating
> third-party ML libraries into OpenNLP.
>
> Jörn
>
