Jeff, would you please tell us exactly what method you are using?
Are you calling the .jar file, or are you writing a new class to use the
model?

Honestly speaking, I don't think you should get involved with Hadoop. It is
supposed to handle tremendously more data than your 1 GB. By tremendous, I
mean terabytes, maybe petabytes. There is always a way. Learning Hadoop is
not so hard, but why bother?

Gao

On 2013/10/07, at 22:21, Mark G <[email protected]> wrote:

> Also, MapReduce will let you write the annotated sentences to HDFS as
> part files, but at some point those files will have to be merged and the
> model created from them. In MapReduce you may find that all your part
> files end up on the same reducer node, and you end up with the same
> problem on a random data node.
> It seems like this would only work if you could append one model to
> another without recalculation.
>
>
> On Mon, Oct 7, 2013 at 8:23 AM, Jörn Kottmann <[email protected]> wrote:
>
>> On 10/07/2013 02:05 PM, Jeffrey Zemerick wrote:
>>
>>> Thanks. I used MapReduce to build the training input. I didn't realize
>>> that the training can also be performed on Hadoop. Can I simply combine
>>> the generated models at the completion of the job?
>>>
>>
>> That will not be an out-of-the-box experience: you would need to modify
>> OpenNLP to write the training events to a file and then use a trainer
>> that can run on Hadoop, e.g. Mahout. We now almost have support for
>> integrating third-party ML libraries into OpenNLP.
>>
>> Jörn
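
For reference, the merge step Mark describes doesn't require anything
exotic. Below is a minimal sketch using Hadoop's FileUtil.copyMerge, which
concatenates a job's part files into a single local file. The paths
("/user/jeff/annotated", "annotated-sentences.train") are made-up
placeholders, not anything from Jeff's actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Concatenates part-r-00000, part-r-00001, ... in order into one
        // local file. The HDFS directory name here is hypothetical.
        FileUtil.copyMerge(hdfs, new Path("/user/jeff/annotated"),
                local, new Path("annotated-sentences.train"),
                false /* keep the source part files */, conf, null);
    }
}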

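Once the data sits in a single file, 1 GB should train fine in one JVM,
which is Gao's point above. Here is a sketch using the stock OpenNLP 1.5
training API; it assumes a person-name finder purely for illustration,
since the thread never says which component Jeff is actually training, and
the file names are again placeholders:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;

public class TrainLocally {
    public static void main(String[] args) throws Exception {
        // The merged training file produced in the previous step
        // (hypothetical name), in the OpenNLP name sample format.
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream("annotated-sentences.train"),
                Charset.forName("UTF-8"));
        ObjectStream<NameSample> sampleStream =
                new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // Default training parameters: 100 iterations, cutoff 5.
            model = NameFinderME.train("en", "person", sampleStream,
                    TrainingParameters.defaultParams(),
                    (AdaptiveFeatureGenerator) null,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        // Write the finished model out as a single .bin file.
        FileOutputStream modelOut = new FileOutputStream("en-ner-person.bin");
        try {
            model.serialize(modelOut);
        } finally {
            modelOut.close();
        }
    }
}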