Jeff, would you please tell us exactly what method you are using?
Are you calling the .jar file, or are you writing a new class to use the
model?

Honestly speaking, I don't think you should get involved with Hadoop. It is
supposed to handle tremendously more data than your 1 GB. By tremendous, I
mean terabytes, maybe petabytes. There is always a way. Learning Hadoop is
not so hard, but why bother?

Gao

On 2013/10/07, at 22:21, Mark G <[email protected]> wrote:

> Also, MapReduce will let you write the annotated sentences to HDFS as
> part files, but at some point those files will have to be merged and the
> model created from them. In MapReduce you may find that all your part
> files end up on the same reducer node, and you end up with the same
> problem on a random data node.
> It seems like this would only work if you could append one model to
> another without recalculation.
>
>
> On Mon, Oct 7, 2013 at 8:23 AM, Jörn Kottmann <[email protected]> wrote:
>
>> On 10/07/2013 02:05 PM, Jeffrey Zemerick wrote:
>>
>>> Thanks. I used MapReduce to build the training input. I didn't realize
>>> that the training can also be performed on Hadoop. Can I simply combine
>>> the generated models at the completion of the job?
>>>
>>
>> That will not be an out-of-the-box experience: you would need to modify
>> OpenNLP to write the training events to a file and then use a trainer
>> that can run on Hadoop, e.g. Mahout. We now almost have support for
>> integrating third-party ML libraries into OpenNLP.
>>
>> Jörn
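
For reference, the merge step Mark describes doesn't require anything
exotic. Below is a minimal sketch using Hadoop's FileUtil.copyMerge, which
concatenates a job's part files into a single local file. The paths
("/user/jeff/annotated", "annotated-sentences.train") are made-up
placeholders, not anything from Jeff's actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Concatenates part-r-00000, part-r-00001, ... in order into one
        // local file. The HDFS directory name here is hypothetical.
        FileUtil.copyMerge(hdfs, new Path("/user/jeff/annotated"),
                local, new Path("annotated-sentences.train"),
                false /* keep the source part files */, conf, null);
    }
}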

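Once the data sits in a single file, 1 GB should train fine in one JVM,
which is Gao's point above. Here is a sketch using the stock OpenNLP 1.5
training API; it assumes a person-name finder purely for illustration,
since the thread never says which component Jeff is actually training, and
the file names are again placeholders:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;

public class TrainLocally {
    public static void main(String[] args) throws Exception {
        // The merged training file produced in the previous step
        // (hypothetical name), in the OpenNLP name sample format.
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream("annotated-sentences.train"),
                Charset.forName("UTF-8"));
        ObjectStream<NameSample> sampleStream =
                new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // Default training parameters: 100 iterations, cutoff 5.
            model = NameFinderME.train("en", "person", sampleStream,
                    TrainingParameters.defaultParams(),
                    (AdaptiveFeatureGenerator) null,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        // Write the finished model out as a single .bin file.
        FileOutputStream modelOut = new FileOutputStream("en-ner-person.bin");
        try {
            model.serialize(modelOut);
        } finally {
            modelOut.close();
        }
    }
}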