Sorry, forgot to say. CUT YOUR DATA.
Good luck
Gao

Sent from my iPad

On 2013/10/07, at 22:42, Jeffrey Zemerick <[email protected]> wrote:

> Gao,
>
> I have about a 950 MB file created by Hadoop with sentences in the format
> described in the NameFinder training documentation
> (http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.training.tool).
> I'm running the jar as described on that page and I set the number of
> iterations to 50. (I read somewhere that was a suggested amount.) After the
> first failed attempt I increased the memory to 4096 but it failed again
> (just took longer to fail). I can increase the memory further but I wanted
> to see if there was anything that I was missing.
>
> Thanks,
> Jeff
>
> On Mon, Oct 7, 2013 at 9:29 AM, melo <[email protected]> wrote:
>
>> Jeff,
>>
>> Would you please tell us exactly what method you are using?
>> Are you calling the .jar file, or are you writing a new class to use the
>> model?
>>
>> Honestly speaking, I don't think you should get involved with Hadoop.
>> It is supposed to handle tremendously more data than your 1 GB.
>> By tremendous, I mean terabytes, maybe petabytes.
>>
>> There is always a way.
>> Learning Hadoop is not so hard, but why bother?
>>
>> Gao
>>
>> On 2013/10/07, at 22:21, Mark G <[email protected]> wrote:
>>
>>> Also, MapReduce will allow you to write the annotated sentences to HDFS as
>>> part files, but at some point those files will have to be merged and the
>>> model created from them. In MapReduce you may find that all your part
>>> files end up on the same reducer node and you end up with the same problem
>>> on a random data node.
>>> Seems like this would only work if you could append one MODEL to another
>>> without recalculation.
>>>
>>> On Mon, Oct 7, 2013 at 8:23 AM, Jörn Kottmann <[email protected]> wrote:
>>>
>>>> On 10/07/2013 02:05 PM, Jeffrey Zemerick wrote:
>>>>
>>>>> Thanks. I used MapReduce to build the training input. I didn't realize
>>>>> that the training can also be performed on Hadoop. Can I simply combine
>>>>> the generated models at the completion of the job?
>>>>
>>>> That will not be an out-of-the-box experience; you need to modify OpenNLP
>>>> to write the training events to a file and then use a trainer which can
>>>> run on Hadoop, e.g. Mahout. We now almost have support for integrating
>>>> 3rd-party ML libraries into OpenNLP.
>>>>
>>>> Jörn
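
On cutting the data: here is a minimal sketch of one way it could be done before training. It is not part of OpenNLP. It assumes the training file has one sentence per line with blank lines separating documents, and it keeps a random fraction of whole documents while streaming the file, so the full 950 MB never has to sit in memory. The function names (iter_documents, sample_documents, write_sample), the keep_fraction parameter, and the file paths are all made up for illustration.

```python
import random
from typing import Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into documents, treating blank lines as separators."""
    doc: List[str] = []
    for line in lines:
        if line.strip():
            doc.append(line.rstrip("\n"))
        elif doc:
            # A blank line closes the current document.
            yield doc
            doc = []
    if doc:
        # The last document may not be followed by a blank line.
        yield doc


def sample_documents(
    lines: Iterable[str], keep_fraction: float, seed: int = 0
) -> Iterator[List[str]]:
    """Yield each document with probability keep_fraction (reproducible via seed)."""
    rng = random.Random(seed)
    for doc in iter_documents(lines):
        if rng.random() < keep_fraction:
            yield doc


def write_sample(src_path: str, dst_path: str, keep_fraction: float, seed: int = 0) -> int:
    """Stream src_path, write the sampled documents to dst_path, return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for doc in sample_documents(src, keep_fraction, seed):
            dst.write("\n".join(doc))
            dst.write("\n\n")  # blank line keeps the document boundary
            kept += 1
    return kept
```

Something like write_sample("train.txt", "train-small.txt", 0.1) (both paths hypothetical) would keep roughly a tenth of the documents. If the smaller file trains without running out of memory, the fraction can be raised step by step until it no longer does.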
