Alright, I'll keep that in mind. The next step for me will be to import data from 90G BSON files. I think that'll be a good starting point for bulk import.
Best regards,
Yamini Joshi

On Tue, Oct 11, 2016 at 10:14 PM, Josh Elser <[email protected]> wrote:

> Even 10G is a rather small amount of data. Setting up a bulk loading
> framework is a bit more complicated than it appears at first glance. Take
> your pick of course, but I probably wouldn't consider bulk loading unless
> you were regularly processing 10-100x that amount of data :)
>
>
> [email protected] wrote:
>
>> The bulk import seemed to be a good option since the bson file generated
>> about 10g data. The problem with my code was that I wasn't releasing
>> memory, which eventually became the bottleneck.
>>
>> Sent from my iPhone
>>
>> On Oct 11, 2016, at 9:39 PM, Josh Elser <[email protected]> wrote:
>>>
>>> For only 4GB of data, you don't need to do bulk ingest. That is serious
>>> overkill.
>>>
>>> I don't know why the master would have died/become unresponsive. It is
>>> minimally involved with the write-pipeline.
>>>
>>> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you
>>> followed the Accumulo user manual to change the configuration to match
>>> the available resources you have on your 3 nodes where Accumulo is
>>> running?
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
>>>
>>> Yamini Joshi wrote:
>>>
>>>> Hello
>>>>
>>>> I am trying to import data from a bson file to a 3 node Accumulo
>>>> cluster using pyaccumulo. The bson file is 4G and has a lot of records,
>>>> all to be stored into one table. I tried a very naive approach and used
>>>> the pyaccumulo batch writer to write to the table. After parsing some
>>>> records, my master became unresponsive and shut down with the tserver
>>>> threads stuck on a low memory error. I am assuming that the records are
>>>> created faster than what the proxy/master can handle. Is there any
>>>> other way to go about it? I am thinking of using bulk ingest but I am
>>>> not sure how exactly.
>>>>
>>>> Best regards,
>>>> Yamini Joshi
>>>>
>>>
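
For reference, here is a minimal sketch of the batch-writer approach discussed above, streaming documents out of the BSON file one at a time and capping the writer's buffer so mutations are flushed to the tservers instead of piling up in memory (the bottleneck described in the thread). The host, credentials, table name, and row/column mapping are placeholders, and the limits are only illustrative:

    # Minimal sketch: stream a BSON file into Accumulo through the Thrift
    # proxy with pyaccumulo. Host/port/credentials, the table name, and the
    # row/column mapping below are placeholders, not the poster's code.
    from bson import decode_file_iter  # ships with the pymongo distribution
    from pyaccumulo import Accumulo, Mutation

    conn = Accumulo(host='proxy-host', port=42424, user='root',
                    password='secret')

    table = 'records'
    if not conn.table_exists(table):
        conn.create_table(table)

    # Bound the writer's buffer so mutations are flushed in manageable
    # batches instead of accumulating in the proxy's heap.
    writer = conn.create_batch_writer(table, max_memory=10 * 1024 * 1024,
                                      latency_ms=30000, timeout_ms=300000,
                                      threads=4)

    with open('records.bson', 'rb') as f:
        # decode_file_iter yields one document at a time, so the file is
        # never loaded into memory all at once.
        for doc in decode_file_iter(f):
            m = Mutation(str(doc['_id']))      # hypothetical row id
            for key, value in doc.items():
                m.put(cf='f', cq=key, val=str(value))
            writer.add_mutation(m)             # flushes when max_memory hits

    writer.close()
    conn.close()

Tuning max_memory (and the proxy's own heap) against the available RAM on the three nodes is the knob that matters here; the default writer settings are quite small, but an unbounded producer loop will still outrun the proxy if nothing ever blocks.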

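And for the bulk-ingest route mentioned in the thread, a hedged sketch of only the final hand-off step. It assumes the RFiles have already been generated elsewhere (for example by a MapReduce job using AccumuloFileOutputFormat), that the HDFS paths below exist, and that the 1.7 proxy exposes importDirectory; pyaccumulo does not wrap that call, so this drops down to the underlying Thrift client, and every name and path here is a placeholder:

    # Hedged sketch of the final step of bulk ingest: handing a directory of
    # pre-generated RFiles to Accumulo via the Thrift proxy. Assumes the
    # proxy's importDirectory(login, tableName, importDir, failureDir,
    # setTime) call; all paths and names are placeholders.
    from pyaccumulo import Accumulo

    conn = Accumulo(host='proxy-host', port=42424, user='root',
                    password='secret')

    table = 'records'
    import_dir = '/tmp/bulk/files'      # HDFS dir containing the RFiles
    failure_dir = '/tmp/bulk/failures'  # must exist and be empty

    conn.client.importDirectory(conn.login, table, import_dir,
                                failure_dir, False)

    conn.close()

Pre-splitting the table beforehand (see the first manual link in the thread) lets the imported files be assigned across tservers rather than landing on a single tablet.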