Bulk import seemed like a good option since the BSON file generated about 
10 GB of data. The problem with my code was that I wasn't releasing memory, 
which eventually became the bottleneck.
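For anyone hitting the same issue, below is a rough, untested sketch of the
pattern that keeps the client-side buffer bounded: close and re-create the
pyaccumulo batch writer every N records so buffered mutations get flushed and
their memory released. It assumes pyaccumulo's Accumulo/create_batch_writer/
add_mutation/close calls and pymongo's bson.decode_file_iter; the host, port,
credentials, table name, batch size, and the record-to-mutation mapping are
all placeholders you would adapt to your data.

    # Sketch: stream a BSON dump into Accumulo via the proxy with pyaccumulo,
    # flushing the batch writer periodically so memory does not accumulate.
    import bson  # ships with the pymongo package
    from pyaccumulo import Accumulo, Mutation

    BATCH_SIZE = 10000  # flush boundary; tune to the memory you have

    conn = Accumulo(host="proxy-host", port=42424, user="root", password="secret")
    writer = conn.create_batch_writer("mytable")

    with open("dump.bson", "rb") as f:
        for i, doc in enumerate(bson.decode_file_iter(f), start=1):
            m = Mutation(str(doc["_id"]))             # row key from the document id
            m.put(cf="doc", cq="json", val=str(doc))  # simplistic value encoding
            writer.add_mutation(m)
            if i % BATCH_SIZE == 0:
                # close() pushes any outstanding mutations and frees the buffer;
                # re-create the writer for the next batch
                writer.close()
                writer = conn.create_batch_writer("mytable")

    writer.close()
    conn.close()

The same effect could also be had by tuning the writer's max_memory/latency
arguments instead of recycling it, but the explicit close-and-recreate loop
was the simplest way for me to reason about when memory is actually released.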


> On Oct 11, 2016, at 9:39 PM, Josh Elser <[email protected]> wrote:
> 
> For only 4GB of data, you don't need to do bulk ingest. That is serious 
> overkill.
> 
> I don't know why the master would have died/become unresponsive. It is 
> minimally involved with the write-pipeline.
> 
> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you 
> followed the Accumulo user manual to change the configuration to match the 
> available resources you have on your 3 nodes where Accumulo is running?
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
> 
> Yamini Joshi wrote:
>> Hello
>> 
>> I am trying to import data from a BSON file into a 3-node Accumulo cluster
>> using pyaccumulo. The BSON file is 4 GB and has a lot of records, all to
>> be stored in one table. I tried a very naive approach and used the
>> pyaccumulo batch writer to write to the table. After parsing some
>> records, my master became unresponsive and shut down, with the tserver
>> threads stuck on a low-memory error. I am assuming that the records are
>> created faster than the proxy/master can handle. Is there any other
>> way to go about it? I am thinking of using bulk ingest but I am not sure
>> exactly how.
>> 
>> Best regards,
>> Yamini Joshi
