Bulk import seemed like a good option since the bson file generated about 10 GB of data. The problem with my code was that I wasn't releasing memory, which eventually became the bottleneck.
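For anyone who hits the same problem, here is a minimal sketch of the kind of ingest loop that keeps memory bounded by closing and re-creating the batch writer every few thousand mutations. This is not my actual code: the proxy host/port, table name, batch size, and row/column mapping are placeholder assumptions, and the bson parsing assumes the bson package that ships with pymongo.

import bson                      # bson package from pymongo; decode_file_iter streams documents
from pyaccumulo import Accumulo, Mutation

# connect to the Accumulo proxy (host/port/credentials are placeholders)
conn = Accumulo(host="localhost", port=42424, user="root", password="secret")
table = "records"
if not conn.table_exists(table):
    conn.create_table(table)

BATCH = 10000                    # close/flush after this many mutations
writer = conn.create_batch_writer(table)
count = 0

with open("dump.bson", "rb") as f:
    for doc in bson.decode_file_iter(f):
        mut = Mutation(str(doc.get("_id")))      # row key: document id (assumed field)
        for key, val in doc.items():
            if key == "_id":
                continue
            mut.put(cf="field", cq=key, val=str(val))
        writer.add_mutation(mut)
        count += 1
        if count % BATCH == 0:
            # closing the writer pushes everything it has buffered to the
            # tservers, so pending mutations don't pile up in the proxy
            writer.close()
            writer = conn.create_batch_writer(table)

writer.close()
conn.close()

Closing the writer periodically, rather than keeping one open for the whole file, forces the buffered mutations out to the cluster instead of letting them accumulate.
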
Sent from my iPhone

> On Oct 11, 2016, at 9:39 PM, Josh Elser <[email protected]> wrote:
>
> For only 4GB of data, you don't need to do bulk ingest. That is serious
> overkill.
>
> I don't know why the master would have died/become unresponsive. It is
> minimally involved with the write-pipeline.
>
> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you
> followed the Accumulo user manual to change the configuration to match the
> available resources you have on your 3 nodes where Accumulo is running?
>
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
>
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
>
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
>
> Yamini Joshi wrote:
>> Hello
>>
>> I am trying to import data from a bson file to a 3 node Accumulo cluster
>> using pyaccumulo. The bson file is 4G and has a lot of records, all to
>> be stored into one table. I tried a very naive approach and used the
>> pyaccumulo batch writer to write to the table. After parsing some
>> records, my master became unresponsive and shut down with the tserver
>> threads stuck on a low-memory error. I am assuming that the records are
>> created faster than the proxy/master can handle. Is there any other
>> way to go about it? I am thinking of using bulk ingest, but I am not sure
>> how exactly.
>>
>> Best regards,
>> Yamini Joshi
