For only 4GB of data, you don't need to do bulk ingest. That is serious overkill.
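
If the problem is just that the client generates mutations faster than
the cluster can absorb them, throttling the batch writer is usually
enough. Rough sketch below (untested; host/port/table/column names are
placeholders, and I'm assuming pymongo's bson module for parsing the
file):

    import bson  # pymongo's bson; decode_file_iter streams one doc at a time
    from pyaccumulo import Accumulo, Mutation

    conn = Accumulo(host="proxy-host", port=42424, user="root",
                    password="secret")
    if not conn.table_exists("mytable"):
        conn.create_table("mytable")

    # A small buffer and few send threads keep the client from
    # outrunning the tservers.
    writer = conn.create_batch_writer("mytable",
                                      max_memory=10 * 1024 * 1024,
                                      latency_ms=30000, threads=4)
    with open("data.bson", "rb") as f:
        for doc in bson.decode_file_iter(f):  # never holds all 4GB in RAM
            m = Mutation(str(doc["_id"]))     # assumes each doc has an _id
            for k, v in doc.items():
                m.put(cf="f", cq=str(k), val=str(v))
            writer.add_mutation(m)  # flushes automatically as buffers fill
    writer.close()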

I don't know why the master would have died/become unresponsive. It is
minimally involved in the write pipeline.

Can you share your current accumulo-env.sh/accumulo-site.xml? Have you
followed the Accumulo user manual to adjust the configuration to match
the resources available on the 3 nodes where Accumulo is running?
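
For reference, the memory-related knobs in accumulo-site.xml look
something like this (values are illustrative only and must be sized
against the RAM actually free on your nodes; with native maps enabled,
tserver.memory.maps.max is allocated off-heap, so shrink the JVM heap
in accumulo-env.sh to match):

    <property>
      <name>tserver.memory.maps.native.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>tserver.memory.maps.max</name>
      <value>1G</value>
    </property>
    <property>
      <name>tserver.cache.data.size</name>
      <value>128M</value>
    </property>
    <property>
      <name>tserver.cache.index.size</name>
      <value>128M</value>
    </property>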

http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
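
A new table starts as a single tablet hosted by one tserver, so until
splits happen all of your writes funnel into one node. Pre-splitting
spreads the load across all three from the start, e.g. in the Accumulo
shell (split points here are made up; pick them from your actual
row-key distribution):

    createtable mytable
    addsplits -t mytable 2 4 6 8 a c e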

http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
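
Native maps move the in-memory map off the JVM heap, which helps with
exactly the kind of low-memory failures you saw. Make sure the native
library is actually loaded on each tserver; if it isn't packaged for
your platform, 1.6+ ships a helper to build it (script name from
memory; run it on every tserver node):

    $ ./bin/build_native_library.sh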

http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting

Yamini Joshi wrote:
Hello

I am trying to import data from a bson file into a 3-node Accumulo
cluster using pyaccumulo. The bson file is 4GB and has a lot of
records, all to be stored in one table. I tried a very naive approach
and used the pyaccumulo batch writer to write to the table. After
parsing some records, my master became unresponsive and shut down, with
the tserver threads stuck on a low-memory error. I am assuming that the
records are created faster than the proxy/master can handle. Is there
any other way to go about it? I am thinking of using bulk ingest, but I
am not sure exactly how.

Best regards,
Yamini Joshi
