Re: Yet another bulk import question

2011-04-11 Thread Vivek Krishna
Is there a setting that limits or controls the bandwidth on HBase nodes? I know there is a number to be set in zoo.cfg to increase the number of incoming connections. Though I am using a 15 Gigabit ethernet card, I can see only 50-100 MB/s of transfer per node (from clients) via gangli

Re: Yet another bulk import question

2011-03-24 Thread Ted Dunning
Something is just wrong. You should be able to do 17,000 records from a few nodes with multiple threads against a fairly small cluster. You should be able to come close to that from a single node into a dozen region servers.

On Thu, Mar 24, 2011 at 5:32 PM, Vivek Krishna wrote:
> I have a total

Re: Yet another bulk import question

2011-03-24 Thread Vivek Krishna
I have a total of 10 client nodes with 3-10 threads running on each node. Record size ~1K.

Viv

On Thu, Mar 24, 2011 at 8:28 PM, Ted Dunning wrote:
> Are you putting this data from a single host? Is your sender
> multi-threaded?
>
> I note that (20 GB / 20 minutes < 20 MB / s) so you aren't p

Re: Yet another bulk import question

2011-03-24 Thread Ted Dunning
Are you putting this data from a single host? Is your sender multi-threaded?

I note that (20 GB / 20 minutes < 20 MB / s) so you aren't particularly stressing the network. You would likely be stressing a single-threaded client pretty severely.

What is your record size? It may be that you are b
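
To make the estimate in the message above concrete, here is a minimal Java sketch of the arithmetic. It assumes binary units (1 GB = 1024 MB) and the ~1K record size reported elsewhere in this thread; the class and method names are hypothetical and only serve the illustration.

```java
// Hypothetical helper for the back-of-the-envelope estimate in this thread.
class ThroughputEstimate {
    // Average transfer rate in MB/s, assuming 1 GB = 1024 MB.
    static double megabytesPerSecond(double gigabytes, double minutes) {
        return (gigabytes * 1024.0) / (minutes * 60.0);
    }

    // Approximate records per second, assuming 1 MB = 1024 KB.
    static double recordsPerSecond(double mbPerSecond, double recordSizeKb) {
        return (mbPerSecond * 1024.0) / recordSizeKb;
    }

    public static void main(String[] args) {
        double rate = megabytesPerSecond(20.0, 20.0);  // 20 GB in 20 minutes
        double records = recordsPerSecond(rate, 1.0);  // ~1K records (assumed exactly 1 KB)
        System.out.println(rate + " MB/s, " + records + " records/s");
    }
}
```

Under these assumptions the load averages roughly 17 MB/s, which is below the 20 MB/s bound quoted above, and works out to somewhat over 17,000 records per second across all clients combined.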

Yet another bulk import question

2011-03-24 Thread Vivek Krishna
Data Size - 20 GB. It took about an hour with the default HBase settings and after varying several parameters, we were able to get this done in ~20 minutes. This is slow and we are trying to improve. We wrote a Java client which would essentially `put` to HBase tables in batches. Our fine-tuning par
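
The general shape of a batching, multi-threaded writer like the one described above can be sketched as follows. This is not the poster's client: `BatchedWriter` and `writeInBatches` are hypothetical names, and the `sink` callback is a stand-in for whatever call actually writes a batch to the table, so the sketch runs with the Java standard library alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: split records into batches and write them from a thread pool.
class BatchedWriter {
    // Returns the number of batches the sink completed without throwing.
    static <T> long writeInBatches(List<T> records, int batchSize, int threads,
                                   Consumer<List<T>> sink) {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy the slice so each task owns its batch independently.
            List<T> batch = new ArrayList<>(records.subList(start, end));
            pool.submit(() -> {
                sink.accept(batch);          // stand-in for the real table write
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for batches");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add("row-" + i);
        }
        AtomicLong written = new AtomicLong();
        long batches = writeInBatches(records, 1000, 4,
                batch -> written.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + written.get() + " records");
    }
}
```

Batch size and thread count are the two knobs exposed here; a real client would also need to surface write failures, which this sketch only reflects as a lower completed-batch count.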