Thanks! My key is random (hexadecimal), so a hot spot should not be created.
Is there any concept of a bulk put? Say I want to issue one put request for a batch of 1000 rows, which will hit a region server, instead of an individual put for each key. Does HTable.put(List<Put>) handle batching of puts based on the region server where each will finally land? Say in my batch there are 10 puts: 5 for RS1, 3 for RS2, and 2 for RS3. Does it handle that?

On Thu, Jul 16, 2015 at 8:31 PM, Michael Segel <michael_se...@hotmail.com> wrote:

> You ask an interesting question…
>
> Let's set aside Spark and look at the overall ingestion pattern.
>
> It's really an ingestion pattern where your input into the system is from
> a queue.
>
> Are the events discrete or continuous? (This is kinda important.)
>
> If the events are continuous, then more than likely you're going to be
> ingesting data where the key is somewhat sequential. If you use put(), you
> end up with hot spotting, and you'll end up with regions half full.
> So you would be better off batching up the data and doing bulk imports.
>
> If the events are discrete, then you'll want to use put(), because the odds
> are you will not be using a sequential key. (You could, but I'd suggest
> that you rethink your primary key.)
>
> Depending on the rate of ingestion, you may want to do a manual flush. (It
> depends on the velocity of data to be ingested and your use case.)
> (Remember what caching occurs, and where, when dealing with HBase.)
>
> A third option… Depending on how you use the data, you may want to avoid
> storing the data in HBase, and only use HBase as an index to where you
> store the data files for quick access. Again, it depends on your data
> ingestion flow and how you intend to use the data.
>
> So really this is less a Spark issue than an HBase issue when it comes to
> design.
>
> HTH
>
> -Mike
>
> > On Jul 15, 2015, at 11:46 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
> >
> > Hi
> >
> > I have a requirement of writing to an HBase table from a Spark streaming app
> > after some processing.
> > Is the HBase put operation the only way of writing to HBase, or is there any
> > specialised connector or Spark RDD for HBase writes?
> >
> > Should bulk load to HBase from a streaming app be avoided if the output of
> > each batch interval is just a few MBs?
> >
> > Thanks
>
> The opinions expressed here are mine; while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
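To answer the batching question directly: yes, the HBase client groups the puts in HTable.put(List<Put>) by destination region (and hence region server) and sends one multi-operation RPC per server, so you do not need to split the list yourself. The sketch below illustrates just the grouping step with plain Java collections so it runs standalone; locateRegionServer is a hypothetical stand-in for the real lookup the client performs against the cached hbase:meta region locations, and the toy key-to-server scheme is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchPutSketch {

    // Hypothetical lookup: map a row key to the region server hosting its
    // region. The real client resolves this from cached hbase:meta entries.
    static String locateRegionServer(String rowKey) {
        char c = rowKey.charAt(0); // toy scheme: first hex digit picks the server
        if (c <= '4') return "RS1";
        if (c <= '9') return "RS2";
        return "RS3";
    }

    // Group row keys (standing in for Put objects) by region server,
    // mirroring what the client does internally for put(List<Put>):
    // one batch per server, one RPC per batch.
    static Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> batches = new HashMap<>();
        for (String key : rowKeys) {
            batches.computeIfAbsent(locateRegionServer(key), k -> new ArrayList<>())
                   .add(key);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("0a", "3f", "7b", "9c", "e1", "f2");
        // Each map entry corresponds to one multi-put RPC in the real client.
        System.out.println(groupByServer(keys));
    }
}
```

Note that in this flow a slow or failed region server only delays its own sub-batch; the client retries the failed operations and reports any permanent failures back per Put.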