Internally AsyncProcess uses a Map which is keyed by server name:

    Map<ServerName, MultiAction<Row>> actionsByServer =
        new HashMap<ServerName, MultiAction<Row>>();

Here MultiAction groups the Puts in your example that are destined for the same server.
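To make that concrete, here is a minimal sketch of such a batch, written against the 1.x Connection/Table API rather than the older HTable (the table and column names are made up for illustration):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchPutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {
          List<Put> puts = new ArrayList<Put>(1000);
          for (int i = 0; i < 1000; i++) {
            // Random keys spread the batch across regions, so these puts
            // will usually span several region servers.
            Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                Bytes.toBytes("value-" + i));
            puts.add(put);
          }
          // One call: the client groups the list into one MultiAction per
          // region server and sends the groups in parallel.
          table.put(puts);
        }
      }
    }

So for your 10-put batch, the 5 puts for RS1, 3 for RS2 and 2 for RS3 go out as three grouped RPCs rather than ten individual ones.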
Cheers

On Fri, Jul 17, 2015 at 5:15 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> Thanks!
>
> My key is random (hexadecimal), so a hot spot should not be created.
>
> Is there any concept of a bulk put? Say I want to issue one put request
> for a batch of 1000 rows, which hits a region server, instead of an
> individual put for each key.
>
> Does HTable.put(List<Put>) handle batching of the puts based on the
> region server they will finally land on? Say in my batch there are
> 10 puts: 5 for RS1, 3 for RS2 and 2 for RS3. Does it handle that?
>
> On Thu, Jul 16, 2015 at 8:31 PM, Michael Segel <michael_se...@hotmail.com>
> wrote:
>
>> You ask an interesting question…
>>
>> Let's set aside Spark and look at the overall ingestion pattern. It's
>> really an ingestion pattern where your input into the system is from a
>> queue.
>>
>> Are the events discrete or continuous? (This is kind of important.)
>>
>> If the events are continuous, then more than likely you're going to be
>> ingesting data where the key is somewhat sequential. If you use put(),
>> you end up with hot spotting, and you'll end up with regions half full.
>> So you would be better off batching up the data and doing bulk imports.
>>
>> If the events are discrete, then you'll want to use put(), because the
>> odds are you will not be using a sequential key. (You could, but I'd
>> suggest that you rethink your primary key.)
>>
>> Depending on the rate of ingestion, you may want to do a manual flush.
>> (It depends on the velocity of the data to be ingested and your use
>> case.) (Remember what caching occurs, and where, when dealing with
>> HBase.)
>>
>> A third option… Depending on how you use the data, you may want to
>> avoid storing the data in HBase, and only use HBase as an index to
>> where you store the data files for quick access. Again, it depends on
>> your data ingestion flow and how you intend to use the data.
>>
>> So really this is less a Spark issue than an HBase issue when it comes
>> to design.
>>
>> HTH
>>
>> -Mike
>>
>> > On Jul 15, 2015, at 11:46 AM, Shushant Arora <shushantaror...@gmail.com>
>> > wrote:
>> >
>> > Hi
>> >
>> > I have a requirement of writing to an HBase table from a Spark
>> > streaming app after some processing.
>> > Is the HBase put operation the only way of writing to HBase, or is
>> > there any specialised connector or RDD of Spark for HBase writes?
>> >
>> > Should bulk load to HBase from a streaming app be avoided if the
>> > output of each batch interval is just a few MBs?
>> >
>> > Thanks
>>
>> The opinions expressed here are mine; while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
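P.S. On the manual-flush point above: a minimal sketch of that idea using
BufferedMutator (available since HBase 1.0). The buffer size, table and
column names here are illustrative assumptions, not recommendations. The
client buffers puts locally and only ships them when the buffer fills or
flush() is called, so you control when the RPCs go out:

    import java.util.UUID;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.BufferedMutatorParams;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ManualFlushExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Buffer roughly 4 MB of mutations client-side before writing out.
        BufferedMutatorParams params =
            new BufferedMutatorParams(TableName.valueOf("events"))
                .writeBufferSize(4 * 1024 * 1024);
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(params)) {
          for (int i = 0; i < 10000; i++) {
            Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("v"),
                Bytes.toBytes(i));
            mutator.mutate(put); // buffered locally, not sent yet
          }
          mutator.flush(); // the manual flush: push everything out now
        }
      }
    }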