I would go with the second option, HTableInterface.put(List<Put>). The first option sounds dodgy: five minutes is plenty of time for things to go wrong and for you to lose your data.
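A minimal sketch of the client-side pooling this would need: accumulate entries and hand them off in one batched call. The `BatchBuffer` class, the `batchSize` value, and the `Consumer` flush callback are my own illustration, not anything from the thread; in the real HBase case `T` would be `Put` and the flush callback would invoke `HTableInterface.put(List<Put>)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic write buffer: accumulate entries and flush them in batches.
// For HBase, T would be Put and the flusher would call
// HTableInterface.put(List<Put>). batchSize is an assumption -- tune it
// against your row size and region server load.
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Add one entry; flush automatically once the batch is full.
    synchronized void add(T entry) {
        pending.add(entry);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send everything buffered so far as a single batched call.
    synchronized void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

You would also want a timer (or a shutdown hook) that calls `flush()` periodically, so a half-full batch is not held indefinitely; that keeps the window of unwritten data to seconds rather than five minutes.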
On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma <mahong...@apache.org> wrote:
> hi,
>
> I'm trying to use an HTable to store data that arrives in a streaming
> fashion. The incoming data is guaranteed to have a larger KEY than ANY
> existing key in the table, and the data will be READONLY.
>
> The data is streaming in at a very high rate, so I don't want to issue a
> PUT operation for each data entry, because that is obviously poor in
> performance. I'm thinking about pooling the data entries and flushing
> them to HBase every five minutes, and AFAIK there are a few options:
>
> 1. Pool the data entries, and every 5 minutes run an MR job to convert
> the data to HFile format. This approach avoids the overhead of single
> PUTs, but I'm afraid the MR job might be too costly (waiting in the job
> queue) to keep pace.
>
> 2. Use HTableInterface.put(List<Put>). The batched version should be
> faster, but I'm not quite sure by how much.
>
> 3. ?
>
> Can anyone give me some advice on this?
> Thanks!
>
> hongbin