I would go with the second option, HTableInterface.put(List<Put>). The first
option sounds dodgy: 5 minutes is plenty of time for things to go wrong and
for you to lose the data you have pooled but not yet written.
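
To make the batching idea concrete, here is a minimal sketch of a buffer that
collects entries and hands them off in batches. BatchingBuffer and its sink
parameter are hypothetical names for illustration, not part of HBase. In a
real client, the sink would presumably wrap a call to
HTableInterface.put(List<Put>) and deal with its IOException. The batch size
is an assumption you would have to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of HBase: collects entries and hands them
// to a sink in batches instead of issuing one write per entry.
class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> pending = new ArrayList<>();

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Adds one entry and flushes as soon as a full batch has accumulated.
    synchronized void add(T entry) {
        pending.add(entry);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is pending to the sink. If the sink throws, the
    // entries stay in the buffer so that a later flush can retry them.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(pending);
        // Start a fresh list rather than clearing, so the sink may keep
        // the batch it was given.
        pending = new ArrayList<>();
    }
}
```

You would call add() for each incoming entry and call flush() on a short
timer and at shutdown, so a partial batch never sits in memory for long.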

On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma <mahong...@apache.org> wrote:

> hi,
>
> I'm trying to use an HTable to store data that arrives in a streaming
> fashion. Each incoming entry is guaranteed to have a larger KEY than ANY
> existing key in the table.
> And the data will be READONLY.
>
> The data is streaming in at a very high rate. I don't want to issue a PUT
> operation for each data entry, because that would obviously perform poorly.
> I'm thinking about pooling the data entries and flushing them to HBase every
> five minutes, and AFAIK there are a few options:
>
> 1. Pool the data entries, and every 5 minutes run an MR job to convert the
> data to HFile format. This approach could avoid the overhead of single PUTs,
> but I'm afraid the MR job might be too costly (waiting in the job queue) to
> keep pace.
>
> 2. Use HTableInterface.put(List<Put>). The batched version should be faster,
> but I'm not sure by how much.
>
> 3.?
>
> Can anyone give me some advice on this?
> Thanks!
>
> hongbin
>
