On Fri, Mar 25, 2016 at 12:23 PM, James Johansville <
james.johansvi...@gmail.com> wrote:
> Hello all,
>
> I have 13 RegionServers and presplit into 13 regions (which motivated my
> comment that I aligned my queries with the regionservers, which obviously
> isn't accurate). I have been testing
bq. calculating another new attributes of a trade
Can you put the new attributes in separate columns ?
Cheers
On Fri, Mar 25, 2016 at 12:38 PM, Daniel Połaczański wrote:
> The data is set of trades and the processing is some kind of enrichment
> (calculating another
The data is set of trades and the processing is some kind of enrichment
(calculating another new attributes of a trade). All attributes are needed
(the original and new)
2016-03-25 18:41 GMT+01:00 Ted Yu:
> bq. During the processing the size of the data is doubled.
>
> This
Hello all,
I have 13 RegionServers and presplit into 13 regions (which motivated my
comment that I aligned my queries with the regionservers, which obviously
isn't accurate). I have been testing using a multiple of 13 for partitioned
scans.
Here is my current region setup -- I converted the row
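To make the "multiple of 13" partitioning concrete, here is a minimal sketch. It assumes row keys are uniformly distributed unsigned 32-bit integers, which the thread does not confirm, and `partition_ranges` is a hypothetical helper, not an HBase API. The ranges line up with regions only if the presplit points fall on the same boundaries.

```python
def partition_ranges(start, stop, num_regions, per_region=1):
    """Split [start, stop) into num_regions * per_region contiguous ranges.

    Hypothetical helper: the integer key space is an assumption made for
    illustration, not something taken from the original message.
    """
    parts = num_regions * per_region
    span = stop - start
    # Integer arithmetic keeps the boundaries exact and the ranges gap-free.
    bounds = [start + (span * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# 13 regions, 2 scan partitions per region -> 26 scan ranges.
ranges = partition_ranges(0, 2**32, num_regions=13, per_region=2)
```

Each (start, stop) pair would become the start row and stop row of one scan, after encoding the integers the same way the row keys are encoded.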
bq. During the processing the size of the data is doubled.
This explains the frequent split :-)
Is the original data needed after post-processing (maybe for auditing) ?
Cheers
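A rough sketch of why doubling the data leads to frequent splits, under a simplified constant-size model in which a region splits once it grows past the configured maximum (`hbase.hregion.max.filesize`). The real split policies are more involved, and the 20 MB and 10 MB starting sizes are made-up numbers for illustration only.

```python
def exceeds_split_threshold(region_mb, growth_factor, max_filesize_mb):
    """Return True if the region would outgrow the split threshold.

    Simplified model (an assumption): a region splits as soon as its size
    exceeds max_filesize_mb. HBase's actual policies look at store file
    sizes and may apply a different effective threshold.
    """
    return region_mb * growth_factor > max_filesize_mb


# With a 32 MB limit, any region above 16 MB splits once its data doubles.
will_split = exceeds_split_threshold(20, 2, 32)    # 40 MB > 32 MB
stays_whole = exceeds_split_threshold(10, 2, 32)   # 20 MB <= 32 MB
```

In this model any region that is more than half full before the enrichment pass will split during it, whatever the absolute limit is.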
On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański wrote:
> I am testing different
I am testing different solutions (POC).
The region size currently is 32MB (I know it should be >= 1GB, but we are
testing different solutions with smaller amounts of data). So
increasing region size is not a solution. Our problems can happen even when
a region will be 1 GB. We want to proces
What's the current region size you use ?
bq. During the processing size of the data gets increased
Can you give us some quantitative measure as to how much increase you
observed (w.r.t. region size) ?
bq. I was looking for some "global lock" in source code
Probably not a good idea using global
On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu wrote:
> James:
> Another experiment you can do is to enable region replica - HBASE-10070.
>
> This would bring down the read variance greatly.
>
>
Suggest you NOT do this James.
Let's figure out your issue as-is rather than compound by
Hi,
I have some processing in my CoprocessorService which modifies the existing
data in place. It iterates over every row, modifies it and puts it back to
the region. The table can be modified by only one client.
During the processing size of the data gets increased -> region's size get
increased ->
The read path is much more complex than the write one, so the response time
has much more variance.
The gap is so wide here that I would bet on Ted's or Stack's points, but
here are a few other sources of variance:
- hbase cache: as Anoop said, maybe the data is already in the hbase cache