Re: Inconsistent scan performance

2016-03-25 Thread Stack
On Fri, Mar 25, 2016 at 12:23 PM, James Johansville <james.johansvi...@gmail.com> wrote: > Hello all, > > I have 13 RegionServers and presplit into 13 regions (which motivated my > comment that I aligned my queries with the regionservers, which obviously > isn't accurate). I have been testing

Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
bq. calculating another new attributes of a trade Can you put the new attributes in separate columns? Cheers On Fri, Mar 25, 2016 at 12:38 PM, Daniel Połaczański wrote: > The data is set of trades and the processing is some kind of enrichment > (calculating another

Re: processing in coprocessor and region splitting

2016-03-25 Thread Daniel Połaczański
The data is a set of trades and the processing is a kind of enrichment (calculating additional attributes of each trade). All attributes are needed (both the original and the new ones). 2016-03-25 18:41 GMT+01:00 Ted Yu : > bq. During the processing the size of the data is doubled. > > This

Re: Inconsistent scan performance

2016-03-25 Thread James Johansville
Hello all, I have 13 RegionServers and presplit into 13 regions (which motivated my comment that I aligned my queries with the regionservers, which obviously isn't accurate). I have been testing using a multiple of 13 for partitioned scans. Here is my current region setup -- I converted the row
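
The idea of using a multiple of 13 partitioned scans over 13 presplit regions can be sketched as follows. This is only an illustration, not the poster's code: it assumes integer row keys and known region boundaries, and the function name `partition_scans` and its parameters are hypothetical. It shows one way to derive scan ranges so that no scan crosses a region boundary.

```python
from typing import List, Tuple


def partition_scans(boundaries: List[int], scans_per_region: int) -> List[Tuple[int, int]]:
    """Split each region into `scans_per_region` half-open [start, stop) ranges.

    `boundaries` is a sorted list of N+1 integer keys delimiting N regions
    (an assumption for illustration; real HBase row keys are byte arrays).
    """
    ranges = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        width = hi - lo
        for i in range(scans_per_region):
            start = lo + width * i // scans_per_region
            stop = lo + width * (i + 1) // scans_per_region
            if start < stop:  # skip empty ranges when a region is very narrow
                ranges.append((start, stop))
    return ranges


# 13 regions of equal width over a hypothetical key space [0, 1300)
boundaries = list(range(0, 1301, 100))
scan_ranges = partition_scans(boundaries, scans_per_region=2)
```

With 13 regions and 2 scans per region this yields 26 contiguous ranges, each contained in a single region.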

Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
bq. During the processing the size of the data is doubled. This explains the frequent split :-) Is the original data needed after post-processing (maybe for auditing)? Cheers On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański wrote: > I am testing different

Re: processing in coprocessor and region splitting

2016-03-25 Thread Daniel Połaczański
I am testing different solutions (POC). The region size is currently 32 MB (I know it should be >= 1 GB, but we are testing different solutions with a smaller amount of data). So increasing the region size is not a solution. Our problem can occur even when a region is 1 GB. We want to proces
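
A rough way to see why a larger region size does not remove the problem when processing doubles the data: under a simplified model (an assumption here, not HBase's actual split policy) in which a region splits in half whenever it exceeds a configured maximum, any region that is more than half full will cross the threshold after doubling, whether the maximum is 32 MB or 1 GB. The function name and parameters below are hypothetical.

```python
def regions_after_growth(initial_mb: float, growth_factor: float, max_region_mb: float) -> int:
    """Return how many regions one region becomes after its data grows.

    Simplified model (assumption): a region larger than `max_region_mb`
    is split into two equal halves, repeatedly, until each half fits.
    """
    size = initial_mb * growth_factor
    regions = 1
    while size > max_region_mb:
        regions *= 2
        size /= 2
    return regions


# A 20 MB region with a 32 MB limit splits once the data doubles ...
small = regions_after_growth(20, 2, 32)
# ... and so does a 600 MB region with a 1 GB (1024 MB) limit.
large = regions_after_growth(600, 2, 1024)
# A region at most half full does not split.
half_full = regions_after_growth(16, 2, 32)
```

In this model the threshold only changes how much data is in a region when it splits, not whether in-place growth can trigger a split.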

Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
What's the current region size you use? bq. During the processing size of the data gets increased Can you give us some quantitative measure as to how much increase you observed (w.r.t. region size)? bq. I was looking for some "global lock" in source code Probably not a good idea using global

Re: Inconsistent scan performance

2016-03-25 Thread Stack
On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu wrote: > James: > Another experiment you can do is to enable region replica - HBASE-10070. > > This would bring down the read variance greatly. > > Suggest you NOT do this, James. Let's figure out your issue as-is rather than compound by

processing in coprocessor and region splitting

2016-03-25 Thread Daniel Połaczański
Hi, I have some processing in my coprocessor service which modifies the existing data in place. It iterates over every row, modifies it, and puts it back into the region. The table can be modified by only one client. During the processing, the size of the data increases -> the region's size increases ->

Re: Inconsistent scan performance

2016-03-25 Thread Nicolas Liochon
The read path is much more complex than the write one, so the response time has much more variance. The gap is so wide here that I would bet on Ted's or Stack's points, but here are a few other sources of variance: - HBase cache: as Anoop said, maybe the data is already in the HBase cache