Re: Inconsistent scan performance

2016-03-24 Thread Anoop John
I see you set cacheBlocks to false on the Scan. By any chance, on some other RS(s), is the data you are looking for already in cache (from a previous scan or from cache-on-write)? And there are no concurrent writes anyway, right? This much difference in time! One possibility is blocks avail

Re: Inconsistent scan performance

2016-03-24 Thread Stack
On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> Hello all,
>
> So, I wrote a Java application for HBase that does a partitioned full-table scan according to a set number of partitions. For example, if there are 20 partitions specified, then 20

Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
Maybe, if the scan crosses region boundaries that happen to be on different servers.

On Thu, Mar 24, 2016 at 5:49 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> In theory they should be aligned with *regionserver* boundaries. Would crossing multiple regions on the same regionserver result

Re: Inconsistent scan performance

2016-03-24 Thread James Johansville
In theory they should be aligned with *regionserver* boundaries. Would crossing multiple regions on the same regionserver result in the big performance difference being seen here?

I am using Hortonworks HBase 1.1.2.

On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu wrote:
> I assume

Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
I assume the partitions' boundaries don't align with region boundaries, right? That would mean some partitions cross region boundaries.

Which HBase release do you use?

Thanks

On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> Hello all,
>
> So, I wrote

Inconsistent scan performance

2016-03-24 Thread James Johansville
Hello all, So, I wrote a Java application for HBase that does a partitioned full-table scan according to a set number of partitions. For example, if there are 20 partitions specified, then 20 separate full scans are launched that cover an equal slice of the row identifier range. The rows are
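The slicing described above (N separate scans, each covering an equal slice of the row identifier range) could be sketched roughly as follows. This is only an illustration, not the poster's actual code: it assumes row identifiers map to non-negative long values, and the names ScanPartitioner, Slice and partition are hypothetical. Each slice is a half-open range [start, end) that would then be used as the start and stop row of one scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a numeric row-key range into equal slices.
// Assumption: row identifiers can be treated as non-negative long values.
final class ScanPartitioner {

    /** Half-open slice [start, end) of the key space. */
    static final class Slice {
        final long start;
        final long end;

        Slice(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Divides [minKey, maxKey) into the requested number of contiguous slices.
     * When the range does not divide evenly, the first few slices get one extra key.
     */
    static List<Slice> partition(long minKey, long maxKey, int partitions) {
        if (partitions <= 0 || minKey < 0 || maxKey <= minKey) {
            throw new IllegalArgumentException("invalid range or partition count");
        }
        long total = maxKey - minKey;
        long base = total / partitions;
        long remainder = total % partitions;

        List<Slice> slices = new ArrayList<>();
        long start = minKey;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size;
            slices.add(new Slice(start, end));
            start = end;
        }
        return slices;
    }
}
```

Note that slices computed this way are equal in key space only. They say nothing about where region boundaries fall, so a slice may still span several regions or region servers.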

Re: Unexpected region splits

2016-03-24 Thread Ted Yu
Actually there may be a simpler solution: http://pastebin.com/3KJ7Vxnc

We can check the ratio between online regions and the total number of regions in IncreasingToUpperBoundRegionSplitPolicy#shouldSplit(). Splitting should start only when the ratio exceeds a certain threshold.

FYI

On Thu, Mar
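The ratio check proposed above could look roughly like the sketch below. It is not the content of the linked pastebin and not HBase code: SplitGate, splittingAllowed and the minOnlineRatio parameter are hypothetical names, and how a split policy would obtain the two region counts is left open. Returning false when the total is unknown is also an assumption.

```java
// Hypothetical gate a split policy could consult before allowing splits.
// Nothing here is part of the HBase API.
final class SplitGate {

    /**
     * Returns true only when the share of online regions has reached the
     * given threshold, e.g. 0.9 for "at least 90% of regions are online".
     */
    static boolean splittingAllowed(int onlineRegions, int totalRegions, double minOnlineRatio) {
        if (totalRegions <= 0) {
            // Assumption: with no known regions the cluster is treated as still starting up.
            return false;
        }
        double ratio = (double) onlineRegions / totalRegions;
        return ratio >= minOnlineRatio;
    }
}
```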

Re: Unexpected region splits

2016-03-24 Thread Ted Yu
Currently IncreasingToUpperBoundRegionSplitPolicy doesn't detect when the master initialization finishes. There is also a missing piece where the region server notifies the completion of cluster initialization (by looking at RegionServerObserver).

Cheers

On Thu, Mar 24, 2016 at 3:50 AM, Bram

Re: Unexpected region splits

2016-03-24 Thread Bram Desoete
Pedro Gandola writes:
>
> Hi Ted,
>
> Thanks,
> I think I got the problem, I'm using *IncreasingToUpperBoundRegionSplitPolicy (default)* instead of *ConstantSizeRegionSplitPolicy* which in my use case is what I want.
>
> Cheers
> Pedro
>
> On Mon, Feb 15, 2016 at 5:22

Re: Region server getting aborted in every one or two days

2016-03-24 Thread Anoop John
So it seems the issue also shows up just after a log roll(?). So we no longer have the old WAL file, and yet that write op still tries to write to the old file? You can confirm this from the WAL file path name.

-Anoop-

On Wed, Mar 23, 2016 at 6:14 PM, Pankaj kr wrote:
> Thanks