Re: Performance of scan setTimeRange VS manually doing it

2012-09-12 Thread n keywal
For each file, there is a time range. When you scan/search, the file is skipped if there is no overlap between the file's time range and the time range of the query. As there are other parameters as well (row distribution, compaction effects, cache, bloom filters, ...) it's difficult to know in ...
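The file-skipping rule described above comes down to an interval-overlap test. Below is a minimal sketch of that test, not HBase's actual code. `StoreFile`, `overlaps` and `files_to_read` are hypothetical names. It assumes each file records the smallest and largest cell timestamp it contains, and it treats the query range as half-open `[start, end)`, which is how `Scan.setTimeRange(minStamp, maxStamp)` interprets its arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for a store file; only its timestamp bounds matter here."""
    name: str
    min_ts: int  # smallest cell timestamp in the file (inclusive)
    max_ts: int  # largest cell timestamp in the file (inclusive)


def overlaps(f: StoreFile, start: int, end: int) -> bool:
    """True if the file's [min_ts, max_ts] intersects the half-open query range [start, end)."""
    return f.min_ts < end and f.max_ts >= start


def files_to_read(files, start, end):
    """Keep only the files that could contain cells inside the query's time range."""
    return [f.name for f in files if overlaps(f, start, end)]


# Three files written one after another, each covering a disjoint slice of time.
files = [StoreFile("f1", 0, 99), StoreFile("f2", 100, 199), StoreFile("f3", 200, 299)]
print(files_to_read(files, 150, 250))  # ['f2', 'f3']: f1 ends before 150 and is skipped
print(files_to_read(files, 100, 200))  # ['f2']: 200 is excluded, so f3 is skipped too
```

This only shows the file-level decision. As the message notes, row distribution, compactions, caching and bloom filters also affect how much work a scan actually does.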

Re: Performance of scan setTimeRange VS manually doing it

2012-09-12 Thread Tom Brown
It seems like the internal logic for handling a time range has two parts: First, as you said, each file contains the minimum and maximum timestamps contained within. This provides a very rough filter for the data, but if your data is right, the effect can be huge. Second, a time range acts a ...
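The point that the per-file min/max check is a rough filter whose payoff depends on the data can be illustrated by counting skipped files under two layouts. This is a hypothetical model, not a measurement. It assumes that "the right data" means files whose time ranges are narrow and mostly disjoint, for example when cells are written roughly in timestamp order. In the contrasting layout, every file spans almost the whole time range. It models only the first, file-level part of the logic, because the description of the second part is cut off in this excerpt.

```python
def files_skipped(file_ranges, start, end):
    """Count files whose [lo, hi] timestamp range misses the query range [start, end)."""
    return sum(1 for lo, hi in file_ranges if not (lo < end and hi >= start))


# Hypothetical layout A: data written in timestamp order, so each file covers a narrow slice.
ordered = [(i * 100, i * 100 + 99) for i in range(10)]

# Hypothetical layout B: the same time span, but every file holds both old and new
# timestamps (e.g. out-of-order writes), so each file's range covers almost everything.
mixed = [(i, 990 + i) for i in range(10)]

# Query the most recent slice of time, [950, 1000).
print(files_skipped(ordered, 950, 1000))  # 9: only the newest file is read
print(files_skipped(mixed, 950, 1000))    # 0: every file overlaps, nothing is skipped
```

Because the check uses only each file's extremes, a single out-of-range cell widens a file's range and can stop it from being skipped.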

RE: Performance of scan setTimeRange VS manually doing it

2012-09-12 Thread Anoop Sam John
Subject: Re: Performance of scan setTimeRange VS manually doing it It seems like the internal logic for handling a time range has two parts: First, as you said, each file contains the minimum and maximum timestamps contained within. This provides a very rough filter for the data, but if your data ...

Re: Performance of scan setTimeRange VS manually doing it

2012-09-12 Thread Xiang Hua
Hi, do you have a Python script for rack awareness configuration? Thanks! beatls On Thu, Sep 13, 2012 at 5:52 AM, Tom Brown tombrow...@gmail.com wrote: When I query HBase, I always include a time range. This has not been a problem when querying recent data, but it seems to be an issue ...