Hello all,
Thanks a lot for your replies.
@Frank - I will try the compactor you wrote and let you know how it goes.
@Esteban - I am trying to understand how to reduce major compaction load. In my
cluster, whenever it happens, it takes down a region server or two. My settings
are - disable time
table will have ~100 regions.
I didn't get the advantage of the top rows coming from the same vs. different
regions? They will come from different regions.
On Tue, Apr 5, 2016 at 9:10 AM, Ted Yu wrote:
> How many regions does your table have ?
>
> After sorting, is there a chance that
How many regions does your table have?
After sorting, is there a chance that the top N rows come from distinct
regions?
On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora
wrote:
> Hi
>
> I have a requirement to scan a hbase table based on insertion timestamp.
> I need
Hi
I have a requirement to scan an HBase table based on insertion timestamp.
I need to fetch the keys sorted by insertion timestamp, not by key.
I can't make the timestamp a prefix of the key, to avoid hotspotting.
Is there any efficient way to meet this requirement?
Thanks!
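One common pattern for this (not proposed in this thread, so treat it as an assumption) is salting: prefix the key with a small fixed set of salt buckets to spread writes, then merge the per-bucket scans client-side to recover global timestamp order. A minimal sketch of the key layout and merge logic; the bucket count, key format, and all names are illustrative:

```java
import java.util.*;

// Sketch: spread writes across BUCKETS salt prefixes, then restore global
// timestamp order by k-way merging the per-bucket scan results.
public class SaltedScan {
    static final int BUCKETS = 4; // assumed number of salt prefixes

    // Stable salt derived from the id, so writes hit BUCKETS key ranges
    // instead of hotspotting on a single timestamp-ordered region.
    static String saltedKey(long tsMillis, String id) {
        int salt = Math.floorMod(id.hashCode(), BUCKETS);
        return String.format("%d|%013d|%s", salt, tsMillis, id);
    }

    // Each bucket's scan already returns rows sorted by timestamp (the
    // timestamp follows the salt), so a k-way merge restores global order.
    static List<String> mergeByTimestamp(List<Iterator<String>> scans) {
        Comparator<Map.Entry<String, Integer>> byTs =
            Comparator.comparing(e -> e.getKey().substring(2)); // skip "N|"
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byTs);
        for (int i = 0; i < scans.size(); i++)
            if (scans.get(i).hasNext())
                heap.add(new AbstractMap.SimpleEntry<>(scans.get(i).next(), i));
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Integer> top = heap.poll();
            out.add(top.getKey());
            int i = top.getValue();
            if (scans.get(i).hasNext()) // refill from the scan we consumed
                heap.add(new AbstractMap.SimpleEntry<>(scans.get(i).next(), i));
        }
        return out;
    }
}
```

The trade-off is that a timestamp-range read becomes BUCKETS parallel scans instead of one, in exchange for write load spreading across regions.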
Thanks Frank, this is something I am looking for. Would like to have a try with
it.
Thanks,
Ming
-----Original Message-----
From: Frank Luo [mailto:j...@merkleinc.com]
Sent: April 5, 2016 1:38
To: user@hbase.apache.org
Cc: Sumit Nigam
Subject: RE: Major compaction
I wrote a small program to
Hi!
You can use a Hive MAP on your column family, or on a prefix of the
column qualifier.
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily
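For example (table, family, and column names here are made up; the pattern follows the wiki page above), a whole column family can be exposed as a Hive MAP, so new qualifiers appearing over time need no schema change:

```sql
-- Hypothetical names; pattern per the HBaseIntegration wiki page above.
CREATE EXTERNAL TABLE hbase_events (
  rowkey STRING,
  cf MAP<STRING, STRING>  -- entire column family as key/value pairs
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
TBLPROPERTIES ("hbase.table.name" = "events");
```

The trailing `cf:` (no qualifier) in `hbase.columns.mapping` is what maps the whole family to the MAP column.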
--
Kind regards/ Pozdrawiam,
Wojciech Indyk
http://datacentric.pl
2016-04-04 14:13 GMT+02:00
I wrote a small program to do MC in a "smart" way here:
https://github.com/jinyeluo/smarthbasecompactor/
Instead of blindly running MC at the table level, the program finds a non-hot
region that has the most store files, on a per-region-server basis, and runs MC
on it. Once done, it finds the next
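The selection idea described above can be sketched as follows; `RegionStats` and all names here are hypothetical stand-ins, not the actual smarthbasecompactor code or the HBase Admin API:

```java
import java.util.*;

// Sketch: per region server, pick the non-hot region with the most store
// files as the next major-compaction candidate.
public class CompactionPicker {
    static class RegionStats {
        final String name; final int storeFiles; final boolean hot;
        RegionStats(String name, int storeFiles, boolean hot) {
            this.name = name; this.storeFiles = storeFiles; this.hot = hot;
        }
    }

    // One pick per server keeps the extra compaction I/O bounded per node.
    static Map<String, String> pickPerServer(Map<String, List<RegionStats>> byServer) {
        Map<String, String> picks = new LinkedHashMap<>();
        for (Map.Entry<String, List<RegionStats>> e : byServer.entrySet()) {
            e.getValue().stream()
                .filter(r -> !r.hot) // skip regions taking live traffic
                .max(Comparator.comparingInt(r -> r.storeFiles))
                .ifPresent(r -> picks.put(e.getKey(), r.name));
        }
        return picks;
    }
}
```

Compacting one region at a time per server is what keeps this gentler than a table-wide MC, which can hit every region on a server at once.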
>> The reason I am trying to understand this is that HBase also sets it to a
>> 24-hour default (for time-based compaction), and I am looking to lower it to,
>> say, 20 mins to reduce stress by spreading the load.
The more frequently you run major compaction, the more I/O (disk/network) you
consume.
Usually,
There is additional information I would like to share with you, which points
to the region server dying, or to connecting to / resolving a wrong region
server.
Here is the log when trying to connect to server:
[main-EventThread] zookeeper.ZooKeeperWatcher:
hconnection-0x1e67b872-0x153e135af570008
Hello Sumit,
Ideally you shouldn't be triggering major compactions that frequently, since
minor compactions should be taking care of reducing the number of store
files. The caveat of doing it more frequently is the additional
disk/network I/O.
Can you please elaborate more on "reduce stress by
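For reference, the knobs discussed in this thread live in hbase-site.xml; a sketch with illustrative values (not tuning advice):

```xml
<!-- Illustrative values only, not recommendations -->
<property>
  <!-- store files per store before a minor compaction is considered -->
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>
<property>
  <!-- interval for time-based major compaction, in ms (here: 7 days) -->
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value>
</property>
```

Lowering `hbase.hregion.majorcompaction` is what the "24 hours vs. 20 mins" discussion above is about; the I/O caveat applies each time the interval fires.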
Hi,
Are there major overheads to running major compaction frequently? As far as I
know, it produces one HFile per region and processes delete markers and
version-related drops. So, if this process has happened once, say, a few mins
back, then another major compaction should ideally not cause
> Crazy idea, but you might be able to take a stripped-down version of the
> region normalizer code and make a Tool to run? Requesting a split or merge is
> done through the client API, and the only weighing information you need is
> whether the region is empty or not, and that you could find out too?
Yeah, that's
Hi,
I have an HBase table whose column names change (increase) over time.
Is there a way to map such an HBase table to a Hive table,
inferring the schema from the HBase table?
Thanks