On Thu, Nov 12, 2015 at 6:44 AM, qihuang.zheng wrote:
> question is : why sstableloader can’t balance data file size?
>
Because it streams ranges from the source SSTable to a distributed set of
ranges, especially if you are using vnodes.
It is a general property.
split into small sizes, and balanced across all
nodes, so our Spark job can run quickly.
Thanks, qihuang.zheng
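If the goal is smaller, more evenly sized SSTables before (or instead of) loading, one option not mentioned in this thread is Cassandra's offline sstablesplit tool. This is a sketch, not the thread author's procedure; the flag name and target size are assumptions to verify against your Cassandra version's tools/bin documentation, and the path is the example file from this thread.

```
# Run only while Cassandra is stopped on the node that owns the file.
# sstablesplit rewrites one large SSTable into several smaller ones;
# --size is the target size per output SSTable in MB (assumed default: 50).
tools/bin/sstablesplit --size 50 \
    ./forseti/velocity/forseti-velocity-jb-103631-Data.db
```

Note that this only changes file sizes on the source node; sstableloader will still re-split the data by token range when streaming to the target cluster.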
Original Message
From: Robert coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Friday, November 13, 2015, 04:04
Subject: Re: Data.db too large and after sstableloader still large
We took a snapshot and found some Data.db files are too large:
[qihuang.zheng@spark047219 5]$ find . -type f -size +800M -print0 | xargs -0 ls -lh
-rw-r--r--. 2 qihuang.zheng users 1.5G Oct 28 14:49 ./forseti/velocity/forseti-velocity-jb-103631-Data.db
And after sstableloader to the new cluster, one node has this:
Original Message
From: qihuang.zheng qihuang.zh...@fraudmetrix.cn
To: user u...@cassandra.apache.org
Sent: Thursday, November 12, 2015, 21:20
Subject: Data.db too large and after sstableloader still large
We took a snapshot and found some Data.db files are too large:
[qihuang.zheng@spark047219 5]$ find . -type f -size +800M -print0 | xargs -0 ls -lh
-rw-r
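For reference, the size check used twice in this thread can be reproduced standalone. This is a self-contained demo (the 800M threshold is the one from the thread; the scratch files and directory are invented for illustration):

```shell
# Standalone demo of the size check from this thread.
# Create a scratch directory with one >800M file (sparse, so it takes
# no real disk space) and one small file, then list only the large ones.
tmpdir=$(mktemp -d)
truncate -s 900M "$tmpdir/big-Data.db"
truncate -s 10M  "$tmpdir/small-Data.db"

# Same pattern as in the thread: files over 800M, human-readable listing.
find "$tmpdir" -type f -size +800M -print0 | xargs -0 ls -lh

rm -rf "$tmpdir"
```

In a real snapshot directory you would run the `find` against the keyspace data path instead of `$tmpdir`.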