No, 3 TB is small; 30-50 TB of HDFS space per node is typical these days.
It depends somewhat on whether there is a mix of more and less frequently
accessed data, but even storing only hot data, I never saw anything less
than 20 TB of HDFS per node.





Daemeon C.M. Reiydelle
USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence


On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
wrote:

> Am I the only one thinking 3TB is way too much data for a single node on a
> VM?
>
> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <dan...@sendwithus.com>
> wrote:
>
>> I don't believe incremental repair is enabled; I have never enabled it
>> on the cluster, and unless it's the default, it is off. I also don't see
>> a setting for it in cassandra.yaml.
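>>
>> For reference: incremental repair isn't a cassandra.yaml setting; it is
>> chosen per repair invocation (and recent releases default to it for
>> nodetool repair). A rough way to check whether any sstables have ever
>> been incrementally repaired, assuming the default data directory and the
>> 3.x "mc" sstable naming:
>>
>>     # "Repaired at: 0" means the sstable has never been incrementally repaired
>>     sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/mc-*-big-Data.db \
>>         | grep "Repaired at"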
>>
>>
>>
>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> Unless there is a bug, snapshots are excluded from nodetool status
>>> (they are not live data anyway!).
>>>
>>> Out of curiosity, is incremental repair enabled? This is almost
>>> certainly a rat hole, but there was an issue a few releases back where
>>> load would only increase until the node was restarted. It was fixed ages
>>> ago, but I'm wondering what happens if you restart a node, IF you have
>>> incremental repair enabled.
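>>>
>>> If you do try a restart, comparing the reported load before and after
>>> makes it concrete (the restart command depends on your init system):
>>>
>>>     nodetool info | grep Load          # record the reported load
>>>     sudo systemctl restart cassandra   # or: sudo service cassandra restart
>>>     nodetool info | grep Load          # compare once the node is back up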
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>>
>>> Can you please check whether you have incremental backup enabled and
>>> whether snapshots are occupying the space?
>>>
>>> If so, run the nodetool clearsnapshot command.
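>>>
>>> For example:
>>>
>>>     nodetool listsnapshots    # lists each snapshot with its true size per table
>>>     nodetool clearsnapshot    # with no arguments, removes all snapshots on the node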
>>>
>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>> dan...@sendwithus.com> wrote:
>>>
>>> It's 3-4 TB per node, and by "load rises" I'm talking about load as
>>> reported by nodetool status.
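>>>
>>> For what it's worth, that number can be compared against what is
>>> actually on disk (assuming the default data directory):
>>>
>>>     nodetool status                  # "Load" = live sstable bytes as Cassandra tracks them
>>>     du -sh /var/lib/cassandra/data   # actual bytes on disk, snapshots included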
>>>
>>>
>>>
>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>> When you say "the load rises ...", could you clarify what you mean by
>>> "load"? That term has a specific meaning in Linux, and another in e.g.
>>> Cloudera Manager, but in neither case would it be relevant to transient
>>> or persisted disk. Am I missing something?
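>>>
>>> That is, the two readings come from different places entirely:
>>>
>>>     uptime            # Linux "load average" (run-queue pressure)
>>>     nodetool status   # Cassandra "Load" column (on-disk data size)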
>>>
>>>
>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> 3-4 TB per node or in total?
>>>
>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>> I should also mention that I am running Cassandra 3.10 on the cluster.
>>>
>>>
>>>
>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>> wrote:
>>>
>>> The cluster is running with RF=3; right now each node is storing about
>>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances; these have 8 vCPUs,
>>> 61 GB of RAM, and the disks attached for the data drive are gp2 SSD EBS
>>> volumes with 10k IOPS. I guess this brings up the question of what's a
>>> good marker for deciding whether to increase disk space vs. provisioning
>>> a new node?
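>>>
>>> As a rough back-of-envelope (the per-node average and the 50% figure
>>> below are assumptions, not measurements):
>>>
>>>     # 6 nodes x ~3.5 TB each = ~21 TB total on disk
>>>     # divided by RF=3        = ~7 TB of unique data
>>>     # size-tiered compaction can temporarily need up to ~50% free disk,
>>>     # so nodes passing ~50% full are a common trigger for scaling out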
>>>
>>>
>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> Hi Daniel,
>>>
>>> This is not normal. Possibly a capacity problem. What's the RF, how
>>> much data do you store per node, and what kind of servers do you use
>>> (core count, RAM, disk, ...)?
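>>>
>>> Also, while you gather that, two quick checks on a node whose reported
>>> load keeps climbing:
>>>
>>>     nodetool compactionstats   # is a backlog of pending compactions piling up?
>>>     nodetool tpstats           # blocked or dropped tasks would match the latency issues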
>>>
>>> Cheers,
>>> Tommaso
>>>
>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>>
>>> I am running a 6 node cluster, and I have noticed that the reported
>>> load on each node rises throughout the week and grows way past the
>>> actual disk space used and available on each node. Eventually, latency
>>> for operations suffers and the nodes have to be restarted. A couple of
>>> questions on this: is this normal? And does Cassandra need to be
>>> restarted every few days for best performance? Any insight into this
>>> behaviour would be helpful.
>>>
>>> Cheers,
>>> Daniel
>>>
>>>
>>>
>>>
>
