Hey,
I have a few VM host (bare metal) machines with varying amounts of free hard
drive space on them. For simplicity, let’s say I have three machines like so:
* Machine 1:
- Hard drive 1: 150 GB available.
* Machine 2:
- Hard drive 1: 150 GB available.
- Hard drive 2: 150 GB available.
*
Thanks Robert.
On Thu, Aug 28, 2014 at 6:32 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Aug 28, 2014 at 3:31 PM, Pavel Kogan pavel.ko...@cortica.com
wrote:
Shouldn't all commitlog files be auto deleted after replaying, for
example after node restart?
Using Cassandra 2.0.8
No,
Thanks, Chris.
75thPercentile is clearly NOT lifetime: its value jumps around.
However, I can tell that Max is lifetime; it's been showing the exact same
value for days, on various nodes. Hence my doubts.
From: Chris Lohfink [mailto:clohf...@blackbirdit.com]
Sent: Thursday, August 28, 2014 3:56
Deleting the json manifest worked like a charm. After 2 days of compactions
I've got 50GB extra space! :)
Just a quick addendum: after deleting the json metadata file, I needed to
restart the node; otherwise it just recreates the file from memory.
Version: 1.2.16
On Wed, Aug 27, 2014 at 8:13 PM,
On Thu, Aug 28, 2014 at 3:39 PM, Donald Smith
donald.sm...@audiencescience.com wrote:
Maybe there’s a way to reset lifetime metrics to zero.
No. [1]
=Rob
[1] At least, they never have before, and neither driftx nor I believe they
have been created.
Hey Guys,
AFAIK, Cassandra currently partitions (thrift) rows using the row key:
basically, it uses hash(row_key) to decide which node that row needs to be
stored on. Now there are times when there is a need to shard a wide row, say
storing events per sensor, so you’d have a sensorId-datetime row
With CQL3, you, the developer, get to decide whether to place a primary key
column in the partition key or use it as a clustering column. So, make sensorId
the partition key and datetime a clustering column.
-- Jack Krupansky
From: Drew Kutcharian
Sent: Friday, August 29, 2014 6:48 PM
To:
Hi Jack,
I think you missed the point of my email, which was trying to avoid the problem
of having very wide rows :) In the notation of sensorId-datetime, the datetime
is a datetime bucket, say a day. The CQL rows would still be keyed by the
actual time of the event. So you’d end up having
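The bucketing scheme described above can be sketched in Python. This is a minimal illustration only: the sensor names, the day-sized bucket, and the `partition_key` helper are all assumptions for the example, not anything from Cassandra itself.

```python
from datetime import datetime, timezone

def partition_key(sensor_id: str, event_time: datetime) -> tuple:
    """Build the composite partition key sensorId-datetime, where the
    datetime part is a day-sized bucket (an assumption; any fixed-width
    bucket works). Events inside a bucket would still be ordered by the
    actual event time, acting as the clustering column."""
    day_bucket = event_time.strftime("%Y-%m-%d")
    return (sensor_id, day_bucket)

# Two events on the same day land in the same (bounded) partition...
k1 = partition_key("sensor-42", datetime(2014, 8, 29, 10, 0, tzinfo=timezone.utc))
k2 = partition_key("sensor-42", datetime(2014, 8, 29, 23, 59, tzinfo=timezone.utc))
assert k1 == k2 == ("sensor-42", "2014-08-29")

# ...while the next day starts a fresh partition, capping row width.
k3 = partition_key("sensor-42", datetime(2014, 8, 30, 0, 0, tzinfo=timezone.utc))
assert k3 != k1
```

The trade-off under discussion is exactly that k1 and k3 hash to (potentially) different nodes, since the whole composite key feeds the partitioner.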
On Fri, Aug 29, 2014 at 3:48 PM, Drew Kutcharian d...@venarc.com wrote:
AFAIK, currently Cassandra partitions (thrift) rows using the row key,
basically uses the hash(row_key) to decide what node that row needs to be
stored on. Now there are times when there is a need to shard a wide row,
say
Hi Rob,
I agree that one should not mess around with the default partitioner. But there
might be value in improving the Murmur3 partitioner to be “Composite Aware”.
Since we can have composites in row keys now, why not be able to use only a
part of the row key for partitioning? Makes sense?
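A toy sketch of the "composite aware" idea: hash only the first component of the composite key so all buckets of one sensor map to the same token. Here md5 is a stand-in purely for illustration (the real Murmur3Partitioner uses murmur3, and the key layout below is hypothetical).

```python
import hashlib

def token(parts: tuple) -> int:
    # Stand-in hash for the partitioner (md5 used only for illustration;
    # this is NOT how Cassandra computes tokens).
    raw = b"\x00".join(p.encode("utf-8") for p in parts)
    return int(hashlib.md5(raw).hexdigest(), 16)

def default_token(composite_key: tuple) -> int:
    # Today: the entire composite row key is hashed.
    return token(composite_key)

def composite_aware_token(composite_key: tuple) -> int:
    # Proposed: hash only the first component, so every bucket of a
    # given sensor lands on the same node.
    return token(composite_key[:1])

day1 = ("sensor-42", "2014-08-29")
day2 = ("sensor-42", "2014-08-30")

# Default partitioning scatters the two buckets across the ring...
assert default_token(day1) != default_token(day2)
# ...while a "composite aware" partitioner would co-locate them.
assert composite_aware_token(day1) == composite_aware_token(day2)
```

Co-location is what would let a multi-get for one sensor hit a single node, at the cost of concentrating that sensor's load there.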
I
Okay, but what benefit do you think you get from having the partitions on the
same node – since they would be separate partitions anyway? I mean, what
exactly do you think you’re going to do with them, that wouldn’t be a whole lot
more performant by being able to process data in parallel from
Mainly lower latency (and network overhead) in multi-get requests (WHERE IN
(….)). The coordinator needs to connect to only one node vs. potentially all
the nodes in the cluster.
On Aug 29, 2014, at 5:23 PM, Jack Krupansky j...@basetechnology.com wrote:
Okay, but what benefit do you think you
But you already said that you have “very wide rows”, so pulling massive
amounts of data off a single node is very likely to completely dwarf the
connect time. Again, doing the gets in parallel from multiple nodes, with
parallel requests, would be so much more performant. How many nodes are we
One of our nodes is getting an increasing number of pending compactions due, we
think, to https://issues.apache.org/jira/browse/CASSANDRA-7145, which is fixed
in the upcoming 2.0.11 release. (We had the same error a month ago, but at that
time we were in pre-production and could just clean the
I’m planning to speak at a local meet-up and I need to know if what I have in
my head is even possible.
I want to give an example of working with data in Cassandra. I have data coming
in through Kafka and Storm and I’m saving it off to Cassandra (this is only on
paper at this point). I then
Adaryl,
most ML algorithms are based on some form of numerical optimization, using
something like online gradient descent
http://en.wikipedia.org/wiki/Stochastic_gradient_descent or conjugate
gradient
http://www.math.buffalo.edu/~pitman/courses/cor502/odes/node4.html (e.g.
in SVM classifiers). In
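To make the optimization angle concrete, here is a minimal online (stochastic) gradient descent sketch. The toy least-squares problem, step size, and data are all made up for illustration; real ML libraries do far more.

```python
import random

def sgd_fit(points, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error,
    updating after each individual sample (the 'online' part)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(points)           # visit samples in random order
        for x, y in points:
            err = (w * x + b) - y
            w -= lr * err * x         # gradient of 0.5*err^2 w.r.t. w
            b -= lr * err             # gradient of 0.5*err^2 w.r.t. b
    return w, b

# Noise-free data from y = 2x + 1; SGD should recover the line closely.
data = [(x / 10.0, 2 * (x / 10.0) + 1) for x in range(-20, 21)]
w, b = sgd_fit(data)
assert abs(w - 2.0) < 0.1 and abs(b - 1.0) < 0.1
```

The per-sample update is what makes this family of algorithms a natural fit for event streams like the Kafka/Storm pipeline mentioned above.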