Re: Does DateTieredCompactionStrategy work with a compound clustering key?

2015-03-07 Thread mck
> I believe, that the DateTieredCompactionStrategy would work for PRIMARY > KEY (timeblock, timestamp) -- but does it also work for PRIMARY KEY > (timeblock, timestamp, hash) ? Yes. (sure you don't want to be using a timeuuid instead?) ~mck

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread mck
> Here "partition" is a random digit from 0 to (N*M) > where N=nodes in cluster, and M=arbitrary number. Hopefully it was obvious, but here (unless you've got hot partitions), you don't need N. ~mck
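A minimal sketch of the bucketing idea discussed above, assuming the bucket only needs to range over the arbitrary number M (as the reply suggests) and not N*M. The names `pick_bucket` and `all_buckets` are hypothetical and not taken from the thread.

```python
import random

def pick_bucket(m, rng=random):
    """Choose a random bucket in [0, m) to use as part of the partition key on write."""
    return rng.randrange(m)

def all_buckets(m):
    """A read for one time block has to fan out over every bucket and merge the results."""
    return list(range(m))

# Example: with M = 8, each write lands in one of 8 partitions per time block.
bucket = pick_bucket(8)
```

The trade-off is that a larger M spreads writes over more partitions but makes every read issue M queries.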

Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread mck
y binary serialisation. We've learnt the hard way the value of data transparency, and i'm guessing the storage cost is small given c* compression. Otherwise the advice here is largely repeating what Jens has already said. ~mck ¹ slide 19+20 from https://prezi.com/vt98oob9fvo4/cassandra-summit-cassandra-and-hadoop-at-finnno/

Re: how to scan all rows of cassandra using multiple threads

2015-02-26 Thread mck
> Can I get data owned by a particular node and this way generate sum > on different nodes by iterating over data from virtual nodes and later > generate total sum by doing sum of data from all virtual nodes. > You're pretty much describing a map/reduce job using CqlInputFormat.
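The shape of such a job can be sketched in plain Python: each input split (roughly, a token range owned by some node) yields a partial sum, and a final step adds the partial sums together. This is only an illustration of the map/reduce pattern with made-up in-memory data; it does not use Hadoop or CqlInputFormat, and the split names and values are hypothetical.

```python
from functools import reduce

# Hypothetical stand-in for the values each input split (token range) would yield.
splits = {
    "range-a": [3, 5, 8],
    "range-b": [13, 21],
    "range-c": [34],
}

def map_split(values):
    """Map phase: sum the values seen in a single split."""
    return sum(values)

def reduce_totals(partials):
    """Reduce phase: combine the per-split partial sums into one total."""
    return reduce(lambda a, b: a + b, partials, 0)

partials = [map_split(values) for values in splits.values()]
total = reduce_totals(partials)
```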

Re: Node stuck in joining the ring

2015-02-26 Thread mck
Any errors in your log file? We saw something similar when bootstrap crashed when rebuilding secondary indexes. See CASSANDRA-8798 ~mck

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-23 Thread mck
his is one of the videos where I recall an off-hand mention of the Spark > connector working with vnodes: > https://www.youtube.com/watch?v=1NtnrdIUlg0 Thanks. ~mck

Re: Why no virtual nodes for Cassandra on EC2?

2015-02-21 Thread mck
eos that I watched discussed how the Cassandra Spark > connector has > optimizations to deal with vnodes. Are these videos public? if so got any link to them? ~mck

Re: How to speed up SELECT * query in Cassandra

2015-02-16 Thread mck
serves requests to web applications that > need low latency. Let it be said this isn't something i'd recommend, just the path we had to take because of our small initial dedicated-HW cluster. (You really want to separate online and offline datacenters, so that you can maximise the offline clusters for the heavy batch reads). ~mck

Re: How to speed up SELECT * query in Cassandra

2015-02-14 Thread mck
always been our 'big data' platform, hadoop/spark is just an extra tool on top. We've never kept data in hdfs and are very grateful for having made that choice. ~mck ref https://prezi.com/vt98oob9fvo4/cassandra-summit-cassandra-and-hadoop-at-finnno/

Re: cqlinputformat and retired cqlpagingingputformat creates lots of connections to query the server

2015-01-28 Thread mck
reasonable number of connections. We do this, using code similar to this patch https://github.com/michaelsembwever/cassandra/pull/2/files ~mck ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358

Re: Which Topology fits best ?

2015-01-26 Thread mck
> However I guess it can be easily changed ? that's correct.

Re: Which Topology fits best ?

2015-01-25 Thread mck
NetworkTopologyStrategy gives you a better horizon and more flexibility as you scale out, at least once you've gone past small cluster problems like wanting RF=3 in a 4 node two dc cluster. IMO I'd go with "DC:1,DC2:1". ~mck

Re: Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread mck
> Are you using Leveled compaction strategy? And if you're using Date Tiered compaction strategy on a table that isn't time-series data, for example one where deletes happen, you'll find it compacting over and over. ~mck

Re: Storing large files for later processing through hadoop

2015-01-02 Thread mck
t data directly from Cassandra. See CqlInputFormat. ~mck

Re: Storing large files for later processing through hadoop

2015-01-02 Thread mck
being able to replace HDFS with Cassandra, but i don't think it's alive anymore. ~mck

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
to be presented in cassandra.yaml?) ~mck

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
HSHA, particularly for our offline (hadoop/spark) nodes. Sorry i don't have the data anymore to support that statement, although i can say that improvement paled in comparison to cross_node_timeout which we enabled shortly afterwards. ~mck

Can initial_token be decimal or hexadecimal format?

2011-09-13 Thread Mck
: 56713727820156410577229101238628035242 node2: 113427455640312821154458202477256070484 If it is the former there's some important documentation missing. ~mck ps CASSANDRA-1006 seems to be of some relation.
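For reference, the two decimal values quoted above are one third and two thirds of 2**127 (rounded down). A small sketch, assuming a ring of size 2**127 as used by RandomPartitioner and three evenly spaced nodes, shows the same tokens in both decimal and hexadecimal. The function name and the node0/node1/node2 labels are illustrative assumptions, and the sketch does not settle which format initial_token accepts.

```python
RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner)

def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]

tokens = evenly_spaced_tokens(3)
for i, token in enumerate(tokens):
    # Print the same token in both notations.
    print(f"node{i}: decimal={token} hex={token:x}")
```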

Task's map reading more record than CFIF's inputSplitSize

2011-09-07 Thread Mck
al 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) ~mck

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mck
ntil you attach it to an issue. (I think a new issue is appropriate here). ~mck

[hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Mck
asses, or to something else? ~mck

Re: IOException: Unable to create hard link ... /snapshots/ ... (errno 17)

2011-05-03 Thread Mck
On Tue, 2011-05-03 at 14:22 -0500, Jonathan Ellis wrote: > Can you create a ticket? CASSANDRA-2598

Re: IOException: Unable to create hard link ... /snapshots/ ... (errno 17)

2011-05-03 Thread Mck
i can also reproduce the problem with hadoop and ColumnFamilyOutputFormat. Turning off snapshot_before_compaction seems to be enough to prevent it. ~mck

Re: IOException: Unable to create hard link ... /snapshots/ ... (errno 17)

2011-05-03 Thread Mck
On Tue, 2011-05-03 at 16:52 +0200, Mck wrote: > Running a 3 node cluster with cassandra-0.8.0-beta1 > > I'm seeing the first node logging many (thousands) times Only "special" thing about this first node is it receives all the writes from our sybase->cassandra import

IOException: Unable to create hard link ... /snapshots/ ... (errno 17)

2011-05-03 Thread Mck
r all column families (including system). It happens a lot during startup. The hardlinks do exist. Stopping, deleting the hardlinks, and starting again does not help. But i haven't seen it once on the other nodes... ~mck ps the stacktrace java.io.IOError: java.io.IOException: Unable to c

Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-26 Thread Mck
On Tue, 2011-04-26 at 12:53 +0100, Stephen Connolly wrote: > (or did you want 20million unneeded deps for the > client jars?) Yes that's a good reason :-) Is there anything i can help with? Will beta versions be available under releases repository? ~mck

Re: [RELEASE] Apache Cassandra 0.8.0 beta1

2011-04-26 Thread Mck
On Fri, 2011-04-22 at 16:49 -0500, Eric Evans wrote: > I am pleased to announce the release of Apache Cassandra 0.8.0 beta1. *Truly Awesome!* CQL rocks in so many ways. Is 0.8.0-beta1 available in apache's maven repository? And if not, why not? ~mck

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Mck
ake that over a billion .clone(..) calls... :-( byte[] copies are relatively quick and cheap, still i am seeing a degradation in m/r reduce performance with cloning of keys. It's not that you don't have my vote here, i'm just stating my uncertainty on what the correct API should be. ~mck

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mck
jobs (millions of records) and the performance impact here. The key isn't the only potential live byte[]. You also have names and values in all the columns (and supercolumns) for all the mutations. ~mck

Re: Should nodetool ring give equal load ?

2011-01-12 Thread mck
On Wed, 2011-01-12 at 14:21 -0800, Ryan King wrote: > What consistency level did you use to write the > data? R=1,W=1 (reads happen a long time afterwards). ~mck -- "It is now quite lawful for a Catholic woman to avoid pregnancy by a resort to mathematics, though she is still f

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck
On Wed, 2011-01-12 at 23:04 +0100, mck wrote: > > Caused by: TimedOutException() > > What is the exception in the cassandra logs? Or have you tried increasing rpc_timeout_in_ms? ~mck -- "When there is no enemy within, the enemies outside can't hurt you." African p

Re: Should nodetool ring give equal load ?

2011-01-12 Thread mck
> You're using an ordered partitioner and your nodes are evenly spread > around the ring, but your data probably isn't evenly distributed. This load number seems equal to `du -hs ` and since i've got N == RF shouldn't the data size always be the same on every node? ~

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck
On Wed, 2011-01-12 at 18:40 +, Jairam Chandar wrote: > Caused by: TimedOutException() What is the exception in the cassandra logs? ~mck -- "Don't use Outlook. Outlook is really just a security hole with a small e-mail client attached to it." Brian Trosko | www.semb.wever.

Should nodetool ring give equal load ?

2011-01-12 Thread mck
on the first node (regardless of the cf they belong to). "cleanup" didn't help. "compact" only took away 2GB. Otherwise there is a lot here i don't understand. ~mck -- "The turtle only makes progress when its neck is stuck out" Rollo May | www.s

Re: Hadoop Integration doesn't work when one node is down

2011-01-02 Thread mck
> Is this a bug or feature or a misuse? i can confirm this bug. on a 3 node cluster testing environment with RF 3. (and no issue exists for it AFAIK). ~mck -- "Simplicity is the ultimate sophistication" Leonardo Da Vinci's (William of Ockham) | www.semb.wever.

Re: nodetool can't jmx authenticate...

2010-12-30 Thread mck
On Thu, 2010-12-30 at 08:03 -0600, Jonathan Ellis wrote: > We don't have any explicit code for enabling that, no. https://issues.apache.org/jira/browse/CASSANDRA-1921 the patch was simple (NodeCmd and NodeProbe). just testing it now... ~mck -- "I'm not one of those who t

nodetool can't jmx authenticate...

2010-12-30 Thread mck
word.file" to nodetool doesn't help... Is there any support for nodetool to connect to a password authenticated jmx service? ~mck -- "There are only two ways to live your life. One is as though nothing is a miracle. The other is as if everything is." Albert Einstein |

Re: (newbie) ColumnFamilyOutputFormat only writes one column (per key)

2010-11-24 Thread Mck
issues.apache.org/jira/browse/CASSANDRA-1774 ~mck

(newbie) ColumnFamilyOutputFormat only writes one column (per key)

2010-11-21 Thread mck
add at line 132 > results.add(getMutation(key, sum)); > +results.add(getMutation(new Text("doubled"), sum*2)); Only the last mutation for any key seems to be written. ~mck -- echo '[q]sa[ln0=aln256% Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc | www.sem