Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Sylvain Lebresne
On Tue, Jan 28, 2014 at 1:05 AM, Edward Capriolo edlinuxg...@gmail.com wrote: If you have only ttl columns, and you never update the column I would not think you need a repair. Right, no deletes and no updates is the case 1. of Michael on which I think we all agree 'periodic repair to avoid

Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Laing, Michael
Thanks again Sylvain! I have actually set up one of our application streams such that the same key is only overwritten with a monotonically increasing ttl. For example, a breaking news item might have an initial ttl of 60 seconds, followed in 45 seconds by an update with a ttl of 3000 seconds,
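The arithmetic behind the breaking-news example can be sketched as follows. This is only an illustration, not code from the thread: the helper names `expiry_times` and `is_non_decreasing` are hypothetical, times are in seconds counted from the first write, and only the two writes quoted above are modelled.

```python
def expiry_times(writes):
    """Return the absolute expiry time (write time + ttl) for each write.

    `writes` is a list of (write_time_seconds, ttl_seconds) pairs.
    """
    return [write_time + ttl for write_time, ttl in writes]


def is_non_decreasing(values):
    """True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# First write at t=0 with ttl 60, update 45 seconds later with ttl 3000.
writes = [(0, 60), (45, 3000)]
expiries = expiry_times(writes)
print(expiries)                     # [60, 3045]
print(is_non_decreasing(expiries))  # True
```

Under these assumptions, each overwrite pushes the expiry of the key further out, which is the property the message describes.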

Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Sylvain Lebresne
I have actually set up one of our application streams such that the same key is only overwritten with a monotonically increasing ttl. For example, a breaking news item might have an initial ttl of 60 seconds, followed in 45 seconds by an update with a ttl of 3000 seconds, followed by an

RE: no more zookeeper?

2014-01-28 Thread S Ahmed
Does C* no longer use zookeeper? I don't see a reference to it in the build file (https://github.com/apache/cassandra/blob/trunk/build.xml). If not, what replaced it?

Heavy update dataset and compaction

2014-01-28 Thread Robert Wille
I have a dataset which is heavy on updates. The updates are actually performed by inserting new records and deleting the old ones the following day. Some records might be updated (replaced) a thousand times before they are finished. As I watch SSTables get created and compacted on my staging

Re: Heavy update dataset and compaction

2014-01-28 Thread Nate McCall
LeveledCompactionStrategy is ideal for update heavy workloads. If you are using a pre 1.2.8 version make sure you set the sstable_size_in_mb up to the new default of 160. Also, keep an eye on Average live cells per slice and Average tombstones per slice (available in versions 1.2.11 - so I guess

Re: no more zookeeper?

2014-01-28 Thread Nate McCall
AFAIK zookeeper was never in use. It was discussed once or twice over the years, but never seriously. If you are talking about the origins of the current lightweight transactions in 2.0, take a look at this issue (warning - it's one of the longer ASF jira issues I've seen, but some good stuff in

Re: Possible optimization: avoid creating tombstones for TTLed columns if updates to TTLs are disallowed

2014-01-28 Thread horschi
Hi Donald, I reported the ticket you mentioned, so I kind of feel like I should answer this :-) I presume the point is that GCable tombstones can still do work (preventing spurious writing from nodes that were down) but only until the data is flushed to disk. I am not sure I understand

Re: Help me on Cassandra Data Modelling

2014-01-28 Thread Naresh Yadav
Please share any input on my last email. On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav nyadav@gmail.com wrote: Yes Thunder, you are right. I had simplified that by moving tags search (partial/exact) into a separate column family, tagcombination, which will act as an index for all search based on

Re: no more zookeeper?

2014-01-28 Thread Andrey Ilinykh
Why would cassandra use zookeeper? On Tue, Jan 28, 2014 at 7:18 AM, S Ahmed sahmed1...@gmail.com wrote: Does C* no longer use zookeeper? I don't see a reference to it in the build file (https://github.com/apache/cassandra/blob/trunk/build.xml). If not, what replaced it?

Re: no more zookeeper?

2014-01-28 Thread Edward Capriolo
Some people had done some custom cassandra zookeeper integration back in the day. Triggers, there is some reference in the original facebook thrown over the wall to zk. No official release has ever used zk directly. Though people have suggested it. On Tue, Jan 28, 2014 at 12:08 PM, Andrey Ilinykh

Re: question about secondary index or not

2014-01-28 Thread Edward Capriolo
Generally, indexes on binary fields (true/false, male/female) are not terribly effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary

question about secondary index or not

2014-01-28 Thread Jimmy Lin
I have a simple column family like the following: create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); If I want to find all the male employees for a given company id, I can do 1/ select * from people where company_id=' and loop through
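Option 1/ in the message above (fetch every row for the company, then loop through the result) can be sketched as client-side filtering. This is a hedged illustration only: it uses an in-memory list in place of a real query result, and the function name `male_employees` and all sample values are made up for the example.

```python
def male_employees(rows, company_id):
    """Filter rows on the client: keep male employees of one company."""
    return [
        row["employee_id"]
        for row in rows
        if row["company_id"] == company_id and row["gender"] == "male"
    ]


# Hypothetical rows shaped like the `people` table in the message.
rows = [
    {"company_id": "acme", "employee_id": "e1", "gender": "male"},
    {"company_id": "acme", "employee_id": "e2", "gender": "female"},
    {"company_id": "acme", "employee_id": "e3", "gender": "male"},
    {"company_id": "other", "employee_id": "e4", "gender": "male"},
]
print(male_employees(rows, "acme"))  # ['e1', 'e3']
```

The cost of this approach is that every row of the partition crosses the network before the filter runs, which is the trade-off the thread goes on to discuss.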

Re: Heavy update dataset and compaction

2014-01-28 Thread Robert Coli
On Tue, Jan 28, 2014 at 7:57 AM, Robert Wille rwi...@fold3.com wrote: I have a dataset which is heavy on updates. The updates are actually performed by inserting new records and deleting the old ones the following day. Some records might be updated (replaced) a thousand times before they are

Re: A question to OutboundTcpConnection.expireMessages()

2014-01-28 Thread Robert Coli
On Mon, Jan 27, 2014 at 11:40 PM, Lu, Boying boying...@emc.com wrote: When I read the code of OutboundTcpConnection.expireMessages(), I found the following snippet in a loop: if (qm.timestamp >= System.currentTimeMillis() - qm.message.getTimeout()) return; My
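The effect of an early return inside such a loop can be sketched as follows. This is not Cassandra's implementation: it is a minimal Python model that assumes the comparison in the quoted snippet is `>=`, that the backlog is ordered oldest first, and that all messages share the same timeout. The name `expire_messages` and the tuple layout are hypothetical.

```python
from collections import deque


def expire_messages(backlog, now_ms):
    """Drop expired messages from the head of a time-ordered backlog.

    `backlog` is a deque of (timestamp_ms, timeout_ms) pairs, oldest first.
    Returns the number of messages dropped.
    """
    dropped = 0
    while backlog:
        timestamp_ms, timeout_ms = backlog[0]
        # A message is still live if it was queued within the timeout window.
        if timestamp_ms >= now_ms - timeout_ms:
            # Under the stated assumptions everything behind the head is
            # newer, so it is live too and the scan can stop here.
            return dropped
        backlog.popleft()
        dropped += 1
    return dropped


backlog = deque([(0, 100), (50, 100), (950, 100)])
print(expire_messages(backlog, now_ms=1000))  # 2
print(list(backlog))                          # [(950, 100)]
```

If timeouts differed per message, stopping at the first live entry could leave expired messages further back in the queue, so the early exit depends on the ordering assumption.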

Re: no more zookeeper?

2014-01-28 Thread S Ahmed
Sorry guys, I am confusing things with HBase. But Nate's jira sure looks interesting, thanks. On Tue, Jan 28, 2014 at 12:25 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Some people had done some custom cassandra zookeeper integration back in the day. Triggers, there is some reference

Re: question about secondary index or not

2014-01-28 Thread Mullen, Robert
I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at

resetting nodetool info exception count

2014-01-28 Thread John Pyeatt
Is there any way of resetting the value of a nodetool info Exceptions value manually? Is there a JMX call I can make? -- John Pyeatt Singlewire Software, LLC www.singlewire.com -- 608.661.1184 john.pye...@singlewire.com

Re: Help me on Cassandra Data Modelling

2014-01-28 Thread Thunder Stumpges
Hey Naresh, Unfortunately I don't have any further advice. I keep feeling like you're looking at a search problem instead of a lookup problem. Perhaps Cassandra is not the right tool for your need in this case. Perhaps something with a full-text index type feature would help. Or perhaps someone

Re: GC eden filled instantly (any size). Dropping messages.

2014-01-28 Thread Arya Goudarzi
Dimetrio, Look at my last post. I showed you how to turn on all useful GC logging flags. From there we can get information on why GC has long pauses. From the changes you have made it seems you are changing things without knowing the effect. Here are a few things to consider: - Having a 9GB

Re: resetting nodetool info exception count

2014-01-28 Thread Robert Coli
On Tue, Jan 28, 2014 at 2:16 PM, John Pyeatt john.pye...@singlewire.com wrote: Is there any way of resetting the value of a nodetool info Exceptions value manually? Is there a JMX call I can make? Almost certainly not. =Rob

Re: Heavy update dataset and compaction

2014-01-28 Thread Robert Wille
Perhaps a log structured database with immutable data files is not best suited for this use case? Perhaps not, but I have other data structures I'm moving to Cassandra as well. This is just the first. Cassandra has actually worked quite well for this first step, in spite of it not being an

OpenJDK is not recommended? Why

2014-01-28 Thread Kumar Ranjan
I am in process of setting 2 node cluster with C* version 2.0.4. When I started each node, it failed to communicate thus, each are running separate and not in same ring. So started looking at the log files and saw the message below: WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line

Re: question about secondary index or not

2014-01-28 Thread Jimmy Lin
In my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of column

Re: OpenJDK is not recommended? Why

2014-01-28 Thread Colin
OpenJDK has known issues and they will raise their ugly little head from time to time; I have experienced them myself. To be safe, I would use the latest oracle 7 release. You may also be experiencing a configuration issue, make sure one node is specified as the seed node and that the other

How to retrieve snappy compressed data from Cassandra using Datastax?

2014-01-28 Thread Check Peck
I am working on a project in which I am supposed to store the snappy compressed data in Cassandra, so that when I retrieve the same data from Cassandra, it should be snappy compressed in memory and then I will decompress that data using snappy to get the actual data from it. I am having a byte

Re: OpenJDK is not recommended? Why

2014-01-28 Thread Michael Shuler
On 01/28/2014 09:55 PM, Kumar Ranjan wrote: I am in process of setting 2 node cluster with C* version 2.0.4. When I started each node, it failed to communicate thus, each are running separate and not in same ring. So started looking at the log files and saw the message below: This is probably

Issues with seeding on EC2 for C* 2.0.4 - help needed

2014-01-28 Thread Kumar Ranjan
Hey Folks - I am burning the midnight oil fast but can't figure out what I am doing wrong. The log files have this. I have also listed both seed node and node 2 partial configurations. INFO [main] 2014-01-29 05:15:11,515 CommitLog.java (line 127) Log replay complete, 46 replayed mutations INFO

Re: OpenJDK is not recommended? Why

2014-01-28 Thread Kumar Ranjan
Yes got rid of openJDK and installed oracle version and warning went away. Happy happy...Thank you folks.. On Tue, Jan 28, 2014 at 11:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 01/28/2014 09:55 PM, Kumar Ranjan wrote: I am in process of setting 2 node cluster with C* version

Re: Issues with seeding on EC2 for C* 2.0.4 - help needed

2014-01-28 Thread Michael Shuler
Did you open up the ports so they can talk to each other? http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installAMISecurityGroup.html -- Michael

Re: Issues with seeding on EC2 for C* 2.0.4 - help needed

2014-01-28 Thread Kumar Ranjan
Hi Michael - Yes, 7000, 7001, 9042, 9160 are all open on EC2. The issue was that the seeds address and listen_address were 127.0.0.1 and private_ip. This will help anyone: http://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds On Wed, Jan 29, 2014 at 1:12 AM, Michael
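A quick sanity check for this kind of mismatch might look like the sketch below. It reads the message above as meaning the seed was a loopback address while listen_address was the private IP, which is an interpretation rather than something the message spells out. The helper `loopback_seeds` is hypothetical, the addresses are made-up examples, and nothing here reads cassandra.yaml.

```python
import ipaddress


def loopback_seeds(seeds, listen_address):
    """Return the seeds that are loopback addresses while the node itself
    listens on a non-loopback address (other nodes cannot reach such seeds).
    """
    if ipaddress.ip_address(listen_address).is_loopback:
        return []  # single-machine setup; loopback seeds are consistent
    return [s for s in seeds if ipaddress.ip_address(s).is_loopback]


# Seed left at 127.0.0.1 while the node listens on a private address.
print(loopback_seeds(["127.0.0.1"], "10.0.0.5"))  # ['127.0.0.1']
# Seed set to a private address: nothing flagged.
print(loopback_seeds(["10.0.0.5"], "10.0.0.6"))   # []
```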

Re: How to retrieve snappy compressed data from Cassandra using Datastax?

2014-01-28 Thread Alex Popescu
Wouldn't it be better to delegate the compression part to Cassandra (which supports Snappy [1])? This way the compression part will be completely transparent to your application. [1] http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression On Tue, Jan 28, 2014 at 8:51 PM, Check