Re: JDBC CQL Driver unable to locate cassandra.yaml

2011-07-15 Thread Brandon Williams
Try another slash in file:/, i.e. file://. On Thu, Jul 14, 2011 at 10:55 AM, Derek Tracy wrote: > I tried putting the cassandra.yaml in the classpath but got the same error. > Adding -Dcassandra.config=file:/path/to/cassandra.yaml did work. > > > - > Derek Tracy > tra
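
For readers comparing the slash variants discussed in this thread, here is a minimal sketch of how a generic URL parser splits one, two, and three slashes after `file:`. It uses Python's standard `urllib.parse`, not Cassandra's own config loader (whose handling of `-Dcassandra.config` is not shown in the thread), and the path is the placeholder from the message above.

```python
from urllib.parse import urlparse

# Placeholder path copied from the message above; not a real location.
CANDIDATES = [
    "file:/path/to/cassandra.yaml",
    "file://path/to/cassandra.yaml",
    "file:///path/to/cassandra.yaml",
]

def split_file_url(url):
    """Return (host, path) as a generic URL parser sees them."""
    parts = urlparse(url)
    return parts.netloc, parts.path

for url in CANDIDATES:
    host, path = split_file_url(url)
    print(f"{url!r}: host={host!r} path={path!r}")
```

In this parser the one-slash and three-slash forms give the same absolute path, while the two-slash form treats the first path segment as a host name. Whether the Cassandra loader of that era behaved the same way is an assumption to check against the version in use.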

Re: Commit log is not emptied after "nodetool drain"

2011-07-15 Thread Zhu Han
2011/7/15 Zhu Han > > 2011/7/15 Jonathan Ellis > >> If you have non-empty segments post-drain that is a bug. Is it >> reproducible? >> > > I think it is always reproducible on 0.6.x branch. Here is a simple > experiment: > Should I raise an issue ticket on it? > > 1) "bin/nodetool -h localho

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Done. It is CASSANDRA-2903. On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis wrote: > Please. > > On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen wrote: > > Hi Jonathan, > > Do I need to open a ticket for this? > > Regards > > Boris > > > > On Sa

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Jonathan Ellis
Please. On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen wrote: > Hi Jonathan, > Do I need to open a ticket for this? > Regards > Boris > > On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis wrote: >> >> Sounds reasonable to me. >> >> On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen wrote: >> > Hi, >> > I hav

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Hi Jonathan, Do I need to open a ticket for this? Regards Boris On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis wrote: > Sounds reasonable to me. > > On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen wrote: > > Hi, > > I have a few column families, each has a column called user_name. I tried > to >

Re: Multiple input column families in Cassandra Hadoop mapreduce

2011-07-15 Thread Jeremy Hanna
+1 - We do a lot of this with Pig - joining over several column families. Pig makes it just work. I think Hive does something similar. Unless you really need that much control over your process, I would really use one of those two. On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote: > The eas

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Jonathan Ellis
Sounds reasonable to me. On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen wrote: > Hi, > I have a few column families, each has a column called user_name. I tried to > use secondary index on user_name column for each of the column family. > However, when creating these column families, cassandra keeps

Re: Multiple input column families in Cassandra Hadoop mapreduce

2011-07-15 Thread Jonathan Ellis
The easy answer is "use something like Pig or Hive that does these joins for you under the hood." Not actually sure what the hard answer is. :) On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock wrote: > Hello, > with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set > up the map

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
> I am worried that if only 1 node is active and online, and the other > N-1 nodes are inactive, down, and offline, that the cluster will not > be able to complete the operation, because not all of the data is > available on the 1 node that is up. Which is true, but the correct way normally is to

Re: Replicating to all nodes

2011-07-15 Thread Kyle Gibson
> The node (known as the "coordinating node" because it co-ordinates the > request submitted by the client) will send the request to the nodes > that are in the replica set for the row. The client need not care > about which host it connects to, other than that it be "one of the > ones in the corre

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
> I understand that CL.ONE means the read operation will block until at > least one -replica- responds. If this node is not a replica, what > happens? The node (known as the "coordinating node" because it co-ordinates the request submitted by the client) will send the request to the nodes that are

Re: Replicating to all nodes

2011-07-15 Thread Kyle Gibson
> No. I am not entirely sure from where the confusion comes, so I will > just try to summarize things from scratch in a brief manner. > > Any piece of data you store in Cassandra is going to be in a > particular row, which has a row key. > > That row will have a "replica set" in the Cassandra clust

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
> I was/am under the impression that a node owns a particular token > range, and does not save any data that falls outside of that range > (with exception to any data that might be replicated to it). Based on > what you are saying, each node owns a token range, but also maintains > copies of data o
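
The idea of a row's "replica set" discussed in this thread can be illustrated with a simplified model: the node whose token range covers the row key holds the first copy, and the next RF-1 nodes clockwise on the ring hold the others. The sketch below is an assumption-laden illustration (MD5-derived tokens, no rack or datacenter awareness, hypothetical node names), not Cassandra's actual placement code.

```python
import hashlib
from bisect import bisect_left

def token_for(key):
    """Map a row key to a ring position (MD5-based, for illustration only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def replica_set(key, ring, rf):
    """ring is a sorted list of (token, node) pairs.

    The first replica is the first node whose token is >= the key's token
    (wrapping around the ring); the remaining replicas are the following
    nodes clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical 4-node ring with evenly spaced tokens.
ring = [(i * 2**128 // 4, f"node{i}") for i in range(4)]
print(replica_set("some_row_key", ring, rf=3))
```

With RF equal to the number of nodes, every node ends up in every row's replica set, which is the "replicate to all nodes" configuration the thread is about.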

Re: Replicating to all nodes

2011-07-15 Thread Kyle Gibson
So my understanding of how cassandra saves data is incorrect. I was/am under the impression that a node owns a particular token range, and does not save any data that falls outside of that range (with exception to any data that might be replicated to it). Based on what you are saying, each node ow

Re: Range query ordering with CQL JDBC

2011-07-15 Thread Matthieu Nahoum
Hi Eric, I am using the default partitioner, which is the RandomPartitioner I guess. The key type is String. Are Strings ordered by lexicographic rules? Thanks On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans wrote: > On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote: > > I am trying to ran
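
As background to the ordering question, here is a small sketch of three different orderings of the same string keys: numeric, lexicographic, and by an MD5-derived token. RandomPartitioner places rows by a hash of the key, so hash order is the relevant one for row ranges under that partitioner. The keys are made-up sample values and `md5_token` is a hypothetical helper, not a Cassandra API.

```python
import hashlib

def md5_token(key):
    """Illustrative token: the MD5 digest of the key read as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

keys = ["900", "1000", "1100"]  # made-up sample keys

numeric = sorted(keys, key=int)         # order by numeric value
lexical = sorted(keys)                  # order by string comparison
by_token = sorted(keys, key=md5_token)  # order by hash of the key

print("numeric:", numeric)
print("lexical:", lexical)
print("by token:", by_token)
```

Even before hashing comes into play, string comparison puts "1000" ahead of "900", so epoch-like keys stored as strings only sort numerically when they all have the same length.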

Re: Range query ordering with CQL JDBC

2011-07-15 Thread Eric Evans
On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote: > I am trying to range-query a column family on which the keys are > epochs (similar to the output of System.currentTimeMillis() in Java). > In CQL (Cassandra 0.8.1 with JDBC driver): > > SELECT * FROM columnFamily WHERE KEY > '1309205

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Yang
BTW, just a reminder: even if JNA + mlock works fine, with a large portion of physical memory locked by the Cassandra JVM you won't get swapping in Cassandra, but other processes in the OS could still be swapped out. On Fri, Jul 15, 2011 at 9:47 AM, Chris Burroughs wrote: > On

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller
> Yaa... thats the reason why I'm trying to find out whether Cassandra itself > has some trick to do it (maybe, some sort of configuration/list support for > row-caching - wishful thinking!) > > Any suggestions? You'll have to write code to switch out the row cache implementation. But I recommend

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Chris Burroughs
On 07/15/2011 07:24 AM, Daniel Doubleday wrote: > Also our experience shows that the jna call does not prevent swapping so the > general advice is disable swap. Can you confirm you don't get the (paraphrasing) "whoops we tried mlockall but ulimits denied us" message on startup?
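
One way to check the precondition behind the startup message mentioned above is to look at the locked-memory limit of the account that runs the process. The sketch below reads `RLIMIT_MEMLOCK` with Python's standard `resource` module (Unix only); it is an illustration of the ulimit being discussed, not part of Cassandra or JNA, and the helper names are hypothetical.

```python
import resource

def memlock_limits():
    """Return the (soft, hard) locked-memory limits for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)

def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

soft, hard = memlock_limits()
print("soft memlock limit:", describe(soft))
print("hard memlock limit:", describe(hard))
```

A small finite limit here would be consistent with mlockall() being refused; the limit has to be checked for the user that actually starts the JVM.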

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Suman Ghosh
> "What's huge? Number of gigs, ballpark." Data is in the range of 30-40 GB per calendar day per data source if we consider usage sources like SWITCH or IN, and in the range of 5-10 GB for non-usage ones like Billing etc. And we use multiple source correlation on the aforesaid data per day. >"

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller
> As we work on telecom data records (voice call/sms/GPRS xDRs), the data > volume is simply HUGE, and we definitely need a “controlled” caching > mechanism in front of the Cassandra layer. What's huge? Number of gigs, ballpark. > By the term  “controlled cache layer”, what I am trying to suggest

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Peter Schuller
> Also our experience shows that the jna call does not prevent swapping so the > general advice is disable swap. That sounds extremely unlikely, as it would imply the kernel fails to honor a successful mlockall(), unless other processes on the same machine are being swapped out. Did the process r

Re: Cassandra OOM on repair.

2011-07-15 Thread Andrey Stepachev
Looks like key indexes eat all memory: http://paste.kde.org/97213/ 2011/7/15 Andrey Stepachev > UPDATE: > > I found, that > a) with min10G cassandra survive. > b) I have ~1000 sstables > c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow > > So, I have a question: > a) if

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Mohit Anchlia
Is row cache not enough for this? Sent from my iPad On Jul 15, 2011, at 12:04 AM, Suman Ghosh wrote: > Hi, > > > > We’re presently trying to use Cassandra as a storage/retrieval system for > live data & composite counters (on the data). > > > > As we work on telecom data records (voic

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
> The goal is to configure a cluster in which reads and writes can > complete successfully even if only 1 node is online. For this to work, Why? You should be designing for "only 1 out of N nodes" where N is RF. If you happen to have 3 machines now and you want 3 copies in total that's fine. But w
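
The relationship between replication factor (RF), consistency level, and how many replicas must be up can be written down as simple arithmetic. The sketch below assumes the standard definitions (ONE needs one replica, QUORUM needs a majority of RF, ALL needs every replica); the function names are hypothetical and `live_replicas` counts only live nodes that belong to the row's replica set.

```python
def required_replicas(consistency_level, rf):
    """Number of replicas that must respond for the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1
    if consistency_level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {consistency_level}")

def can_serve(consistency_level, rf, live_replicas):
    """True if enough replicas of the row are alive to satisfy the request."""
    return live_replicas >= required_replicas(consistency_level, rf)

for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, "with RF=3 and 1 live replica:", can_serve(cl, 3, 1))
```

So operating with a single surviving replica is only possible at CL.ONE, and only when that surviving node is in the replica set for the row being accessed.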

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Daniel Doubleday
When using JNA, the mlockall call results in all pages being locked in RSS and thus reported there, so you have either configured -Xms650M or you are running on a small box and the start script calculated it for you. Also, our experience shows that the JNA call does not prevent swapping, so the gener

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Andrey Stepachev
Looks like mmapped files. 2011/7/15 Donna Li > All: > > I download JNA jar and put it to cassandra lib directory. When restart > cassandra server, I found the physical memory highly increase. There is no > data saved in cassandra, why so much memory used by cassandra? How can I > decre

Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Hi, I have a few column families, each of which has a column called user_name. I tried to use a secondary index on the user_name column of each column family. However, when creating these column families, Cassandra keeps reporting a "Duplicate index name..." exception. I finally figured out that it seems t

Re: Cassandra OOM on repair.

2011-07-15 Thread Andrey Stepachev
UPDATE: I found that a) with min 10G Cassandra survives; b) I have ~1000 sstables; c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow. So I have a question: a) if a row is bigger than 64 MB before compaction, why is it compacted in memory? b) if it is smaller, what eats so much memory?

JNA to avoid swap but physical memory increase

2011-07-15 Thread Donna Li
All: I downloaded the JNA jar and put it in the Cassandra lib directory. When I restarted the Cassandra server, I found that physical memory usage had increased sharply. There is no data saved in Cassandra, so why is so much memory being used? How can I decrease Cassandra's memory usage? My version is 0.7.6-2. Befo

Multiple input column families in Cassandra Hadoop mapreduce

2011-07-15 Thread Markus Mock
Hello, with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set up the map phase to read from one column family. Is it possible to have multiple mapper classes each mapping over their own column family so that data from multiple column families can be "joined" in the reduce pha

Cassandra OOM on repair.

2011-07-15 Thread Andrey Stepachev
Hi all. Cassandra constantly runs out of memory (OOM) on repair or compaction. Increasing memory (to 6G) doesn't help. I can give it more, but I don't think this is a normal situation. The cluster has 4 nodes, RF=3, Cassandra version 0.8.1. The ring looks like this: Address DC Rack Status State Load

Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Suman Ghosh
Hi, We’re presently trying to use Cassandra as a storage/retrieval system for live data & composite counters (on the data). As we work on telecom data records (voice call/sms/GPRS xDRs), the data volume is simply HUGE, and we definitely need a “controlled” caching mechanism in front of the Ca