Re: Cassandra and Request routing

2010-05-04 Thread Olivier Mallassi
:) I think this is simpler and I am just stupid. I retried with clean data and commit log directories and everything works well. I must have missed something (maybe when I upgraded from 0.5.1 to 0.6), but anyway, I am just testing. On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook wrote: > I

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

2010-05-04 Thread vineet daniel
Only major compactions can clean out obsolete tombstones. On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote: > On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami > wrote: > > Let me rephrase my question. > > > > How does Cassandra deal with bloom filter's false positives on deleted > records? >

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

2010-05-04 Thread vineet daniel
Reduce GCGraceSeconds in storage-conf.xml; that should work. On Tue, May 4, 2010 at 2:31 PM, vineet daniel wrote: > Only major compactions can clean out obsolete tombstones. > > On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote: > >> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami >> wrote: >> >

Re: Design Query

2010-05-04 Thread vineet daniel
As you haven't specified all the details pertaining to filters and your data layout (structure), at a very high level what I can suggest is that you create a separate CF for each filter. On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan wrote: > I am evaluating cassandra to implement activity

Re: Trove maps

2010-05-04 Thread Jeff Hammerbacher
Hey, History repeating itself a bit, here: one delay in getting Cassandra into the open source world was removing its use of the Trove collections library, as the license (LGPL) is not compatible with the Apache 2.0 license. Later, Jeff On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta wrote: >

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Jordan Pittier
I'm facing the same issue with swap. It only occurs when I perform read operations (writes are very fast :)). So I can't help you with the memory problem. But to balance the load evenly between nodes in the cluster, just set their tokens manually (the "formula" is i * 2^127 / nb_nodes). Jordan On Tue,
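The token formula quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `balanced_tokens` is hypothetical, and it assumes that i runs from 0 to nb_nodes - 1 and that tokens live in the range 0 to 2^127, which the message does not spell out.

```python
def balanced_tokens(nb_nodes):
    """Return one evenly spaced token per node, using i * 2**127 / nb_nodes.

    Assumption: i starts at 0, so the first node gets token 0.
    """
    if nb_nodes < 1:
        raise ValueError("nb_nodes must be at least 1")
    # Integer division keeps the result exact; Python ints do not overflow.
    return [i * 2**127 // nb_nodes for i in range(nb_nodes)]


if __name__ == "__main__":
    # Print a candidate token for each node of a 4-node cluster.
    for node, token in enumerate(balanced_tokens(4)):
        print(f"node {node}: token = {token}")
```

Each printed value would then be assigned by hand to the corresponding node, as the message suggests.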

how to fetch latest data

2010-05-04 Thread vineet daniel
Hi, In a Cassandra cluster, if we update a key/value and then perform a fetch query on that same key, we get old/stale data. This can be because of Read Repair. Is there any way to fetch the latest updated data from the cluster, as old data has no significance and showing it to client is

Re: how to fetch latest data

2010-05-04 Thread vineet daniel
If R + W > N, where R, W, and N are respectively the read replica count, the write replica count, and the replication factor, all client reads will see the most recent write. On Tue, May 4, 2010 at 4:39 PM, vineet daniel wrote: > Hi > > In a cluster of cassandra if we are updating any key/value a
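The R + W > N rule above can be checked with simple arithmetic. The sketch below is illustrative only: the helper names `reads_see_latest_write` and `quorum` are hypothetical, and `quorum` assumes the usual majority definition of floor(N / 2) + 1, which is not stated in the message.

```python
def reads_see_latest_write(r, w, n):
    """Return True when R + W > N, i.e. read and write replica sets must overlap."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def quorum(n):
    """Majority of n replicas (assumed definition: floor(n / 2) + 1)."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    # Quorum reads plus quorum writes overlap: 2 + 2 > 3.
    print(reads_see_latest_write(quorum(n), quorum(n), n))
    # Reading one replica after writing one replica does not: 1 + 1 <= 3.
    print(reads_see_latest_write(1, 1, n))
```

With a replication factor of 3, reading and writing at a majority satisfies the inequality, while reading and writing a single replica does not.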

Re: Design Query

2010-05-04 Thread Dorin Dragutoiu
2. I have used the same configuration (3 machines with 4GB RAM) and I got an out-of-memory error on compaction each time I tried to compact 4 x 128MB sstables. I tried different configurations, including Java opts, with the same result. When I used a 16GB RAM machine everything worked like a charm. On 04.05

Re: Best way to store millisecond-accurate data

2010-05-04 Thread Даниел Симеонов
Hi Miguel, I'd like to ask whether it is possible to have runtime sharding of rows in Cassandra, i.e. if the row has too many new columns inserted, then create another row (let's say if the original time sharding is one day per row, then we would have two rows for that day). Maybe batch processes could

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Boris Shulman
I think that the extra (more than 4GB) memory usage comes from the mmapped I/O, which is why it happens only for reads. On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier wrote: > I'm facing the same issue with swap. It only occurs when I perform read > operations (write are very fast :)). So I can't he

Re: Trove maps

2010-05-04 Thread Boris Shulman
LGPL is listed among the forbidden licenses for Apache projects (see Excluded Licenses in http://www.apache.org/legal/3party.html)... On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher wrote: > Hey, > > History repeating itself a bit, here: one delay in getting Cassandra into > the open sour

Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Mohammad Mamajiwala
Hi, I am very new to Cassandra 0.6.1. I have set up two nodes on two different servers. I would like to know how data distribution and replication work. Node 1 IP: 43.193.211.215 Node 2 IP: 43.193.213.160 Node 1: Configuration 43.193.211.215 Node 2: Configuration 43.193.213.160 4

Re: Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Shinpei Ohtani
> All other parameters are identical in both servers. I have added some data > from both node > but i am confused on which node data stores. Does it stores in both node > OR only stores in one node from where it has been added. I can retrieve data > from both nodes > but sometime can not. Not sur

Re: Cassandra and Request routing

2010-05-04 Thread Jonathan Shook
I may be wrong here. Someone please correct me if I am. There may be a race condition if you aren't increasing your replication factor. If you insert to node A with replication factor 1, and then get from node B with replication factor 1, it should be possible (and even more likely in uneven loadi

Re: Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Mohammad Mamajiwala
Thanks for the prompt reply. As per your reply, my configuration should be like, Node 1: Configuration 43.193.211.215 43.193.213.160 Node 2: Configuration 43.193.211.215 43.193.213.160 About replication - In my case it should be 2 as I have two cluster nodes. Am I right? In C

Re: Best way to store millisecond-accurate data

2010-05-04 Thread Miguel Verde
One would use batch processes (e.g. through Hadoop) or client-side aggregation, yes. In theory it would be possible to introduce runtime sharding across rows into the Cassandra server side, but it's not part of its design. In practice, one would want to model their data such that the 'row h

Re: Error in TBaseHelper compareTo(byte [] a , byte [] b)

2010-05-04 Thread Erik Holstad
Thanks Jonathan! Yeah, I will just wait until we are ready for the upgrade and hold off on that project for now. Erik

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Schubert Zhang
1. When you first start up your nodes, plan the InitialToken of each node so that tokens are evenly spaced. 2. standard On Tue, May 4, 2010 at 9:09 PM, Boris Shulman wrote: > I think that the extra (more than 4GB) memory usage comes from the > mmaped io, that is why it happens only for reads. > > On Tue, May 4, 20

Re: Trove maps

2010-05-04 Thread Tatu Saloranta
Oh boy... that stupid, stupid bickering about the true nature of the LGPL. Both the Apache Foundation and the FSF appeared like little kids arguing over whose dad is stronger (this was a few years back, when it was discussed whether LGPL components could be used for Apache License projects). Almost made me explicitly

Re: Trove maps

2010-05-04 Thread Joe Stump
On May 4, 2010, at 6:24 PM, Tatu Saloranta wrote: > But of course Apache can impose their own, however misguided silly > rules on projects under their umbrella. :-) I smell an -ac'esque patch to Cassandra brewing. ;) --Joe

Re: Trove maps

2010-05-04 Thread Paul Brown
We went through this with Ode w.r.t. Hibernate. Note that Ode still ships with Hibernate support there, just not with Hibernate libraries in the distribution or with a strong dependence on Hibernate. So, if you made Trove maps optional and provided an adapter, you'd be OK. You just can't bun

Re: Trove maps

2010-05-04 Thread Avinash Lakshman
Hahaha, Jeff - I remember scampering to remove those references to the Trove maps, I think around 2 years ago. Avinash On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrote: > Hey, > > History repeating itself a bit, here: one delay in getting Cassandra into > the open source world was removin

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Kyusik Chung
This sounds just like the slowness I was asking about in another thread - after a lot of reads, the machine uses up all available memory on the box and then starts swapping. My understanding was that mmap helps greatly with read and write perf (until the box starts swapping I guess)...is there

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Ran Tavory
I canceled mmap and indeed memory usage is sane again. So far performance hasn't been great, but I'll wait and see. I'm also interested in a way to cap mmap so I can take advantage of it but not swap the host to death... On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung wrote: > This sounds just like

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Vick Khera
On Tue, May 4, 2010 at 2:57 PM, Ran Tavory wrote: > I'm also interested in a way to cap mmap so I can take advantage of it but > not swap the host to death... > Isn't the point of mmap() to just directly access a file as if it were memory? I can see how it would fool the reporting tools into thi

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Nathan McCall
You could try mmap_index_only - this would restrict mmap usage to the index files. -Nate On Tue, May 4, 2010 at 11:57 AM, Ran Tavory wrote: > I canceled mmap and indeed memory usage is sane again. So far performance > hasn't been great, but I'll wait and see. > I'm also interested in a way to ca

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Jonathan Ellis
Are you using 32-bit hosts? If not, don't be scared of mmap using a lot of address space; you have plenty. It won't make you swap more than using buffered I/O. On Tue, May 4, 2010 at 1:57 PM, Ran Tavory wrote: > I canceled mmap and indeed memory usage is sane again. So far performance > hasn't b

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Ran Tavory
It's a 64-bit host. When I cancel mmap I see less memory used and zero swapping, but it's slowly growing so I'll have to wait and see. Performance isn't much better; not sure what the bottleneck is now (could also be the application). Now on the same host I see: top - 15:43:59 up 12 days, 4:23, 1

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Kyusik Chung
I'm using Ubuntu 8.04 on 64-bit hosts on Rackspace Cloud. I'm in the middle of repeating some perf tests, but so far, I get as-good or slightly better read perf by using standard disk access mode vs mmap. So far consecutive tests are returning consistent numbers. I'm not sure how to explain it...

Getting all the keys from a ColumnFamily ?

2010-05-04 Thread Chris Dean
I have a ColumnFamily with a small number of keys, but each key has a large number of columns. What's the best way to get just the keys back? I don't want to load all the columns if I don't have to. There also isn't necessarily any column names in common between the different rows. Cheers, Chri

BloomFilter is taking too much memory

2010-05-04 Thread Weijun Li
Hello, We stored about 47mil keys in one Cassandra node, and a memory dump shows the following for one of the SSTableReaders: SSTableReader: 386MB. Of this 386MB, IndexSummary takes about 231MB but BloomFilter takes 155MB with an embedded huge array long[19.4mil]. It seems that BloomFilter is taking

Re: BloomFilter is taking too much memory

2010-05-04 Thread Jonathan Ellis
BloomFilter is not redundant, because it stores information about _all_ keys while the index summary stores only every 128th key. On Tue, May 4, 2010 at 3:47 PM, Weijun Li wrote: > Hello, > > We stored about 47mil keys in one Cassandra node and what a memory dump > shows for one of the SStableReader:

Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Jonathan Ellis
I'll be running a day-long Cassandra training class on Friday, May 21. I'll cover - Installation and configuration - Application design - Basics of Cassandra internals - Operations - Tuning and troubleshooting Details at http://riptanobayarea20100521.eventbrite.com/ -- Jonathan Ellis Project C

Re: Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Johan Hilding
On 4 May 2010 23:07, "Jonathan Ellis" wrote: I'll be running a day-long Cassandra training class on Friday, May 21. I'll cover - Installation and configuration - Application design - Basics of Cassandra internals - Operations - Tuning and troubleshooting Details at http://riptanobayarea2010052

Re: strange get_range_slices behaviour v0.6.1

2010-05-04 Thread aaron
Thanks Jonathan. After looking at the Lucandra code I realized my confusion has to do with get_range_slices and the RandomPartitioner. When I switched to the OPP I got the expected behaviour. I was noticing cases under the random partitioner where keys I expected to be returned were not.

Re: BloomFilter is taking too much memory

2010-05-04 Thread Weijun Li
More insight for this sstable: the ArrayList for IndexSummary has 644195 entries, so total number of entries for this sstable is: 644195*128=~82mil. The problem is that the total bits for its BloomFilter (long[19400551] inside BitSet) is 19400551*64=1241635264, which means each key is taking ~15bit
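The arithmetic in this message can be reproduced directly. The sketch below is only a worked check of the quoted numbers; the helper names are hypothetical, and it assumes one index summary entry per 128 keys and 64 bits per Java long, as the thread describes.

```python
INDEX_INTERVAL = 128  # one index summary entry per 128 keys, per the thread
BITS_PER_LONG = 64    # size of a Java long in bits


def estimated_keys(summary_entries, interval=INDEX_INTERVAL):
    """Estimate the key count of an sstable from its index summary size."""
    return summary_entries * interval


def bloom_bits(long_array_len):
    """Total bits held by a bit set backed by a long[] of the given length."""
    return long_array_len * BITS_PER_LONG


def bits_per_key(long_array_len, key_count):
    """Average number of Bloom filter bits spent on each key."""
    return bloom_bits(long_array_len) / key_count


if __name__ == "__main__":
    keys = estimated_keys(644195)
    print(keys)                            # about 82 million keys
    print(bloom_bits(19400551))            # total bits in the filter
    print(bloom_bits(19400551) // 8)       # the same size in bytes
    print(bits_per_key(19400551, keys))    # roughly 15 bits per key
```

The byte count works out to about 155 million bytes, which lines up with the 155MB BloomFilter size reported earlier in the thread.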

Re: strange get_range_slices behaviour v0.6.1

2010-05-04 Thread Jonathan Ellis
On Tue, May 4, 2010 at 4:17 PM, aaron wrote: > I was noticing cases under the random partitioner where keys I expected to > be returned > were not. Can you give a little advice on the expected behaviour of > get_range_slices > with the RP and I'll try to write a JUnit for it. e.g. Is it essentiall

Building on top of Cassandra's core layer

2010-05-04 Thread David Rosenstrauch
I've had some neat ideas that I'd like to tinker with for a distributed DB that implements a very different data model than Cassandra. However, I obviously don't want to reinvent the wheel - particularly because in the case of distributed systems, the wheel is quite complicated and hard to get

sstable2json bat script on windows

2010-05-04 Thread Dop Sun
Hi, As of 0.6.1, I can't find sstable2json.bat. I don't know if I missed anything? It would be good if we could have one, which can help import/export data in/out of a development machine. Thanks, Regards, Dop

Re: sstable2json bat script on windows

2010-05-04 Thread Jonathan Ellis
You didn't miss anything. There aren't many .bat files yet. On Tue, May 4, 2010 at 6:29 PM, Dop Sun wrote: > Hi, > > > > As of 0.6.1, I don’t find sstable2json.bat. I don’t know if I missed > anything? > > > > It will good if we can have one, which can help import/ export data in/ out > develop

Re: Trove maps

2010-05-04 Thread Prashant Malik
;) Yeah, it was painful. On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: > Hahaha, Jeff - I remember scampering to remove those references to the > Trove maps, I think around 2 years ago. > > Avinash > > > On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrot

Re: Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Mark Greene
Jonathan, Awesome! Any plans to offer this training again in the future for those of us who can't make it this time around? -Mark On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote: > I'll be running a day-long Cassandra training class on Friday, May 21. > I'll cover > > - Installation and

Re: Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Jonathan Ellis
Yes, although when and where are TBD. On Tue, May 4, 2010 at 7:38 PM, Mark Greene wrote: > Jonathan, > Awesome! Any plans to offer this training again in the future for those of > us who can't make it this time around? > -Mark > > On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote: >> >> I'll

Cassandra Streaming Service

2010-05-04 Thread Weijun Li
A dumb question: what is the use of Cassandra streaming service? Any use case or example? Thanks, -Weijun

Use binary memtable to load data

2010-05-04 Thread Weijun Li
Does anyone use binary memtable to import data into Cassandra? When you do this how do you determine the destination node that should own those data? Is replication factor taken into consideration when you import binary memtable? Thanks, -Weijun

Re: Cassandra and Request routing

2010-05-04 Thread Robert Coli
On 5/4/10 7:16 AM, Jonathan Shook wrote: I may be wrong here. Someone please correct me if I am. ... The ability to set the replication factor on inserts and gets allows you to decide when (if) and how much (little) to pay the price for consistency. You mean "Consistency Level", not "Replicati

Re: Cassandra and Request routing

2010-05-04 Thread Jonathan Shook
Ah! Thank you. Explained better here: http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency On Tue, May 4, 2010 at 8:38 PM, Robert Coli wrote: > On 5/4/10 7:16 AM, Jonathan Shook wrote: > >> I may be wrong here. Someone please correct me if I am. >> ... >>

Export to another cassandra cluster

2010-05-04 Thread Joost Ouwerkerk
I want to export data from one cassandra cluster (production) to another (development). This is not a case of replication, because I just want a snapshot, not a continuous synchronization. I guess my options include 'nodetool snapshot' and 'sstable2json'. In our case, however, the development cl

Re: Trove maps

2010-05-04 Thread Cagatay Kavukcuoglu
Did removing Trove collections have a noticeable effect on performance or memory use at the time? On Tuesday, May 4, 2010, Avinash Lakshman wrote: > Hahaha, Jeff - I remember scampering to remove those references to the Trove > maps, I think around 2 years ago. > Avinash > > On Tue, May 4, 2010

Re: Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Vick Khera
On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis wrote: > Yes, although when and where are TBD. > Having it the day before/after Velocity conference at the end of June would be ideal (hint, hint). I'm sure a lot of people with interest in Cassandra will be in the area.

Updating (as opposed to just setting) Cassandra data via Hadoop

2010-05-04 Thread Mark Schnitzius
I have a situation where I need to accumulate values in Cassandra on an ongoing basis. Atomic increments are still in the works apparently (see https://issues.apache.org/jira/browse/CASSANDRA-721) so for the time being I'll be using Hadoop, and attempting to feed in both the existing values and th

Re: Getting all the keys from a ColumnFamily ?

2010-05-04 Thread Jonathan Ellis
get_range_slices with an empty list of column names should work On Tue, May 4, 2010 at 3:02 PM, Chris Dean wrote: > I have a ColumnFamily with a small number of keys, but each key has a > large number of columns. > > What's the best way to get just the keys back?  I don't want to load all > the c

Re: Building on top of Cassandra's core layer

2010-05-04 Thread Jonathan Ellis
On Tue, May 4, 2010 at 4:55 PM, David Rosenstrauch wrote: > I've had some neat ideas that I'd like to tinker with for a distributed DB > that implements a very different data model than Cassandra.  However, I > obviously don't want to reinvent the wheel - particularly because in the > case of dist

Re: Cassandra Streaming Service

2010-05-04 Thread Jonathan Ellis
The Streaming service is what moves data around for load balancing, bootstrap, and decommission operations. On Tue, May 4, 2010 at 8:08 PM, Weijun Li wrote: > A dumb question: what is the use of Cassandra streaming service? Any use > case or example? > > Thanks, > -Weijun > -- Jonathan Ellis

Re: Use binary memtable to load data

2010-05-04 Thread Jonathan Ellis
On Tue, May 4, 2010 at 8:09 PM, Weijun Li wrote: > Does anyone use binary memtable to import data into Cassandra? Yes. > When you do > this how do you determine the destination node that should own those data? You let the StorageProxy API figure that out. > Is replication factor taken into con

Re: Export to another cassandra cluster

2010-05-04 Thread Jonathan Ellis
I would lightly hack sstable2json to write rows to the other cluster, instead of spitting them out as json. That would be a pretty simple modification. On Tue, May 4, 2010 at 9:21 PM, Joost Ouwerkerk wrote: > I want to export data from one cassandra cluster (production) to > another (development

Appropriate use for Cassandra?

2010-05-04 Thread Denis Haskin
I've been reading everything I can get my hands on about Cassandra and it sounds like a possibly very good framework for our data needs; I'm about to take the plunge and do some prototyping, but I thought I'd see if I can get a reality check here on whether it makes sense. Our schema should be fai

Re: Appropriate use for Cassandra?

2010-05-04 Thread David Strauss
On 2010-05-05 04:50, Denis Haskin wrote: > I've been reading everything I can get my hands on about Cassandra and > it sounds like a possibly very good framework for our data needs; I'm > about to take the plunge and do some prototyping, but I thought I'd > see if I can get a reality check here on