Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Raúl Raja Martínez
Hello everyone, I'm working on an application that uses Cassandra and has a geolocation component. I was wondering, besides the slides and video at http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that SimpleGeo published regarding their strategy, if anyone has implemented
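For readers unfamiliar with the term in the subject line, here is a minimal sketch of standard geohash encoding in Python. It is not SimpleGeo's implementation and is not taken from the thread; the function name geohash_encode and the default precision are illustrative assumptions. Longitude and latitude bits are interleaved (longitude first) and every five bits become one base-32 character, so points that fall in the same cell share a prefix.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration; `precision` is the number
    of base-32 characters to produce.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits accumulated for the current character
    bit_count = 0
    use_lon = True    # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

short_hash = geohash_encode(57.64911, 10.40744, 3)
long_hash = geohash_encode(57.64911, 10.40744, 8)
```

A longer hash for the same point always starts with the shorter one, which is what makes prefix or range scans over sorted keys a plausible basis for a nearby query. Two points close to a cell boundary can be near each other without sharing a prefix, so a real query would likely need to cover neighbouring cells as well.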

Cassandra - row range and column slice

2012-02-17 Thread Maciej Miklas
Hello, assuming Ordered Partitioner, I would like to be able to find records by row key range and columns by slice. For example: give me all rows between 2001 and 2003 and all columns between A and C. For such data: { 2001: {A:v1, Z:v2}, 2002: {R:v2, Z:v3}, 2003: {C:v4, Z:v5},

Re: Cassandra - row range and column slice

2012-02-17 Thread Pierre-Yves Ritschard
In this case, you have one query predicate that operates on a much lower range (years), so you could use it as the row key and issue a multigetslicequery where you set all row keys and specify the slice you're interested in (here: 2001 2002 2003, then = A, D) On Fri, Feb 17, 2012 at 11:46 AM,
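The idea in this reply can be sketched with a toy in-memory model in Python. This is not a Cassandra client call: ROWS and multiget_slice are hypothetical names, the data contains only the rows visible in the truncated question, and treating both slice bounds as inclusive is an assumption of the sketch.

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for a column family: row key -> {column name: value}.
# Only the rows visible in the question are included.
ROWS = {
    2001: {"A": "v1", "Z": "v2"},
    2002: {"R": "v2", "Z": "v3"},
    2003: {"C": "v4", "Z": "v5"},
}

def multiget_slice(rows, row_keys, start, finish):
    """For each requested row key, return columns with start <= name <= finish."""
    result = {}
    for key in row_keys:
        columns = rows.get(key, {})
        names = sorted(columns)               # columns are kept sorted by name
        lo = bisect_left(names, start)        # first name >= start
        hi = bisect_right(names, finish)      # one past the last name <= finish
        result[key] = {name: columns[name] for name in names[lo:hi]}
    return result

# Enumerate the row keys explicitly instead of scanning a row key range.
result = multiget_slice(ROWS, [2001, 2002, 2003], "A", "C")
```

Because the set of years is small, listing the keys avoids depending on an ordered partitioner for this query; row 2002 comes back with no columns since none of its names fall in the slice.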

Streaming sessions from BulkOutputFormat job being listed long after they were killed

2012-02-17 Thread Erik Forsberg
Hi! If I run a Hadoop job that uses BulkOutputFormat to write data to Cassandra, and that Hadoop job is aborted, i.e. streaming sessions are not completed, it seems the streaming sessions hang around for a very long time (I've observed at least 12-15h) in output from 'nodetool

General questions about Cassandra

2012-02-17 Thread Alessio Cecchi
Hi, we have developed software that stores logs from mail servers in MySQL, but for huge environments we are developing a version that stores this data in HBase. Raw logs are, once a day, first normalized, so the output is like this: username,date of login, IP Address, protocol username,date

cassandra on ec2 lock-ups

2012-02-17 Thread Pierre-Yves Ritschard
Hi, I've experienced several node lock-ups on EC2 instances. I'm running with the following set-up: heap-new: 800M max-heap: 8G instance type: m2.xlarge java is java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed

Re: cassandra on ec2 lock-ups

2012-02-17 Thread Pierre-Yves Ritschard
Sorry for not doing my homework properly: http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs On Fri, Feb 17, 2012 at 2:00 PM, Pierre-Yves Ritschard p...@spootnik.org wrote: Hi, I've experienced several node lock-ups on EC2 instances. I'm running with the following set-up: heap-new: 800M

Re: cassandra on ec2 lock-ups

2012-02-17 Thread Brandon Williams
http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs On Fri, Feb 17, 2012 at 7:00 AM, Pierre-Yves Ritschard p...@spootnik.org wrote: Hi, I've experienced several node lock-ups on EC2 instances. I'm running with the following set-up: heap-new: 800M max-heap: 8G instance type: m2.xlarge

Re: deleting rows and tombstones

2012-02-17 Thread Jonathan Ellis
Deleting the entire row at once only creates a row-level tombstone. This is almost free. Tombstone buildup happens when performing column-level deletes, then inserting more (different) columns. The classic example is modeling a queue in a row. On Tue, Feb 14, 2012 at 1:54 PM, Todd Burruss
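The difference described here can be illustrated with a toy Python model. It is only a sketch: ToyRow and its methods are hypothetical, and it ignores timestamps, compaction and tombstone garbage collection entirely. It shows why a queue modelled in one row gets slower to read as column-level deletes pile up, while a whole-row delete leaves a single marker.

```python
TOMBSTONE = object()  # sentinel standing in for a column-level delete marker

class ToyRow:
    """Hypothetical in-memory row: column name -> value or TOMBSTONE."""

    def __init__(self):
        self.columns = {}
        self.row_tombstone = False

    def insert(self, name, value):
        self.columns[name] = value

    def delete_column(self, name):
        # A column-level delete leaves one marker per deleted column.
        self.columns[name] = TOMBSTONE

    def delete_row(self):
        # A row-level delete is modelled as one marker for the whole row.
        self.columns = {}
        self.row_tombstone = True

    def tombstone_count(self):
        per_column = sum(1 for v in self.columns.values() if v is TOMBSTONE)
        return per_column + (1 if self.row_tombstone else 0)

    def first_live(self):
        """Return (first live column name, entries scanned to find it)."""
        scanned = 0
        for name in sorted(self.columns):
            scanned += 1
            if self.columns[name] is not TOMBSTONE:
                return name, scanned
        return None, scanned

# Queue in a row: enqueue 1000 items, then dequeue the first 999.
queue = ToyRow()
for i in range(1000):
    queue.insert(i, f"job-{i}")
for i in range(999):
    queue.delete_column(i)
head, scanned = queue.first_live()

# Same data, but dropped with a single row-level delete.
dropped = ToyRow()
for i in range(1000):
    dropped.insert(i, f"job-{i}")
dropped.delete_row()
```

In this model, reading the head of the queue has to step over every deleted entry before reaching the one live column, whereas the row that was deleted in one go carries a single marker.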

Re: Replication factor per column family

2012-02-17 Thread R. Verlangen
Ok, that's clear, thank you for your time! 2012/2/16 aaron morton aa...@thelastpickle.com yes. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/02/2012, at 10:15 PM, R. Verlangen wrote: Hmm ok. This means if I want to have a CF with RF

RE: General questions about Cassandra

2012-02-17 Thread Don Smith
Are there plans to build some sort of map-reduce framework into Cassandra and CQL? It seems that users should be able to apply a Java method to selected rows in parallel on the distributed Cassandra JVMs. I believe Solandra uses such an integration. Don

Re: General questions about Cassandra

2012-02-17 Thread Chris Gerken
Don, That's a good idea, but you have to be careful not to preclude the use of dynamic column families (e.g. CF's with time series-like schemas) which is what Cassandra's best at. The right approach is to build your own ORM/persistence layer (or generate one with some tools) that can hide the

Re: General questions about Cassandra

2012-02-17 Thread Jeremy Hanna
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS, over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a simpler solution, there is DataStax's enterprise product

Re: General questions about Cassandra

2012-02-17 Thread Chris Gerken
In response to an offline question… There are two usage patterns for Cassandra column families, static and dynamic. With both approaches you store objects of a given type into a column family. With static usage the object type you're persisting has a single key and each row in the column

Re: Streaming sessions from BulkOutputFormat job being listed long after they were killed

2012-02-17 Thread Yuki Morishita
Erik, Currently, streaming failure handling works poorly. There are several discussions and bug reports regarding streaming failure on JIRA. A hung streaming session will be left in memory until you restart C*, but I believe it does not cause problems. -- Yuki Morishita On

Re: Key cache hit rate issue

2012-02-17 Thread Todd Burruss
ah, I missed the part about key cache .. I read row cache. thx On 2/16/12 6:14 PM, Jonathan Ellis jbel...@gmail.com wrote: Look for this code in SSTableReader.getPosition: Pair<Descriptor, DecoratedKey> unifiedKey = new Pair<Descriptor, DecoratedKey>(descriptor, decoratedKey);

Re: Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Mike Malone
2012/2/17 Raúl Raja Martínez raulr...@gmail.com Hello everyone, I'm working on an application that uses Cassandra and has a geolocation component. I was wondering, besides the slides and video at http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that SimpleGeo published

Re: Key cache hit rate issue

2012-02-17 Thread Jonathan Ellis
Only thing I can think of is that if you've set the cache size manually over JMX it will preserve that size if you change it via a schema update. On Fri, Feb 17, 2012 at 12:10 AM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: Hi Jonathan, For some reason 16637958 (the keys cached)

Re: Key cache hit rate issue

2012-02-17 Thread Eran Chinthaka Withana
I never used JMX for any changes and use JMX only for monitoring. All my updates go through schema updates. To give you a little more context (not sure whether this will help, but anyway), about 2-3 weeks back the read latency was 4-8ms with about 90-95% key cache hit rate. But after that

Re: Key cache hit rate issue

2012-02-17 Thread Jonathan Ellis
I suspect the main difference is that 2-3 weeks ago almost none of your reads had to hit disk. On Fri, Feb 17, 2012 at 1:53 PM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: I never used JMX for any changes and use JMX only for monitoring. All my updates go through schema updates.

Re: Key cache hit rate issue

2012-02-17 Thread Eran Chinthaka Withana
True, the high hit rate translated to low read latency. But the question is how I can debug the reason for the low hit rate now, assuming read patterns haven't changed. Thanks, Eran Chinthaka Withana On Fri, Feb 17, 2012 at 3:07 PM, Jonathan Ellis jbel...@gmail.com wrote: I suspect the main