Re: primary keys query

2012-05-16 Thread Cyril Auburtin
tx was looking at http://code.google.com/p/javageomodel/ too 2012/5/14 aaron morton > So it seems it's not a good idea, to use Cassandra like that? > > Right. It's basically a table scan. > > Here is some background on the approach simple geo took to using > Cassandra... > http://www.readwritewe

Re: C 1.1 & CQL 2.0 or 3.0?

2012-05-16 Thread Cyril Auburtin
well yes, since I can retrieve the keyspace and all CF (not SCF, it's normal in cql?) with cqlsh (v2), tic@mPC:~$ cqlsh 1xx.xx.xxx.xx 9165 Connected to Test Cluster at 1xx.xx.xxx.xx:9165. [cqlsh 2.2.0 | Cassandra 1.1.0 | CQL spec 2.0.0 | Thrift protocol 19.30.0] Use HELP for help. cqlsh> use Mykey

Re: Frequently Updated Wide Rows - Suggestions

2012-05-16 Thread aaron morton
That scenario can result in slower reads than narrow rows that are updated less frequently. Like most things it depends. Do you have a feel for how wide and what the update pattern is like ? Things like levelled compaction (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassan

Re: Composite Column

2012-05-16 Thread aaron morton
Abhijit, Can you explain the data model a bit more. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/05/2012, at 10:32 PM, samal wrote: > It is just column with JSON value > > On Tue, May 15, 2012 at 4:00 PM, samal wrote: > I h

Re: Couldn't find cfId

2012-05-16 Thread aaron morton
Looks like this https://issues.apache.org/jira/browse/CASSANDRA-3975 Fixed in the latest 1.0.9. Either upgrade (which is always a good idea) or purge the hints from the server. Either using JMX or stopping the node and removing the HintedHandoff files from data/system. In either case you sho

Re: Tuning cassandra (compactions overall)

2012-05-16 Thread aaron morton
> 1 - I got this kind of message quite often (let's say every 30 seconds) : You are running out of memory. Depending on the size of your schema and the workload you will want to start with 4 or 8 GB machines. But most people get the best results with 16GB. On AWS the common setup is to use m1.x

Re: need some clarification on recommended memory size

2012-05-16 Thread aaron morton
The JVM will not swap out if you have JNA.jar in the path or you have disabled swap on the machine (the simplest thing to do). Cassandra uses memory mapped file access. If you have 16GB of ram, 8 will go to the JVM and the rest can be used by the os to cache files. (Plus the off heap stuff) C

Re: Is nodetool upgradesstables a necessary step for upgrading from 0.8 to 1.0.

2012-05-16 Thread aaron morton
> if I do not upgrade the sstables what is going to happen? Things will break. Things you would normally like to work like repair. The new version nodes can read old data. But when they stream old data files between themselves (during repair) they have to be able to write the bloom filter, ind

Re: Using EC2 ephemeral 4disk raid0 cause high iowait trouble

2012-05-16 Thread aaron morton
On Ubuntu ? Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/05/2012, at 2:13 PM, koji Lin wrote: > Hi > > Our service already run cassandra 1.0 on 1x ec2 instances(with ebs)

is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread Piavlo
Hi, I'm interested in using some smart proxy cassandra process that could act as coordinator node and be aware of cluster state. And run this smart proxy cassandra process on each client side host where the application(php) with short lived cassandra connections runs. Besides being aware of

Re: is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread R. Verlangen
Hi there, I'm using HAProxy for PHP projects to take care of this. It improved connection pooling enormously on the client side, while preserving failover capabilities. Maybe that is something for you to use in combination with PHP. Good luck! 2012/5/16 Piavlo > > Hi, > > I'm interested in using

Re: is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread Piavlo
On 05/16/2012 01:24 PM, R. Verlangen wrote: Hi there, I'm using HAProxy for PHP projects to take care of this. It improved connection pooling enormously on the client side, while preserving failover capabilities. Maybe that is something for you to use in combination with PHP. I already use it e

Re: Is nodetool upgradesstables a necessary step for upgrading from 0.8 to 1.0.

2012-05-16 Thread Boris Yen
Hi Aaron, Thanks for the information. :) Boris On Wed, May 16, 2012 at 5:57 PM, aaron morton wrote: > if I do not upgrade the sstables what is going to happen? > > Things will break. > Things you would normally like to work like repair. > > The new version nodes can read old data. But when they

Re: is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread R. Verlangen
Yes, I'm aware of those issues however in our use case they don't cause any problems. But ... If there's something better out there I'm really curious: so I'll keep up with this thread. 2012/5/16 Piavlo > On 05/16/2012 01:24 PM, R. Verlangen wrote: > > Hi there, > > I'm using HAProxy for PHP

understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread David Vanderfeesten
Hi, I would like to better understand the limitations of native indexes, potential side effects and scenarios where they are required. My understanding so far: - Each node stores the index entries for the data held locally on that node. - Indexes do not return values in a sorted way (hashes

Startup fails after updgrading from 1.0.8 to 1.1.0

2012-05-16 Thread Christoph Eberhardt
Hi there, I upgraded Cassandra from 1.0.8 to 1.1.0. At first it all seemed to work fine, so I started upgrading the rest of the cluster (at the time only one other node, which is a replica). After several errors, I restarted the cluster and now Cassandra won'

Re: need some clarification on recommended memory size

2012-05-16 Thread Yiming Sun
Thanks Aaron. The reason I raised the question about memory requirements is because we are seeing some very low performance on cassandra read. We are using cassandra as the backend for an IR repository, and granted the size of each column is very small (OCRed text). Each row represents a book vo

Re: Tuning cassandra (compactions overall)

2012-05-16 Thread Alain RODRIGUEZ
Using c1.medium, we are currently able to deliver the service. What is the benefit of having more memory ? I mean, I don't understand why having 1, 2, 4, 8 or 16 GB of memory is so different. In my mind, Cassandra will fill the heap and from then, start to flush and compact to avoid OOMing and

Re: Startup fails after updgrading from 1.0.8 to 1.1.0

2012-05-16 Thread Dave Brosius
Might be related to https://issues.apache.org/jira/browse/CASSANDRA-3794 On 05/16/2012 08:12 AM, Christoph Eberhardt wrote: Hi there, I upgraded Cassandra from 1.0.8 to 1.1.0. It seemed to work in the first place, all seemed to work fine. So I started upgrading the rest of the cluster (a

Re: Retrieving old data version for a given row

2012-05-16 Thread Felipe Schmidt
That was very helpful, thank you very much! I still have some questions: - Is it possible to make Cassandra keep old value data after flushing? The same question for the memTable, before flushing. It seems to me that when I update some tuple, the old data will be overwritten in the memTable, even before f

Re: Retrieving old data version for a given row

2012-05-16 Thread Dave Brosius
You're in for a world of hurt going down that rabbit hole. If you truly want version data then you should think about changing your keying to perhaps be a composite key where the key is of the form NaturalKey/VersionId Or if you want the versioning at the column level, use composite columns with Col
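
The NaturalKey/VersionId keying suggested above can be illustrated with a small in-memory sketch. This is not a Cassandra client call: the `store` dictionary, the `versioned_key`, `put` and `history` helpers, and the zero-padded version format are all hypothetical names chosen for illustration, and the example only shows the shape of the keys.

```python
# In-memory stand-in for a column family keyed by "NaturalKey/VersionId".
# All names here are hypothetical; no Cassandra API is being modelled.
store = {}


def versioned_key(natural_key, version_id):
    """Build a composite row key; zero-padding keeps versions in sort order."""
    return f"{natural_key}/{version_id:010d}"


def put(natural_key, version_id, value):
    """Write a new version instead of overwriting the previous one."""
    store[versioned_key(natural_key, version_id)] = value


def history(natural_key):
    """Return every stored version of one natural key, oldest first."""
    prefix = natural_key + "/"
    return [(k, store[k]) for k in sorted(store) if k.startswith(prefix)]


put("book42", 1, "draft")
put("book42", 2, "final")
put("book7", 1, "other")
print(history("book42"))
```

Because every update goes to a new key, old values survive by construction rather than by relying on how the memtable or SSTables happen to retain overwritten data.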

Re: CQL 3.0 Features

2012-05-16 Thread Roland Mechler
http://www.datastax.com/dev/blog/whats-new-in-cql-3-0 It's my understanding that the actual reference documentation for 3.0 should be ready soon. Anyone know when? -Roland On Wed, May 16, 2012 at 12:04 AM, Tamil selvan R.S wrote: > Hi, > Is there a tutorial or reference on CQL 3.0 Feature

RE: understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread Jeremiah Jordan
The limitation is because number of columns could be equal to number of rows. If number of rows is large this can become an issue. -Jeremiah From: David Vanderfeesten [feest...@gmail.com] Sent: Wednesday, May 16, 2012 6:58 AM To: user@cassandra.apache.org Subjec

Re: CQL 3.0 Features

2012-05-16 Thread paul cannon
Sylvain has a draft on https://issues.apache.org/jira/browse/CASSANDRA-3779, and that should be an official cassandra project doc "real soon now". If you're asking about Datastax's reference docs for CQL 3, they will probably be released once Datastax Enterprise or Datastax Community is released w

Re: Cassandra Explorer - GUI for viewing Cassandra Data

2012-05-16 Thread shelan Perera
Hi, Sure. I will update the wiki with details. Thank you very much for your kind suggestion. Best Regards On Tue, May 15, 2012 at 1:53 AM, aaron morton wrote: > Neat. Would you like to add it to the list here ? > http://wiki.apache.org/cassandra/Administration%20Tools > > Cheers > > > ---

Re: Couldn't find cfId

2012-05-16 Thread Daning Wang
Thanks Aaron! We will upgrade to 1.0.9. Just curious, you said "removing the HintedHandoff files from data/system", what do the HintedHandoff files look like? Thanks, Daning On Wed, May 16, 2012 at 2:32 AM, aaron morton wrote: > Looks like this https://issues.apache.org/jira/browse/CASSANDRA-3

Re: C 1.1 & CQL 2.0 or 3.0?

2012-05-16 Thread paul cannon
Ah, I know why. CQL 3 downcases all your identifiers by default. Wouldn't have been a problem if you had created it with CQL 3, cause then the name would be "mykeyspace" and it would match what you're asking for. But since your keyspace was created with some capital letters in its name, you just n
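
The case folding described above can be sketched as a small normalization function. This is an illustration of the rule as I understand it (unquoted identifiers are lower-cased, double-quoted identifiers keep their case), not code from cqlsh; the function name and the keyspace name `MyKeyspace` are hypothetical.

```python
def normalize_identifier(ident):
    """Mimic CQL 3 identifier handling (assumed rule, not cqlsh source).

    Unquoted identifiers are folded to lower case; identifiers wrapped in
    double quotes keep their exact case, with "" standing for a literal quote.
    """
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        return ident[1:-1].replace('""', '"')
    return ident.lower()


# A keyspace created with capitals only matches when it is quoted.
existing = "MyKeyspace"  # hypothetical keyspace name
print(normalize_identifier("MyKeyspace") == existing)
print(normalize_identifier('"MyKeyspace"') == existing)
```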

Re: Startup fails after updgrading from 1.0.8 to 1.1.0

2012-05-16 Thread Dave Brosius
tracking issue here: https://issues.apache.org/jira/browse/CASSANDRA-4251 might be related to: https://issues.apache.org/jira/browse/CASSANDRA-3794 On 05/16/2012 08:12 AM, Christoph Eberhardt wrote: Hi there, I upgraded Cassandra from 1.0.8 to 1.1.0. It seemed to work in the first place, all

Re: C 1.1 & CQL 2.0 or 3.0?

2012-05-16 Thread Cyril Auburtin
k I thought keyspace was downcased like CFs and all commands, thx 2012/5/16 paul cannon > Ah, I know why. CQL 3 downcases all your identifiers by default. Wouldn't > have been a problem if you had created it with CQL 3, cause then the name > would be "mykeyspace" and it would match what you're a

Re: understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread David Vanderfeesten
Thanks Jeremiah, but I am not sure I am following "number of columns could be equal to number of rows". Is the native index implemented as one CF shared over all the indexes (one row in the index CF corresponding to one index) or is there an internal index CF per index? My (potentially wrong) mindset was

Re: understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread Dave Brosius
Each index you define on the source CF is created using an internal CF that has as its key the value of the column it's indexing, and as its columns, all the keys of all the rows in the source CF that have that value. So if all your rows in your source CF have the same value, then your index cf
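
The index layout described above can be modelled in a few lines. This is a toy in-memory sketch, not Cassandra's implementation: the `build_index` helper, the `users` data and the column names are hypothetical. It shows how a low-cardinality column yields a few wide index rows while a high-cardinality column yields many index rows with one column each.

```python
from collections import defaultdict


def build_index(source_cf, column):
    """Model an index CF: indexed value -> row keys that hold that value."""
    index_cf = defaultdict(list)
    for row_key, columns in source_cf.items():
        if column in columns:
            index_cf[columns[column]].append(row_key)
    return {value: sorted(keys) for value, keys in index_cf.items()}


# Hypothetical source column family: row key -> {column name: value}.
users = {
    "u1": {"country": "BE", "email": "a@example.org"},
    "u2": {"country": "BE", "email": "b@example.org"},
    "u3": {"country": "US", "email": "c@example.org"},
}

by_country = build_index(users, "country")  # low cardinality: wide index rows
by_email = build_index(users, "email")      # high cardinality: one key per row
print(by_country)
print(len(by_email))
```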

Re: understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread David Vanderfeesten
This corresponds with my thoughts, but I don't see the issue with high cardinality columns. In the worst case you potentially get as many rows in the index as in the indexed CF (each having one column). On Wed, May 16, 2012 at 9:03 PM, Dave Brosius wrote: > Each index you define on the source CF is

how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Hello, I asked the question as a follow-up under a different thread, so I figure I should ask here instead in case the other one gets buried, and besides, I have a little more information. "We find the lack of performance disturbing" as we are only able to get about 3-4MB/sec read performance out

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Mike Peters
Hi Yiming, Cassandra is optimized for write-heavy environments. If you have a read-heavy application, you shouldn't be running your reads through Cassandra. On the bright side - Cassandra read throughput will remain consistent, regardless of your volume. But you are going to have to "wrap"

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Ah, never thought I would be quoting Luke's "No, that's not true... that's impossible~~" here... sigh. But seriously, thanks Mike. Instead of using memcached, would it help to turn on row cache? An even more philosophical question: what would be a better choice for read-heavy loads? a major p

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Oleg Dulin
Indeed. This is how we are trying to solve this problem. Our application has a built-in cache that resembles a supercolumn or standardcolumn data structure and has an API that resembles a combination of Pelops selector and mutator. You can do something like that for Hector. The cache is constra

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Thanks Oleg. Another caveat from our side is, we have a very large data space (imagine picking 100 items out of 3 million, the chance of having 2 items from the same bin is pretty low). We will experiment with row cache, and hopefully it will help, not the opposite (the tuning guide says row cache

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Oleg Dulin
Please do keep us posted. We have a somewhat similar Cassandra utilization pattern, and I would like to know what your solution is... On 2012-05-16 20:38:37 +, Yiming Sun said: Thanks Oleg. Another caveat from our side is, we have a very large data space (imagine picking 100 items out of

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Will do, Oleg. Again, thanks for the information. -- Y. On Wed, May 16, 2012 at 4:44 PM, Oleg Dulin wrote: > Please do keep us posted. We have a somewhat similar Cassandra utilization > pattern, and I would like to know what your solution is... > > > > On 2012-05-16 20:38:37 +, Yimi

1.0.6 -> 1.1.0 nodetool ownership report, and other anomalies

2012-05-16 Thread Ron Siemens
I upgraded from 1.0.6 to 1.1.0, and I noticed the effective ownership report changed. I have a 3-node cluster, with evenly divided tokens and RF=2. The nodetool report on 1.0.6 was: 33.33% 0 33.33% 56713727820156410577229101238628035243 33.33%

Re: Snapshot failing on JSON files in 1.1.0

2012-05-16 Thread Bryan Fernandez
Does anyone know when 1.1.1 will be released? Thanks. On Tue, May 15, 2012 at 5:40 PM, Brandon Williams wrote: > Probably https://issues.apache.org/jira/browse/CASSANDRA-4230 > > On Tue, May 15, 2012 at 4:08 PM, Bryan Fernandez > wrote: > > Greetings, > > > > We recently upgraded from 1.0.8 to

Re: Couldn't find cfId

2012-05-16 Thread Feng Qu
Daning, You could clear the hinted handoff via JMX (HintedHandOffManagerMBean.deleteHintsForEndpoint) for that host. Feng Qu > > From: Daning Wang >To: user@cassandra.apache.org >Sent: Wednesday, May 16, 2012 10:38 AM >Subject: Re: Couldn't find cfId > >

Re: Inconsistent dependencies

2012-05-16 Thread Rob Coli
On Tue, Apr 24, 2012 at 12:56 PM, Matthias Pfau wrote: > we just noticed that cassandra is currently published with inconsistent > dependencies. The inconsistencies exist between the published pom and the > published distribution (tar.gz). I compared hashes of the libs of several > versions and th

Re: Adding a second datacenter

2012-05-16 Thread Rob Coli
On Tue, Apr 24, 2012 at 3:24 PM, Bill Au wrote: > Everything went smoothly until I ran the last step, which is to run nodetool > repair on all the nodes in the new data center. Repair is hanging on all > the new nodes. I had to hit control-C to break out of it. > [ snip ] > Did I miss anything

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Aaron Turner
On Wed, May 16, 2012 at 12:59 PM, Yiming Sun wrote: > Hello, > > I asked the question as a follow-up under a different thread, so I figure I > should ask here instead in case the other one gets buried, and besides, I > have a little more information. > > "We find the lack of performance disturbing

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Hi Aaron T., No, actually we haven't, but this sounds like a good suggestion. I can definitely try THIS before jumping into other things such as enabling row cache etc. Thanks! -- Y. On Wed, May 16, 2012 at 9:38 PM, Aaron Turner wrote: > On Wed, May 16, 2012 at 12:59 PM, Yiming Sun wrote: >

Re: cassandra 1.0.9 error - "Read an invalid frame size of 0"

2012-05-16 Thread Gurpreet Singh
Thanks Aaron. will do! On Mon, May 14, 2012 at 1:14 PM, aaron morton wrote: > Are you using framed transport on the client side ? > > Try the Hector user list for hector specific help > https://groups.google.com/forum/?fromgroups#!searchin/hector-users > > Cheers > > - > Aaron Mor

Re: need some clarification on recommended memory size

2012-05-16 Thread aaron morton
> The read rate that I have been seeing is about 3MB/sec, and that is reading > the raw bytes... using string serializer the rate is even lower, about > 2.2MB/sec. Can we break this down a bit: Is this a single client ? How many columns is it asking for ? What sort of query are you sending, s

Re: Composite Column

2012-05-16 Thread Abhijit Chanda
Aaron, Actually, I am looking for an example of super columns being replaced by composite columns. Say this is a data model using super columns: rowKey{ superKey1 { Name, Address, City,.

Re: Tuning cassandra (compactions overall)

2012-05-16 Thread aaron morton
> What is the benefit of having more memory ? I mean, I don't > understand why having 1, 2, 4, 8 or 16 GB of memory is so different. Less frequent and less aggressive garbage collection frees up CPU resources to run the database. Less memory results in frequent and aggressive (i.e. stop the

Re: Composite Column

2012-05-16 Thread samal
It is like putting your super column name inside the column names. empKey{ employee1+name:XX, employee1+addr:X, employee2+name:X, employee2+addr:X } Here all of your employee details are attached to one domain i.e. all of employee1 details will be *"employee1+[anything.n numb
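
The composite column layout above can be sketched in memory as a sorted list of (composite name, value) pairs. This is only an illustration of the naming scheme, not a Hector or Thrift call; the employee data, the address values and the `slice_by_prefix` helper are hypothetical. Because the columns sort by their first component, everything belonging to one employee is contiguous and can be fetched as a slice, while a single field is addressed by its full composite name.

```python
from bisect import bisect_left

# One row ("empKey"): columns sorted by composite name (entity, field).
# The data below is made up for illustration.
emp_row = sorted([
    (("employee1", "name"), "Smith"),
    (("employee1", "addr"), "1 Main St"),
    (("employee2", "name"), "Jones"),
    (("employee2", "addr"), "2 Side St"),
])


def slice_by_prefix(columns, prefix):
    """Return all columns whose first composite component equals prefix."""
    names = [name for name, _ in columns]
    start = bisect_left(names, (prefix,))
    end = bisect_left(names, (prefix + "\x00",))
    return columns[start:end]


def get_column(columns, entity, field):
    """Look up one value by its full composite column name."""
    return dict(columns)[(entity, field)]


print(slice_by_prefix(emp_row, "employee1"))
print(get_column(emp_row, "employee1", "name"))
```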

Re: Composite Column

2012-05-16 Thread Abhijit Chanda
Samal, Thanks buddy for interpreting. Now suppose I am inserting data in a column family using this data model dynamically; as a result the column names will be dynamic. Now consider there is an entry for *employee1* *name*d "Smith", and I want to retrieve that value? Regards, Abhijit On Thu, May 17,