Re: Direct control over where data is stored?

2011-06-05 Thread Adrian Cockcroft
Sounds like Khanh thinks he can do joins... :-) User-oriented data is easy: key by Facebook ID and let Cassandra handle location. Set replication factor=3 so you don't lose data, and use quorum when you need consistent (but slower) read-after-write. If you are running on AWS you should
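
The quorum remark above can be illustrated with a little arithmetic. This is a minimal sketch, not Cassandra code: the function names are hypothetical, and it only assumes the usual rule that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum operation (a strict majority)."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(read_sees_latest_write(rf, q, q))   # True: quorum write + quorum read
print(read_sees_latest_write(rf, 1, 1))   # False: write ONE + read ONE
```

With replication factor 3, a quorum is 2 replicas, so quorum reads and writes always share at least one replica. That overlap is what buys read-after-write consistency, at the cost of waiting for a second replica.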

Re: problems with many columns on a row

2011-06-05 Thread Mario Micklisch
Thanks for the feedback Aaron! The schema of the CF is default, I just defined the name and the rest is default, have a look: Keyspace: TestKS Read Count: 65 Read Latency: 657.8047076923076 ms. Write Count: 10756 Write Latency: 0.03237039791744143 ms. Pending Tasks: 0 Column Family: CFTest

Re: problems with many columns on a row

2011-06-05 Thread Mario Micklisch
I tracked down the timestamp submission and everything was fine within the PHP libraries. The Thrift PHP extension, however, seems to have an overflow, because it was setting timestamps with negative values as well ( -1242277493 ). I disabled the PHP extension and as a result I now got correct
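
One way a negative timestamp like this can arise is a 64-bit value being squeezed into a signed 32-bit integer. The sketch below only illustrates that general mechanism. Whether the Thrift PHP extension truncates in exactly this way is an assumption (the message only says it "seems to have an overflow"), and `truncate_to_int32` is a hypothetical helper.

```python
import struct
import time


def truncate_to_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed integer."""
    low_bits = value & 0xFFFFFFFF
    return struct.unpack("<i", struct.pack("<I", low_bits))[0]


# Microsecond timestamps need far more than 32 bits, so a truncated
# value can land anywhere in the signed 32-bit range, negatives included.
micros = int(time.time() * 1_000_000)
print(micros, truncate_to_int32(micros))
```

A timestamp that wraps this way can look older than existing columns, in which case the write would silently lose to the earlier value.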

Re: When should I use Solandra?

2011-06-05 Thread Jean-Nicolas Boulay Desjardins
Perfect thanks! On Sun, Jun 5, 2011 at 4:43 AM, Victor Kabdebon victor.kabde...@gmail.com wrote: Again I don't really know the specifics of Solandra but in Solr (so Solandra being a cousin of Solr it should be true too) you have XML fields like this : fields name=hashedpassword

CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Timo Nentwig
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks 2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160 2011-06-05 16:21:54,517 ERROR [main]

Re: CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Timo Nentwig
On 6/5/11 16:26, Timo Nentwig wrote: $ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks 2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160 2011-06-05

Re: Troubleshooting IO performance ?

2011-06-05 Thread Jonathan Ellis
You may be swapping. http://spyced.blogspot.com/2010/01/linux-performance-basics.html explains how to check this as well as how to see what threads are busy in the Java process. On Sat, Jun 4, 2011 at 5:34 PM, Philippe watche...@gmail.com wrote: Hello, I am evaluating using cassandra and I'm

Re: How to delete UUIDs from the CLI?

2011-06-05 Thread Jonathan Ellis
If you're not using 0.8.0 the cli deals poorly with non-string row keys. On Sat, Jun 4, 2011 at 7:48 PM, Kevin thebachel...@gmail.com wrote: Currently I'm using a client (Pelops) to insert UUIDs (both lexical and time) into Cassandra. I haven't yet implemented a facility to remove them with

Re: CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Jonathan Ellis
On Sun, Jun 5, 2011 at 9:38 AM, Timo Nentwig timo.nent...@toptarif.de wrote: Hmm, worked-around that by setting -Dcassandra.config (hmm, the client needs the server's config...?). Yes, this is fixed for 0.8.1. Not very verbose :-\ May have something to do with my l/p being just / for

Re: CQL How to do

2011-06-05 Thread Eric Evans
On Sun, 2011-06-05 at 00:51 -0400, Jeffrey Kesselman wrote: Is CQL really the path for the future for Cassandra? CQL is no more or less official than the Thrift interface, and TTBMK, there is no secret cabal that met to decide it would be The Way. People will use what works best for them, and

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
Hi Maki and Adrian, Thank you very much for the promptness. It's weekend after all :). I realized I forgot a part of my question until Adrian mentioned the replication factor. Is it also possible to set where the replicas are stored as well? Thanks. This is a research experiment we're exploring

RE: How to delete UUIDs from the CLI?

2011-06-05 Thread Kevin
Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't delete any rows from the CLI, regardless of the type they're stored as. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Sunday, June 05, 2011 10:56 AM To: user@cassandra.apache.org

Re: Direct control over where data is stored?

2011-06-05 Thread Eric tamme
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen nguyen.h.kh...@gmail.com wrote: Hi Maki and Adrian, Thank you very much for the promptness. It's weekend after all :). I realized I forgot a part of my question until Adrian mentioned the replication factor. Is it also possible to set where the

Re: Direct control over where data is stored?

2011-06-05 Thread mcasandra
Please give more detailed info about what exactly you are worried about or trying to solve. Please take a step back and look at cassandra's architecture again and what it's trying to solve. It's a distributed database so if you do what you are describing there is a potential of getting hotspots.

Paging Columns from a Row

2011-06-05 Thread Joseph Stein
What is the best practice here for paging and slicing columns from a row? Let's say I have 1,000,000 columns in a row. I read the row but want to have one thread read columns 0 - , a second thread (actor in my case) 1 - 1 ... and so on, so I can have 100 workers processing 10,000 columns for

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
Great. Thank you, Eric. -k On Sun, Jun 5, 2011 at 2:13 PM, Eric tamme eta...@gmail.com wrote: On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen nguyen.h.kh...@gmail.com wrote: Hi Maki and Adrian, Thank you very much for the promptness. It's weekend after all :). I realized I forgot a part of

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
On Sun, Jun 5, 2011 at 2:17 PM, mcasandra mohitanch...@gmail.com wrote: Please give more detailed info about what exactly you are worried about or trying to solve. In general, we are trying to devise a partitioning and replication scheme that takes into account social relations between data.

Re: How to delete UUIDs from the CLI?

2011-06-05 Thread Jonathan Ellis
You're going to need to get a lot more specific. On Sun, Jun 5, 2011 at 12:12 PM, Kevin thebachel...@gmail.com wrote: Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't delete any rows from the CLI, regardless of the type they're stored as. -Original Message-

Re: Paging Columns from a Row

2011-06-05 Thread Jonathan Ellis
If you need to parallelize (and scale) you need to distribute across multiple rows. One Big Row means all your 100 workers are hammering the same 3 (for instance) replicas at the same time. On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein crypt...@gmail.com wrote: What is the best practices here to
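
The advice above, spreading the columns over many rows instead of one big row, can be sketched as a bucketing scheme. This is only an illustration under stated assumptions: the `base:bucket` key format, the function names and the 10,000-column bucket size are hypothetical choices, not something prescribed in the thread.

```python
def bucket_row_key(base_key: str, column_index: int, bucket_size: int = 10_000) -> str:
    """Row key of the bucket that holds the given column position."""
    return f"{base_key}:{column_index // bucket_size}"


def bucket_keys(base_key: str, total_columns: int, bucket_size: int = 10_000) -> list[str]:
    """All bucket row keys needed to hold total_columns columns."""
    bucket_count = -(-total_columns // bucket_size)  # ceiling division
    return [f"{base_key}:{b}" for b in range(bucket_count)]


keys = bucket_keys("file42", 1_000_000)
print(len(keys))                          # 100 rows, one per worker
print(bucket_row_key("file42", 10_000))   # file42:1
```

Because each bucket is its own row key, the partitioner can place the buckets on different replicas, so the 100 workers no longer all hit the same few nodes.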

Re: Paging Columns from a Row

2011-06-05 Thread Joseph Stein
So I can have one PagedIndex CF that holds a row for each data file I am processing. That row (in my example) would have X columns, and I can make each column's value be 100 strings that represent keys in another PagedData CF. This other PagedData CF for each row would have

Re: how to know there are some columns in a row

2011-06-05 Thread Patrick de Torcy
It would definitely be useful to be able to get column (or super column) names WITHOUT their values. If the values are pretty big or if there are a lot of columns, fetching them generates traffic that is not necessarily needed (if in the end you are only interested in some columns). Moreover it doesn't

slow insertion rate with secondary index

2011-06-05 Thread Donal Zang
I did an insertion test with and without secondary indexes, and found that: Without secondary index: ~10864 rows inserted per second. With secondary index on one column (BytesType): ~1515 rows inserted per second. Is this normal? Why would a secondary index have so much effect? I noticed that if I

Re: problems with many columns on a row

2011-06-05 Thread aaron morton
Oops, I misread 150 GB in one of your earlier emails as 150 MB, so forget what I said before. You have loads of free space :) How many files do you have in your data directory? If it's 1 then that log message was a small bug that has been fixed. Cheers - Aaron Morton

Re: CQL How to do

2011-06-05 Thread aaron morton
From what I've seen of CQL there is no comparison between the potential complexity of a CQL statement and that of a SQL statement. IMHO CQL is more or less a human readable form of the current API, it does not add features. SQL statements are arbitrarily complex and may generate many possible

Re: Direct control over where data is stored?

2011-06-05 Thread mcasandra
Khanh Nguyen wrote: Is there a way to tell where a piece of data is stored in a cluster? For example, can I tell if LastNameColumn['A'] is stored at node 1 in the ring? I have not used it, but you can see getNaturalEndpoints in JMX. It will tell you which nodes are responsible for a given

Re: CQL How to do

2011-06-05 Thread Jeffrey Kesselman
Fair enough. I do have to keep reminding myself that a REST interface requires text. And it does make more sense, at least, when coming from a human as opposed to when you make a computer spend cycles converting binary to text just so another computer can spend cycles turning it back again. On

Re: Direct control over where data is stored?

2011-06-05 Thread Watanabe Maki
It may not be what you want, but please read about Network Topology Strategy and DC_QUORUM. http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers You can configure your Cassandra cluster to be data center aware. Your reads and writes will be resolved in the local DC, but will be

Re: Direct control over where data is stored?

2011-06-05 Thread Maki Watanabe
getNaturalEndpoints tells you which key will be stored on which nodes, but we can't force Cassandra to store a given key on specific nodes. maki 2011/6/6 mcasandra mohitanch...@gmail.com: Khanh Nguyen wrote: Is there a way to tell where a piece of data is stored in a cluster? For example, can
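
To make the idea of "natural endpoints" concrete, here is a toy model of a token ring. It is a deliberately simplified sketch, not Cassandra's implementation: `token_for` and `natural_endpoints` are hypothetical names, the MD5-based token is only loosely modelled on a hash partitioner, and replicas are simply the next nodes clockwise, ignoring racks and data centers.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    """Toy token: the MD5 digest of the key read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def natural_endpoints(key: bytes, ring: list[tuple[int, str]], replication_factor: int) -> list[str]:
    """Nodes holding the key. ring is a list of (token, node) sorted by token."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Four nodes with evenly spaced tokens over the 128-bit space.
ring = [(i * 2**126, f"node{i + 1}") for i in range(4)]
print(natural_endpoints(b"A", ring, 3))
```

In this model the placement is a pure function of the key's hash and the ring layout, which matches the point made above: you can ask where a key lives, but you cannot pick the nodes for an individual key.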

Re: slow insertion rate with secondary index

2011-06-05 Thread Jonathan Ellis
Index updates require read-before-write (to find out what the prior version was, if any, and update the index accordingly). This is random i/o. Index creation on the other hand is a lot of sequential i/o, hence more efficient. So, the classic bulk load advice to ingest data prior to creating
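
The read-before-write point above can be seen in a toy in-memory index. This is a sketch for illustration only, not how Cassandra stores its indexes: the `IndexedTable` class is hypothetical, and the `reads` counter stands in for the random I/O a real lookup would cost.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table with one indexed column, counting the reads that inserts cause."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}                     # row key -> indexed value
        self.index: dict[str, set[str]] = defaultdict(set)  # value -> row keys
        self.reads = 0

    def insert(self, key: str, value: str) -> None:
        # Read-before-write: look up the prior value so the stale
        # index entry can be removed before the new one is added.
        self.reads += 1
        old_value = self.rows.get(key)
        if old_value is not None:
            self.index[old_value].discard(key)
        self.rows[key] = value
        self.index[value].add(key)


table = IndexedTable()
table.insert("k1", "a")
table.insert("k1", "b")
print(table.index["b"], table.reads)  # {'k1'} 2
```

Every insert pays for a read even when the row is new, which is consistent with the slowdown reported earlier in the thread. Building the index after a bulk load avoids the per-row lookup.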

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
On Sun, Jun 5, 2011 at 11:26 PM, Maki Watanabe watanabe.m...@gmail.com wrote: getNaturalEndpoints tells you which key will be stored on which nodes, but we can't force Cassandra to store a given key on specific nodes. maki I'm confused. Didn't you mention previously that I can use

Re: [RELEASE] 0.8.0

2011-06-05 Thread Terje Marthinussen
0.8 under load may turn out to be more stable and well behaved than any release so far. Been doing a few test runs stuffing more than 1 billion records into a 12-node cluster and things look better than ever. VMs stable and nice at 11GB. No data corruptions, dead nodes, full GCs or any of the