RE: Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Perfect, Aaron, Thanks a lot From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, February 14, 2012 12:54 AM To: user@cassandra.apache.org Subject: Re: Secondary indexes and cardinality Heard that indexing a field with high cardinality is not good. http://www.datastax.com/docs/0.7

Got fatal exception after upgrade to 1.0.7 from 1.0.6

2012-02-13 Thread Roshan
Hi I got the exception below in system.log after upgrading from 1.0.6 to 1.0.7. I am using the same configuration files which I used in 1.0.6. 2012-02-14 10:48:12,379 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[OptionalTasks:1,5,main] java.lang.NullPointerEx

Re: active/pending queue lengths

2012-02-13 Thread Franc Carter
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton wrote: > What CL are you reading at ? > Quorum > > Write ops go to RF number of nodes, read ops go to RF number of nodes 10% > (the default probability that Read Repair will be running) of the time and > CL number of nodes 90% of the time. With 2 no

Querying all keys in a column family

2012-02-13 Thread Martin Arrowsmith
Hi Experts, My program is such that it queries all keys on Cassandra. I want to do this as quickly as possible, in order to get as close to real-time as possible. One solution I heard was to use the sstable2json tool, and read the data in as JSON. I understand that reading from each line in Cassan

London meetup - upcoming events

2012-02-13 Thread Dave Gardner
Hi all, Those in the UK might be interested in the next Cassandra London events: Monday 20th February Two talks: "Cassandra as an email storage system" and "CQL - then and now" http://www.meetup.com/Cassandra-London/events/29569461/ Tuesday 6th March How Netflix uses Cassandra with Adrian Coc

Querying for rows without a particular column

2012-02-13 Thread Asankha C. Perera
Hi All I am using expiring columns in my column family, and need to search for the rows where a particular column expired (and no longer exists). I am using Hector client. How can I make a query to find the rows of my interest? thanks asankha -- Asankha C. Perera AdroitLogic, http://adroitl

Re: SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Too easy. Does anybody have a more difficult approach? :) Just kidding. Thanks, Aaron. On Mon, Feb 13, 2012 at 11:43 AM, aaron morton wrote: > I am nursing an overloaded 0.6 cluster > > Shine on you crazy diamond. > > If you have some additional storage available I would: > > 1) Allocate a data d

Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
If you want to get all the ticks between two integers yes. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 8:36 AM, Dave Brosius wrote: > if the composite column was rearranged as > > ticks:111 > > wouldn't the result be as des

Re: problem with sliceQuery with composite column

2012-02-13 Thread Dave Brosius
if the composite column was rearranged as ticks:111 wouldn't the result be as desired? - Original Message - From: "aaron morton" <aa...@thelastpickle.com>

Re: Hector and batch mutation

2012-02-13 Thread aaron morton
> Is the execution of the batch sequential? (in the order data is added). No, parallel see concurrent_writes in cassandra.yaml > Also say there are 10 operations in a batch and 3rd fails will it try the > remaining 7? http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Cheers ---
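The two answers above (mutations in a batch are applied in parallel, and the linked FAQ entry on atomicity) can be illustrated with a toy model. This is a minimal Python sketch, not Hector's or Cassandra's API: `FlakyStore`, `run_batch` and the failing mutation index are hypothetical. It assumes the batch is not atomic, so one failed mutation does not stop the others, and that re-applying a column write with the same value and timestamp is harmless, which is why resending the whole batch converges.

```python
from concurrent.futures import ThreadPoolExecutor


class FlakyStore:
    """Hypothetical key/column store used only to model a non-atomic batch."""

    def __init__(self, fail_once):
        self.data = {}                  # (row key, column) -> (value, timestamp)
        self.fail_once = set(fail_once)  # mutation indexes that fail on their first attempt

    def apply(self, index, key, column, value, timestamp):
        if index in self.fail_once:
            self.fail_once.discard(index)
            raise IOError(f"mutation {index} failed")
        # Last write wins by timestamp, so re-applying an identical write changes nothing.
        current = self.data.get((key, column))
        if current is None or timestamp >= current[1]:
            self.data[(key, column)] = (value, timestamp)


def run_batch(store, mutations):
    """Apply all mutations concurrently and return the indexes that failed."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(store.apply, i, *m) for i, m in enumerate(mutations)]
    # Leaving the with-block waits for every future to finish.
    return [i for i, f in enumerate(futures) if f.exception() is not None]
```

With ten mutations and `fail_once={2}`, the first `run_batch` call reports `[2]` while the other nine columns are already stored; calling it again with the same list reports no failures and leaves all ten columns present.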

Re: How to bring cluster to consistency

2012-02-13 Thread aaron morton
> Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly > tells me that my message looks like spam… Send as text. What version are you using ? It looks like you are using the ByteOrderedPartitioner , is that correct ? I would try to get the repair done first, what was the

Re: Secondary indexes and cardinality

2012-02-13 Thread aaron morton
> Heard that indexing a field with high cardinality is not good. http://www.datastax.com/docs/0.7/data_model/secondary_indexes > Will there be any performance improvement? Is this the way secondary indexes > are maintained? Updating secondary indexes requires a read and a write. > Also this ma

Re: active/pending queue lengths

2012-02-13 Thread aaron morton
What CL are you reading at ? Write ops go to RF number of nodes, read ops go to RF number of nodes 10% (the default probability that Read Repair will be running) of the time and CL number of nodes 90% of the time. With 2 nodes and RF 2 the QUORUM is 2, every request will involve all nodes. A
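The arithmetic in that reply can be written out. A small Python sketch, assuming the usual QUORUM definition of floor(RF/2) + 1 and the 10% read repair probability quoted above as the default; `read_repair_chance` here is just a function parameter, not a lookup of the cluster's actual setting.

```python
def quorum(rf: int) -> int:
    """Number of replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def expected_replicas_per_read(rf: int, cl_nodes: int, read_repair_chance: float = 0.1) -> float:
    """Average replicas contacted per read.

    When read repair runs, all RF replicas are read; otherwise only the
    cl_nodes replicas required by the consistency level.
    """
    return read_repair_chance * rf + (1 - read_repair_chance) * cl_nodes
```

With RF=2, `quorum(2)` is 2 and `expected_replicas_per_read(2, quorum(2))` is 2.0, so in a two-node RF=2 cluster every QUORUM read touches both nodes regardless of read repair.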

Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
My understanding is you expected to see 111:ticks 222:ticks 333:ticks 444:ticks But instead you are getting 111:ticks 111:quote 222:ticks 222:quote 333:ticks 333:quote 444:ticks If that is the case things are working as expected. The slice operation gets a column range. So if you start at 1
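Why the `quote` columns show up can be reproduced with ordinary tuple ordering. This Python sketch models composite column names as tuples and a slice as an inclusive range over the sorted names. It is an approximation: Python compares the string component the way an ASCII/UTF-8 comparator would, and the sketch ignores how CompositeType actually encodes names. It also shows the alternative layout suggested in the thread, with the type first, where a ticks-only range is contiguous.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start, finish):
    """Return every name n with start <= n <= finish, like a column-range slice."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]


ids = (111, 222, 333, 444)
types = ("quote", "ticks")

# Layout 1: (id, type). The quote and ticks columns for each id sit next to each other.
by_id = sorted((i, t) for i in ids for t in types)

# Layout 2: (type, id). All "ticks" columns are contiguous.
by_type = sorted((t, i) for i in ids for t in types)
```

`slice_columns(by_id, (111, "ticks"), (444, "ticks"))` returns the quote columns for 222, 333 and 444 interleaved with the ticks, while `slice_columns(by_type, ("ticks", 111), ("ticks", 444))` returns only the four ticks columns.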

SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Hi all, I am nursing an overloaded 0.6 cluster through compaction to get its disk usage under 50%. Many rows' content have been replaced so that after compaction there will be plenty of room, but a couple of nodes are currently at 95%. One strategy I considered is temporarily moving a couple of t

Hector and batch mutation

2012-02-13 Thread Tiwari, Dushyant
Hi Guys, A very trivial question on batch mutation provided by Hector. Is the execution of the batch sequential? (in the order data is added). Also say there are 10 operations in a batch and 3rd fails will it try the remaining 7? Is execution of batch mutator multi threaded ? Regards, Dushyant

Re: How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly tells me that my message looks like spam... > 2/ both of your nodes seem to be using the same token? The output indicates > that 100% of your key range is assigned to 10.111.1.141 (and > therefore 10.111.1.142 holds repl

Re: How to bring cluster to consistency

2012-02-13 Thread Dominic Williams
Hi Nikolay, Some points that may be useful: 1/ auto_bootstrap = true is used for telling a new node to join the ring (the cluster). It has nothing to do with hinted handoff 2/ both of your nodes seem to be using the same token? The output indicates that 100% of your key range is assigned to 10.1

How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Hello everybody I have a very simple cluster containing 2 servers. Replication_factor = 2, Consistency_level of reads and writes = 1 10.111.1.141 datacenter1 rack1 Up Normal 1.5 TB 100.00% vjpigMzv4KkX3x7z 10.111.1.142 datacenter1 rack1 Up Normal 1.41 TB

Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Hi Cassandra Users, Heard that indexing a field with high cardinality is not good. If we create a CF to store the index information like indexed field as key and the keys of original CF as cols in the row. Will there be any performance improvement? Is this the way secondary indexes are maintain
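A hypothetical in-memory model of the hand-built index CF described above: one index row per indexed value, whose columns are the keys of the matching rows in the original CF. It is plain Python, not a client API, and the class and method names are made up for illustration. It shows why maintaining such an index needs a read before each write (the reply earlier in this list notes the same for built-in secondary indexes): the old value must be known to remove the stale index entry.

```python
from collections import defaultdict


class ManuallyIndexedStore:
    """Hypothetical model: a data CF plus a hand-maintained index CF."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = defaultdict(dict)  # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> row keys (the index row's columns)

    def update(self, row_key, column, value):
        if column == self.indexed_column:
            # Read before write: fetch the old value so its index entry can be removed.
            old = self.data[row_key].get(column)
            if old is not None:
                self.index[old].discard(row_key)
            self.index[value].add(row_key)
        self.data[row_key][column] = value

    def lookup(self, value):
        """Row keys whose indexed column currently equals value."""
        return sorted(self.index.get(value, ()))
```

After setting `state` to `CA` for two users and then changing one of them to `NY`, `lookup("CA")` returns only the unchanged user; without the read of the old value, the moved user would still appear under `CA`.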

active/pending queue lengths

2012-02-13 Thread Franc Carter
Hi, I've been looking at tpstats as various test queries run and I noticed something I don't understand. I have a two node cluster with RF=2 on which I run 4 parallel queries, each job goes through a list of keys doing a multiget for 2 keys at a time. If two of the queries go to one node and the

[RELEASE] Apache Cassandra 0.8.10 released

2012-02-13 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.8.10. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassan

Re: murmurhash partitioner

2012-02-13 Thread Sylvain Lebresne
https://issues.apache.org/jira/browse/CASSANDRA-3772 2012/2/13 Radim Kolar : > Are there plans to write partitioner based on faster hash alg. instead of > MD5? I did cassandra profiling and lot of time is spent inside MD5 function.

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller wrote: > > 2 Node cluster, 7.9GB of ram (ec2 m1.large) > > RF=2 > > 11GB per node > > Quorum reads > > 122 million keys > > heap size is 1867M (default from the AMI I am running) > > I'm reading about 900k keys > > Ok, so basically a very significan

murmurhash partitioner

2012-02-13 Thread Radim Kolar
Are there plans to write partitioner based on faster hash alg. instead of MD5? I did cassandra profiling and lot of time is spent inside MD5 function.
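For reference, RandomPartitioner's token is, to my understanding, the MD5 digest of the row key read as a signed 128-bit integer with its absolute value taken, which puts tokens in the range 0 to 2**127. A minimal Python sketch of that computation, assuming this description matches the Java implementation; it only shows the per-key MD5 work the question refers to and is not a benchmark.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """MD5 of the key, read as a signed big-endian 128-bit integer, then abs()."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))
```

Every read and write has to compute this hash for its key, which is consistent with MD5 showing up in a profile.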

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller wrote: > > the servers spending >50% of the time in io-wait > > Note that I/O wait is not necessarily a good indicator, depending on > situation. In particular if you have multiple drives, I/O wait can > mostly be ignored. Similarly if you have non-

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> 2 Node cluster, 7.9GB of ram (ec2 m1.large) > RF=2 > 11GB per node > Quorum reads > 122 million keys > heap size is 1867M (default from the AMI I am running) > I'm reading about 900k keys Ok, so basically a very significant portion of the data fits in page cache, but not all. > As I was just go
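The "fits in page cache, but not all" estimate can be checked with the numbers given above (7.9GB RAM, 1867M heap, 11GB of data per node). A rough Python sketch, assuming all memory outside the JVM heap is available to the page cache, which overstates it, and treating 1867M as about 1.867GB:

```python
def page_cache_fraction(ram_gb: float, heap_gb: float, data_gb: float) -> float:
    """Upper bound on the share of on-disk data that can sit in the OS page cache.

    Assumes every byte of RAM not taken by the JVM heap can hold cached data;
    in practice the OS and the JVM's non-heap memory take some of it.
    """
    return max(ram_gb - heap_gb, 0) / data_gb
```

`page_cache_fraction(7.9, 1.867, 11)` comes out at roughly 0.55, so at best a bit over half of each node's data can be cached, and reads for the rest go to disk.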

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller wrote: > What is your total data size (nodetool info/nodetool ring) per node, > your heap size, and the amount of memory on the system? > 2 Node cluster, 7.9GB of ram (ec2 m1.large) RF=2 11GB per node Quorum reads 122 million keys heap size is 1867

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> Yep, the readstage is backlogging consistently - but the thing I am trying > to explain is why it is good sometimes in an environment that is pretty well > controlled - other than being on ec2 So pending is constantly > 0? What are the clients? Is it batch jobs or something similar where there is

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> the servers spending >50% of the time in io-wait Note that I/O wait is not necessarily a good indicator, depending on situation. In particular if you have multiple drives, I/O wait can mostly be ignored. Similarly if you have non-trivial CPU usage in addition to disk I/O, it is also not a good i

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller wrote: > For one thing, what does ReadStage's pending look like if you > repeatedly run "nodetool tpstats" on these nodes? If you're simply > bottlenecking on I/O on reads, that is the most easy and direct way to > observe this empirically. If you'r

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller wrote: > > Yep - I've been looking at these - I don't see anything in iostat/dstat > etc > > that points strongly to a problem. There is quite a bit of I/O load, but > it > > looks roughly uniform on slow and fast instances of the queries. The last >

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
What is your total data size (nodetool info/nodetool ring) per node, your heap size, and the amount of memory on the system? -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller wrote: > > I'm making an assumption . . . I don't yet know enough about cassandra > to > > prove they are in the cache. I have my keycache set to 2 million, and am > > only querying ~900,000 keys. so after the first time I'm assuming they > are > >

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
For one thing, what does ReadStage's pending look like if you repeatedly run "nodetool tpstats" on these nodes? If you're simply bottlenecking on I/O on reads, that is the easiest and most direct way to observe this empirically. If you're saturated, you'll see active close to maximum at all times, and
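One way to watch ReadStage over repeated `nodetool tpstats` runs is to parse the Active and Pending columns and check every sample. This Python sketch works on captured text rather than calling nodetool. The sample output, the `parse_tpstats` and `read_stage_saturated` helpers, and the saturation rule (active at the `concurrent_reads` limit, assumed here to be the default of 32, with a non-empty pending queue) are all assumptions for illustration.

```python
# Hypothetical captured outputs; real output has more pools and columns.
SAMPLE_BUSY = """
Pool Name                    Active   Pending      Completed
ReadStage                        32       118        4821937
RequestResponseStage              0         0        9652339
MutationStage                     0         0              0

Message type           Dropped
READ                         0
"""

SAMPLE_IDLE = """
Pool Name                    Active   Pending      Completed
ReadStage                         3         0        4830112
RequestResponseStage              0         0        9670021
"""


def parse_tpstats(text):
    """Map pool name -> (active, pending); header and 'Dropped' rows are skipped."""
    pools = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            pools[parts[0]] = (int(parts[1]), int(parts[2]))
    return pools


def read_stage_saturated(samples, concurrent_reads=32):
    """True if every sample shows ReadStage at its thread limit with work queued."""
    return all(
        s["ReadStage"][0] >= concurrent_reads and s["ReadStage"][1] > 0
        for s in samples
    )
```

Feeding in several captures taken a few seconds apart separates a node that is constantly saturated from one that only backs up occasionally.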

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> I'm making an assumption . . .  I don't yet know enough about cassandra to > prove they are in the cache. I have my keycache set to 2 million, and am > only querying ~900,000 keys. so after the first time I'm assuming they are > in the cache. Note that the key cache only caches the index positio
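The point about the key cache can be illustrated with a toy seek count: a key cache hit saves the lookup in the SSTable index, but the row still has to be read from the data file. This Python sketch assumes a single SSTable and nothing in the OS page cache, and ignores bloom filters and index sampling; the function is hypothetical.

```python
def disk_seeks_for_read(key, key_cache):
    """Toy count of disk seeks for one read from a single SSTable.

    key_cache is a set standing in for cached index positions.
    """
    seeks = 0
    if key not in key_cache:
        seeks += 1          # seek into the SSTable index to find the row's position
        key_cache.add(key)  # remember the position for later reads
    seeks += 1              # seek into the data file to read the row itself
    return seeks
```

A warm key cache therefore halves the seeks in this model rather than eliminating them, so reads of keys that are all in the key cache can still be I/O bound.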

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> Yep - I've been looking at these - I don't see anything in iostat/dstat etc > that points strongly to a problem. There is quite a bit of I/O load, but it > looks roughly uniform on slow and fast instances of the queries. The last > compaction ran 4 days ago - which was before I started seeing vari

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen > I also noticed that Cassandra appears to perform better under a continuous > load. > > Are you sure the rows you're querying are actually in the cache? > I'm making an assumption . . . I don't yet know enough about cassandra to prove they are in the cache. I have my keyc

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller wrote: > > I actually have the opposite 'problem'. I have a pair of servers that have > > been static since mid last week, but have seen performance vary > > significantly (x10) for exactly the same query. I hypothesised it was > > various caches so

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
> I actually have the opposite 'problem'. I have a pair of servers that have > been static since mid last week, but have seen performance vary > significantly (x10) for exactly the same query. I hypothesised it was > various caches so I shut down Cassandra, flushed the O/S buffer cache and > then bo

Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
I also noticed that Cassandra appears to perform better under a continuous load. Are you sure the rows you're querying are actually in the cache? 2012/2/13 Franc Carter > 2012/2/13 R. Verlangen > >> This is because of the "warm up" of Cassandra as it starts. On a start it >> will start fetching

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen > This is because of the "warm up" of Cassandra as it starts. On a start it > will start fetching the rows that were cached: this will have to be loaded > from the disk, as there is nothing in the cache yet. You can read more > about this at http://wiki.apache.org/cassandr

Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
This is because of the "warm up" of Cassandra as it starts. At startup it will begin fetching the rows that were cached; these have to be loaded from disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations 201