I don't understand paging through a table by primary key.

2014-05-29 Thread Kevin Burton
I'm trying to grok this but I can't figure it out in CQL world. I'd like to efficiently page through a table via primary key. This way I only involve one node at a time and the reads on disk are contiguous. I would have assumed it was a combination of > pk and order by but that doesn't seem to w
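
A possible reason `> pk` with ORDER BY does not behave as expected: under a hash partitioner, partitions are ordered by the token of the key, not by the key value. Paging is therefore usually expressed with the CQL `token()` function. The sketch below only builds the statement text and bind values. The function name, the table and column names, and the `?` placeholder style are assumptions for illustration, and it assumes one row per partition key (with wide partitions, a LIMIT can cut a partition in half).

```python
from typing import Optional, Tuple


def next_page_query(table: str, pk: str, last_key: Optional[object],
                    page_size: int) -> Tuple[str, tuple]:
    """Build the CQL text and bind values for the next page of a table scan.

    Hypothetical helper: identifiers are interpolated directly, so they
    must come from trusted code, never from user input.
    """
    if last_key is None:
        # First page: no lower bound, rows come back in token order.
        return f"SELECT * FROM {table} LIMIT ?", (page_size,)
    # Later pages: resume strictly after the token of the last key seen.
    cql = f"SELECT * FROM {table} WHERE token({pk}) > token(?) LIMIT ?"
    return cql, (last_key, page_size)
```

A caller would run the first statement, remember the partition key of the last row returned, and pass it as `last_key` until a page comes back with fewer than `page_size` rows.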

Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Thread Kevin Burton
The general idea is that for HTML content, you want content from the same domain to be adjacent on disk. This way duplicate HTML template runs get compressed REALLY well. I think in our situations we would see exceptional compression. If we get closer to this I'll just implement snappy+bmdiff...

Re: Multi-DC Environment Question

2014-05-29 Thread Ben Bromhead
Short answer: If time elapsed > max_hint_window_in_ms then hints will stop being created. You will need to rely on your read consistency level, read repair and anti-entropy repair operations to restore consistency. Long answer: http://www.slideshare.net/jasedbrown/understanding-antientropy-in-
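
The rule in the short answer can be written down as a small decision function. This is only a sketch of the stated rule, not Cassandra's code: the function names are made up, and the 3-hour default is an assumption that should be checked against the `max_hint_window_in_ms` value in your own cassandra.yaml.

```python
# Assumed default (3 hours); verify against your cassandra.yaml.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def hints_still_created(downtime_ms: int,
                        max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Hints stop being created once the elapsed downtime exceeds the window."""
    return downtime_ms <= max_hint_window_ms


def recovery_path(downtime_ms: int,
                  max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> str:
    """Name the mechanism expected to restore consistency after an outage."""
    if hints_still_created(downtime_ms, max_hint_window_ms):
        return "hinted handoff"
    # Past the window: rely on read consistency level, read repair and repair.
    return "read repair / anti-entropy repair"
```

For example, a one-minute outage falls inside the window, while a four-hour outage under the assumed default does not.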

Re: Increased Cassandra connection latency

2014-05-29 Thread Alex Popescu
Also using the latest version of the driver (1.0.7) is always a good idea just to make sure you are not hitting issues that have already been addressed. On Thu, May 29, 2014 at 12:33 AM, Aaron Morton wrote: > You’ll need to provide some more information such as: > > * Do you have monitoring on

Re: Number of rows under one partition key

2014-05-29 Thread Paulo Ricardo Motta Gomes
Hey, We are considering upgrading from 1.2 to 2.0. Why don't you consider 2.0 ready for production yet, Robert? Have you written about this somewhere already? It's a bit off-topic in this discussion, but it would be interesting to know; your posts are generally very enlightening. Cheers, On Thu, May 2

Re: conditional delete consistency level/timeout

2014-05-29 Thread Robert Coli
On Fri, May 16, 2014 at 7:06 AM, Mohica Jasha wrote: > Earlier I reported the following bug against C* 2.0.5 > ... > It seems to be fixed in C* 2.0.7, but we are still seeing similar > suspicious timeouts. > ... > We noticed that DELETE queries against this table sometimes timeout: >

Re: Number of rows under one partition key

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 6:10 AM, Vegard Berget wrote: > I know this has been discussed before, and I know there are limitations to > how many rows one partition key in practice can handle. But I am not sure > if number of rows or total data is the deciding factor. > Both. In terms of data size,

Re: How long are expired values actually returned?

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 8:26 AM, Sebastian Schmidt wrote: > Thank you for your answer, I really appreciate that you want to help me. > But already found out that I did something wrong in my implementation. > Could you be more specific about the nature of the mistake you made, so people who might

Re: Multi-DC Environment Question

2014-05-29 Thread Tupshin Harper
When one node or DC is down, coordinator nodes being written through will notice this fact and store hints (hinted handoff is the mechanism), and those hints are used to send the data that was not able to be replicated initially. http://www.datastax.com/dev/blog/modern-hinted-handoff -Tupshin On

Multi-DC Environment Question

2014-05-29 Thread Vasileios Vlachos
Hello All, We have plans to add a second DC to our live Cassandra environment. Currently RF=3 and we read and write at QUORUM. After adding DC2 we are going to be reading and writing at LOCAL_QUORUM. If my understanding is correct, when a client sends a write request, if the consistency leve
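
For reference, the replica counts behind QUORUM and LOCAL_QUORUM can be worked out as below. This is a sketch of the usual quorum arithmetic, not of the poster's cluster: the function names, the data center names, and the RF of 3 for DC2 are assumptions for illustration.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: a strict majority of RF."""
    return replication_factor // 2 + 1


def local_quorum(rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """LOCAL_QUORUM only counts replicas in the coordinator's data center."""
    return quorum(rf_by_dc[local_dc])


# Hypothetical layout: RF=3 in the existing DC and, by assumption, in DC2.
replication = {"DC1": 3, "DC2": 3}
required_in_dc1 = local_quorum(replication, "DC1")
```

With RF=3 in the local data center, two local replicas have to acknowledge a LOCAL_QUORUM read or write, regardless of the state of the remote data center.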

Re: Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread Tupshin Harper
While Astyanax 2.0 is still beta, I think you will find it provides a very good migration path from the 1.0 thrift based version to the 2.0 native driver version. Well worth considering if you like the Astyanax API and functionality. I know of multiple DataStax customers planning on using it. -

Re: Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread Jacob Rhoden
Not long ago a vote was organised to get the developers to agree to stop work on the thrift API. New Cassandra features from this point are intended only for CQL. You probably want to make the effort to switch to CQL now rather than later. __ Sent from iPhone > On 3

Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Paulo Ricardo Motta Gomes
Really informative thread, thank you! We had a secondary index trauma a while ago, and since then we knew it was not a good idea for most of the cases, but now it's even more clear why. On Thu, May 29, 2014 at 5:31 PM, Robert Coli wrote: > On Thu, May 29, 2014 at 1:08 PM, DuyHai Doan wrote: >

Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Robert Coli
On Thu, May 29, 2014 at 1:08 PM, DuyHai Doan wrote: > Hello Robert > > There are some maths involved when considering the performance of > secondary index in C* > Yes, these are the maths which are behind my FIXMEs in the original post. I merely have not had time to explicitly describe them in

Re: binary protocol server side sockets

2014-05-29 Thread Eric Plowe
Michael, The ask is for letting keep alive be configurable for native transport, with Socket.setKeepAlive. By default, SO_KEEPALIVE is false ( http://docs.oracle.com/javase/7/docs/api/java/net/StandardSocketOptions.html#SO_KEEPALIVE). Regards, Eric Plowe On Wed, Apr 9, 2014 at 1:25 PM, Michae
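
To illustrate the socket option being asked for, here is the equivalent call in Python's standard `socket` module. It is an analogue of Java's `Socket.setKeepAlive`, not Cassandra's native transport code, and the helper name is made up for this example.

```python
import socket


def open_socket_with_keepalive(enable: bool = True) -> socket.socket:
    """Create an unconnected TCP socket with SO_KEEPALIVE set as requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_KEEPALIVE is off unless explicitly enabled, hence the request to
    # make it configurable on the server side.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1 if enable else 0)
    return sock
```

How often probes are sent once the option is on is governed by operating system settings, which is outside what this sketch covers.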

Re: Erase old sstables to make room for new sstables

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 10:17 AM, Redmumba wrote: > Is this possible to do safely? The data in the oldest sstable is always > guaranteed to be the oldest data, so that is not my concern--my main > concern is whether or not we can even do this, and also how we can notify > Cassandra that an sstab

Re: Retrieve counter value after update

2014-05-29 Thread DuyHai Doan
Hello Ziju First, you can read this excellent blog post explaining how counters work under the hood: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters Now, considering your request, you'd like Cassandra to return the current counter value on update.

Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread user 01
Which version of Astyanax (the Thrift-based implementation or the beta Java driver one) are you using? With which Cassandra version? Would you still recommend Astyanax now that the DS Java Driver is out? My intention is to use Astyanax over the Thrift-based implementation for now, and later switch to Astyanax over Java dr

Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread DuyHai Doan
Hello Robert There are some maths involved when considering the performance of secondary index in C* First, the current implementation is a distributed 2nd index, meaning that each node that contains actual data also contains the index data. So considering a cluster of *N* nodes with replicat

Re: Clustering order and secondary index

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 7:12 AM, cbert...@libero.it wrote: > I have an easy question for you all: query using only secondary indexes do > not > respect any clustering order? > It is a general property of secondary indexes in Cassandra that they are not in token order unless you are using an orde

Retrieve counter value after update

2014-05-29 Thread ziju feng
Hi All, I was wondering if there is a planned feature in Cassandra to return the current counter value after the update statement? Our project is using counter column to count and since counter column cannot reside in the same table with regular columns, we have to denormalize the counter value a

Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Robert Coli
On Fri, May 16, 2014 at 10:53 AM, Kevin Burton wrote: > I'm struggling with cassandra secondary indexes since the documentation > seems all over the place and I'm having to put together everything from > blog posts. > This mostly-complete summary content will eventually make it into a blog post

Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Thread Robert Coli
On Sat, May 17, 2014 at 10:25 PM, Kevin Burton wrote: > "compression" … sure.. but bmdiff? Not that I can find. BMDiff is an > algorithm that in some situations could result in 10x compression due > to the way it's able to find long common runs. This is a pathological > case though. But i

Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread DuyHai Doan
Sure it can be done, you can submit them a pull request. I'm sure they'll be happy to merge it. On Thu, May 29, 2014 at 5:59 PM, user 01 wrote: > But won't it be nice if the API just provides a method to do so more > efficiently since it is easily possible? This is not a big deal for API. > > >

Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
But won't it be nice if the API just provides a method to do so more efficiently since it is easily possible? This is not a big deal for API. On Thu, May 29, 2014 at 9:11 PM, DuyHai Doan wrote: > "so if I need to add a same column to 1000 rows, it creates the column > object 100 times" --> is i

Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread DuyHai Doan
"so if I need to add a same column to 1000 rows, it creates the column object 100 times" --> is it really an issue ? Even if Astyanax creates 1 millions of column objects, as long as they die young and respect the generational hypothesis of the JVM, it's fine. On Thu, May 29, 2014 at 4:05 PM, use

Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
I am using Astyanax over thrift driver. On Thu, May 29, 2014 at 7:35 PM, user 01 wrote: > With Hector I used to create a column object once & add that to multiple > row mutations but with Astyanax it creates a new column object for each row > mutation in a mutation batch so if I need to add a s

With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
With Hector I used to create a column object once and add it to multiple row mutations, but Astyanax creates a new column object for each row mutation in a mutation batch, so if I need to add the same column to 1000 rows, it creates the column object 1000 times. Isn't there a better way to add s

[RELEASE] Apache Cassandra 2.0.8 released

2014-05-29 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 2.0.8. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassand

Re: What are the advantages of static column family over a dynamic column family?

2014-05-29 Thread Jens Rantil
Hi user 01 (firstname and lastname?), I'll give you one technical answer and one related to modelling: Technical: Sure, you could really put all your data on a single row. The problem is it will simply not scale horizontally. More cassandra nodes will not make your cluster perform better and will

Re: Memory issue

2014-05-29 Thread Aaron Morton
> As soon as it starts, the JVM is get killed because of memory issue. What is the memory issue that gets kills the JVM ? The log message below is simply a warning > WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock > JVM memory (ENOMEM). > This can result in part of

Re: What % of cassandra developers are employed by Datastax?

2014-05-29 Thread Aaron Morton
> The Cassandra Summit Bootcamp, Sep 12-13, immediately following the Summit, > might be interesting for potential contributors. I’ll be there to help people get started. Looking forward to it. While DS are the biggest contributor in time and patches, there are several other well known people an

Re: Increased Cassandra connection latency

2014-05-29 Thread Aaron Morton
You’ll need to provide some more information such as: * Do you have monitoring on the Cassandra cluster that shows the request latency? DataStax OpsCenter is a good starting point. * Is compaction keeping up? Check with nodetool compactionstats * Is the GCInspector logging about long runnin