Re: key is sorted?

2010-05-12 Thread David Boxenhorn
Thank you. That is very good news. I can sort the results myself; what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay wrote: > If you use the Random partitioner, you will *NOT* get row keys sorted. > (Columns are always sorted.) > > Answer: If using the Random partitioner > True

Re: how does cassandra compare with mongodb?

2010-05-12 Thread Jonathan Shook
You can choose to have keys ordered by using an OrderPreservingPartitioner, with the trade-off that key ranges can get denser on certain nodes than on others. On Wed, May 12, 2010 at 7:48 PM, philip andrew wrote: > > Hi, > From my understanding, Cassandra entities are indexed on only one key, so > this

Re: Apache Cassandra and Net::Cassandra::Easy

2010-05-12 Thread Scott Doty
Well, ain't that a kick in the tush. :-/ I grabbed the svn trunk, but the build failed, probably because Fedora 11 is too old for this bleeding-edge stuff. Server is upgrading itself now, but I wondered: is anyone using an rpm-based distro for Net::Cassandra::Easy and the svn Cassandra? Thanks.

Re: how does cassandra compare with mongodb?

2010-05-12 Thread philip andrew
Hi, From my understanding, Cassandra entities are indexed on only one key, so this can be a problem if you are searching by two values, for example if you are storing an entity with an x,y and then wish to search for entities in a box, i.e. x>5 and x<10 and y>5 and y<10. MongoDB can do this, Cassa

Re: key is sorted?

2010-05-12 Thread Jonathan Shook
Although, if the replication factor spans all nodes, then the disparity in row allocation should be a non-issue when using OrderPreservingPartitioner. On Wed, May 12, 2010 at 6:42 PM, Vijay wrote: > If you use the Random partitioner, you will NOT get row keys sorted. (Columns > are always sorted.) > Answ

Re: paging row keys

2010-05-12 Thread Nathan McCall
Oh, thanks to Andrey Panov for providing that example, btw. We are always looking for good usage examples to post on the Hector wiki, if anyone else has them. -Nate On Wed, May 12, 2010 at 5:01 PM, Nathan McCall wrote: > Here is a basic example using get_range_slices to retrieve 500 rows via > he

Re: paging row keys

2010-05-12 Thread Nathan McCall
Here is a basic example using get_range_slices to retrieve 500 rows via Hector: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java To page, use the last key you got back as the start key. -Nate On Wed, May 12, 2010 at 3:37 PM, Core
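For readers following the paging advice above, here is a minimal Python sketch of the "use the last key as the next start key" loop. It assumes, as with the Thrift `get_range_slices` call and its `KeyRange`, that the start key is inclusive, so every page after the first begins with the key that ended the previous page and has to be skipped. `make_fetch` and `iterate_all_keys` are hypothetical helpers; `make_fetch` fakes the server call with an in-memory list instead of talking to Cassandra.

```python
from typing import Callable, Iterator, List

Fetch = Callable[[str, int], List[str]]

def make_fetch(all_keys: List[str]) -> Fetch:
    """Hypothetical stand-in for get_range_slices: up to `count` keys
    starting at `start_key` (inclusive); "" means start from the beginning."""
    def fetch(start_key: str, count: int) -> List[str]:
        begin = all_keys.index(start_key) if start_key else 0
        return all_keys[begin:begin + count]
    return fetch

def iterate_all_keys(fetch: Fetch, page_size: int) -> Iterator[str]:
    """Yield every key exactly once, paging with the last key seen."""
    if page_size < 2:
        # with an inclusive start key, a page of 1 would only repeat itself
        raise ValueError("page_size must be at least 2")
    start = ""
    while True:
        page = fetch(start, page_size)
        # every page after the first starts with the previous page's last key
        new_keys = page[1:] if start and page and page[0] == start else page
        yield from new_keys
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start = page[-1]

keys = [f"row{i:03d}" for i in range(10)]
print(list(iterate_all_keys(make_fetch(keys), page_size=4)))
```

Stopping on a short page costs one extra round trip when the last page happens to be exactly full, which keeps the loop simple.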

Re: key is sorted?

2010-05-12 Thread Vijay
If you use the Random partitioner, you will *NOT* get row keys sorted. (Columns are always sorted.) Answer: If using the Random partitioner: True, True. Regards, On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn wrote: > You can do any kind of range slice, e.g. keys beginning with "abc"? But the > results w
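To illustrate why row keys come back unsorted, the sketch below models RandomPartitioner placement: each key is hashed with MD5 and rows are laid out and scanned in order of that token, whereas an order-preserving partitioner orders by the key itself. The token computation (digest read as a signed big-endian integer, then made non-negative) is an approximation of Cassandra's, keys are assumed to be UTF-8 encoded, and the function names are hypothetical.

```python
import hashlib
from typing import List

def random_partitioner_token(key: str) -> int:
    """Approximate RandomPartitioner token: MD5 of the key bytes read as a
    signed 128-bit big-endian integer, then made non-negative."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

def random_partitioner_order(keys: List[str]) -> List[str]:
    # rows are laid out (and range-scanned) in token order
    return sorted(keys, key=random_partitioner_token)

def order_preserving_order(keys: List[str]) -> List[str]:
    # an order-preserving partitioner orders rows by the key itself
    return sorted(keys)

keys = ["abc1", "abc2", "abc3", "abd", "zzz"]
print(random_partitioner_order(keys))  # hash order, unrelated to key order
print(order_preserving_order(keys))    # ['abc1', 'abc2', 'abc3', 'abd', 'zzz']
```

Since only the results matter here, calling `sorted()` on the rows that come back restores key order on the client.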

paging row keys

2010-05-12 Thread Corey Hulen
Can someone point me to a Thrift sample (preferably Java) that lists all the rows in a ColumnFamily on my Cassandra server? I noticed some examples using SlicePredicate and SliceRange to perform a similar query against the columns with paging, but I was looking for something similar for rows with pa

how does cassandra compare with mongodb?

2010-05-12 Thread S Ahmed
I tried searching mail-archive, but the search feature is a bit wacky (or more probably I don't know how to use it). What are the key differences between Cassandra and Mongodb? Is there a particular use case where each solution shines?

Re: Is it possible to delete records based upon where condition

2010-05-12 Thread Joel Pitt
On Thu, May 13, 2010 at 12:34 AM, Moses Dinakaran wrote: > I wanted to remove the records based upon the value of the column ses_tstamp > ie (delete from sessions where ses_tstamp between XXX & YYY OR delete from > session where ses_tstamp < XXX ) > > Is it possible to achieve this in Cassandra If

Re: Human readable Cassandra limitations

2010-05-12 Thread Paul Prescod
On Wed, May 12, 2010 at 2:02 AM, David Vanderfeesten wrote: >... > > My concern with the denormalization approach is that it shouldn't be managed > by the client side because this has big impact on your throughput.  Is the > map-reduce in that respect any better? > Wouldn't it be nice to support a

RE: Extremly slow inserts on LAN

2010-05-12 Thread Stephan Pfammatter
Hi, I read your post and noticed you are running Cassandra on win 2008. Do you run it in a production environment? I'm contacting you because there aren't that many windows installations. I need to provide a live Cassandra environment on win 2008 and was stumbling into some problems with node to

Re: Cache capacities set by nodetool

2010-05-12 Thread James Golick
Picked up out of config, I mean. On Wed, May 12, 2010 at 11:10 AM, James Golick wrote: > Hmmm that's definitely what we're seeing. Although, we aren't seeing > cache settings being picked up properly on a restart either. > > > On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote: > >> It's a bug

Re: Cache capacities set by nodetool

2010-05-12 Thread James Golick
Hmmm that's definitely what we're seeing. Although, we aren't seeing cache settings being picked up properly on a restart either. On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote: > It's a bug: > > https://issues.apache.org/jira/browse/CASSANDRA-1079 > > -ryan > > On Wed, May 12, 2010 at 8:1

Load Balancing Mapper Tasks

2010-05-12 Thread Joost Ouwerkerk
I've been trying to improve the time it takes to map 30 million rows using a hadoop / cassandra cluster with 30 nodes. I discovered that since CassandraInputFormat returns an ordered list of splits, when there are many splits (e.g. hundreds or more) the load on cassandra is horribly unbalanced. e

Re: Is it possible to delete records based upon where condition

2010-05-12 Thread Benjamin Black
The functionality of a WHERE clause usually means maintaining an inverted index, usually another CF, on the information of interest (ses_tstamp in your example). You then retrieve index rows from that CF to find the data rows. b On Wed, May 12, 2010 at 5:34 AM, Moses Dinakaran wrote: > Hi All,
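A minimal in-memory sketch of the inverted-index approach for the session-expiry case: alongside the data rows, keep an index whose column names are `(ses_tstamp, session_id)` pairs, kept sorted the way Cassandra sorts column names, and expire sessions by slicing that index up to a cutoff. The class and method names are hypothetical, and Python dicts and lists stand in for the two column families; real code would issue the corresponding inserts, slices and removes against Cassandra.

```python
import bisect
from typing import Dict, List, Tuple

class SessionStore:
    """Hypothetical in-memory stand-in for two column families:
    `sessions` (row key = session id) and a `ses_tstamp` index whose column
    names are (ses_tstamp, session_id) pairs kept in sorted order."""

    def __init__(self) -> None:
        self.sessions: Dict[str, Dict[str, int]] = {}
        self.tstamp_index: List[Tuple[int, str]] = []

    def insert(self, session_id: str, ses_tstamp: int) -> None:
        old = self.sessions.get(session_id)
        if old is not None:
            # drop the stale index entry so the index matches the data row
            self.tstamp_index.remove((old["ses_tstamp"], session_id))
        self.sessions[session_id] = {"ses_tstamp": ses_tstamp}
        bisect.insort(self.tstamp_index, (ses_tstamp, session_id))

    def delete_older_than(self, cutoff: int) -> List[str]:
        """Equivalent of `delete from sessions where ses_tstamp < cutoff`."""
        # slice the sorted index up to (but excluding) the cutoff timestamp
        end = bisect.bisect_left(self.tstamp_index, (cutoff, ""))
        expired = [session_id for _, session_id in self.tstamp_index[:end]]
        for session_id in expired:
            del self.sessions[session_id]
        del self.tstamp_index[:end]
        return expired
```

A single index row keeps the sketch short; a real schema might spread the index over several rows (for example one per hour) so that no single row grows without bound.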

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote: > Looking over the code this is in fact an issue in 0.6. > It's fixed in trunk/0.7. Connections will be reused and closed properly, see > https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. > > We can either backport that

Re: (Binary)Memtable flushing

2010-05-12 Thread Tobias Jungen
D'oh, forgot to search the JIRA on this one. Thanks Jonathan! On Wed, May 12, 2010 at 9:37 AM, Jonathan Ellis wrote: > https://issues.apache.org/jira/browse/CASSANDRA-856 > > On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen > wrote: > > Yet another BMT question, thought this may apply for regular

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Utku Can Topçu
What makes cassandra a poor choice is the fact that you can't use a key range as input for the map phase in Hadoop. On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis wrote: > On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati > wrote: > > - First of all, my first thought is to have two CF o

Re: timeout while running simple hadoop job

2010-05-12 Thread Johan Oskarsson
Looking over the code, this is in fact an issue in 0.6. It's fixed in trunk/0.7. Connections will be reused and closed properly, see https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. We can either backport that patch or at least make sure the connections are closed properly in 0.6. Ca

Re: Cache capacities set by nodetool

2010-05-12 Thread Ryan King
It's a bug: https://issues.apache.org/jira/browse/CASSANDRA-1079 -ryan On Wed, May 12, 2010 at 8:16 AM, James Golick wrote: > When I first brought this cluster online, the storage-conf.xml file had a > few cache capacities set. Since then, we've completely changed how we use > cassandra's cachi

Re: timeout while running simple hadoop job

2010-05-12 Thread Héctor Izquierdo
Have you checked your open file handle limit? You can do that by using "ulimit" in the shell. If it's too low, you will encounter the "too many open files" error. You can also see how many open file handles an application has with "lsof". Héctor Izquierdo On 12/05/10 17:00, gabriele renzi wrote:
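The same checks can be scripted. This Python sketch reads the soft and hard open-file limits of the current process (the values `ulimit -Sn` and `ulimit -Hn` report) and, on Linux only, counts the entries under `/proc/<pid>/fd`, which roughly matches the descriptors `lsof -p <pid>` lists (lsof also shows memory-mapped files and similar). The helper names are hypothetical. To look at the Cassandra JVM instead, pass its PID to `count_open_fds` (you may need to run as the same user or root); its limits can be read from `/proc/<pid>/limits`.

```python
import os
import resource
from typing import Optional, Tuple

def open_file_limits() -> Tuple[int, int]:
    """(soft, hard) open-file limits of *this* process, like `ulimit -Sn` / `ulimit -Hn`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard

def count_open_fds(pid: str = "self") -> Optional[int]:
    """Linux only: number of entries in /proc/<pid>/fd, or None if unavailable."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except (FileNotFoundError, PermissionError):
        return None

if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"open-file limit: soft={soft} hard={hard}")
    print(f"descriptors open in this process: {count_open_fds()}")
```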

Cache capacities set by nodetool

2010-05-12 Thread James Golick
When I first brought this cluster online, the storage-conf.xml file had a few cache capacities set. Since then, we've completely changed how we use cassandra's caching, and no longer use any of the caches I set up in the original configuration. I'm finding that cassandra doesn't want to keep my new

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote: > On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: >> - is it possible that such errors show up on the client side as >> timeoutErrors when they could be reported better? > > No, if the node the client is talking to doesn't get a reply

Re: timeout while running simple hadoop job

2010-05-12 Thread Jonathan Ellis
On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: > - is it possible that such errors show up on the client side as > timeoutErrors when they could be reported better? No, if the node the client is talking to doesn't get a reply from the data node, there is no way for it to magically find ou

Re: Unexpected TType error on web traffic increase

2010-05-12 Thread Jonathan Ellis
Sounds like the sort of error you'd see if you were using thread-unsafe Thrift clients on multiple threads. On Tue, May 11, 2010 at 11:23 PM, Waqas Badar wrote: > Dear all, > > We are using Cassandra on our website. Whenever website traffic increases, we > get the following error (Python): > > File "
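If that is the cause, one common fix is to give each thread its own client instead of sharing one. A minimal Python sketch using `threading.local`, assuming a hypothetical `factory` callable that opens a Thrift transport and returns a Cassandra client; a connection pool would be an alternative design.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class PerThreadClient(Generic[T]):
    """Lazily creates one client per thread via `factory` and reuses it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # attributes set here are per-thread

    def get(self) -> T:
        if not hasattr(self._local, "client"):
            self._local.client = self._factory()
        return self._local.client
```

Each request handler would then call `clients.get()` rather than holding a module-level client, so no Thrift connection is ever used by two threads at once.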

Re: Timed out reads still in queue

2010-05-12 Thread Jonathan Ellis
This is a slightly different way of describing https://issues.apache.org/jira/browse/CASSANDRA-685 On Tue, May 11, 2010 at 9:01 PM, Jeremy Dunck wrote: > Reddit posted a blog entry about some recent downtime, partially due > to issues with Cassandra. > http://blog.reddit.com/2010/05/reddits-may-2

Re: nodetool drain disables writes?

2010-05-12 Thread Jonathan Ellis
On Tue, May 11, 2010 at 4:18 PM, Anthony Molinaro wrote: > Hi, > >  I thought that 'nodetool drain' was supposed to flush the commit logs > through the system, which it appears to do (verified by running ls in > the commit log directory and seeing no files). > > However, it also appears to disable

Re: (Binary)Memtable flushing

2010-05-12 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-856 On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen wrote: > Yet another BMT question, thought this may apply for regular memtables as > well... > > After doing a batch insert, I accidentally submitted the flush command > twice. To my surprise, the t

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Jonathan Ellis
On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati wrote: > - First of all, my first thought is to have two CFs, one for raw client > requests (~10 million++ per day) and another for aggregated metrics in some > defined interval like 1min, 5min, 15min... Is this a good approach? Sure. > - I
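For the aggregated column family, each request timestamp has to be assigned to the start of its 1-, 5- and 15-minute window. A minimal Python sketch of that bucketing, assuming timestamps in seconds since the epoch; the `INTERVALS` names and the idea of writing each bucket start as a column name are illustrative assumptions, not something settled in the thread.

```python
from collections import Counter
from typing import Dict, Iterable

# window sizes in seconds; the names are illustrative
INTERVALS = {"1min": 60, "5min": 300, "15min": 900}

def bucket_start(ts: int, interval: int) -> int:
    """Start of the window (in epoch seconds) that contains `ts`."""
    return ts - ts % interval

def aggregate(timestamps: Iterable[int]) -> Dict[str, Counter]:
    """Count requests per window for every interval size."""
    counts: Dict[str, Counter] = {name: Counter() for name in INTERVALS}
    for ts in timestamps:
        for name, seconds in INTERVALS.items():
            counts[name][bucket_start(ts, seconds)] += 1
    return counts

print(aggregate([0, 30, 61, 299, 300, 901])["5min"])
```

Computing the counts in the client (or in a periodic batch over the raw CF) and then writing them keeps the aggregated CF a simple set of precomputed values.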

Re: what is DCQUORUM

2010-05-12 Thread vd
Thanks Eben On Wed, May 12, 2010 at 7:33 PM, Eben Hewitt wrote: > QUORUM is a high consistency level. It refers to the number of nodes that > have to acknowledge read or write operations in order to be assured that > Cassandra is in a consistent state. It uses (replication factor / 2) + 1. > > DCQUORUM means "Data

Re: what is DCQUORUM

2010-05-12 Thread Eben Hewitt
QUORUM is a high consistency level. It refers to the number of nodes that have to acknowledge read or write operations in order to be assured that Cassandra is in a consistent state. It uses (replication factor / 2) + 1. DCQUORUM means "Data Center Quorum", and balances consistency with performance. It puts multiple
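The quorum arithmetic in numbers: with integer division, a quorum is `replication_factor // 2 + 1` replicas, and a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor. A small Python sketch (function names hypothetical):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > replication_factor

for rf in range(1, 6):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, QUORUM reads see QUORUM writes: "
          f"{read_sees_latest_write(q, q, rf)}")
```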

what is DCQUORUM

2010-05-12 Thread vd
Hi, I have read about QUORUM but lately came across DCQUORUM. What is it, and what's the difference between the two?

RE: Extremly slow inserts on LAN

2010-05-12 Thread Stephan Pfammatter
I don't yet understand all the problems you guys are facing. Just wanted to let you know that I'm getting my feet wet with Cassandra. In a few days/weeks I'll be re-reading all your notes again :) I'm bound to provide a production Cassandra environment based on win 2008 (I know, I know... It's b

Re: replication impact on write throughput

2010-05-12 Thread Mark Greene
>>If the replication factor is 2, then everything is written twice. So >>your throughput is cut in half. Throughput of new inserts is cut in half, right? I think I was thinking about capacity in more general terms, from the node's perspective. The node has the ability to write so many operations per

Is it possible to delete records based upon where condition

2010-05-12 Thread Moses Dinakaran
Hi All, Is it possible in Cassandra to remove records based upon a WHERE condition? We are planning to move the session and cache tables from MySQL to Cassandra and were doing the feasibility study. Everything seems to be OK other than garbage collection of the session table. Was not able to remove super

Terminology, getting slices by time

2010-05-12 Thread Leslie Viljoen
Hi! I am having trouble understanding the "column" terminology Cassandra uses. I am developing in Ruby. I need to store data for vehicles which will come in at different times and retrieve data for a specific vehicle for specific slices of time. So each record could look like: vehicle_id, { time
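One common layout for this, sketched under stated assumptions: one row per `vehicle_id`, one column per reading whose name is the timestamp (in Cassandra that would mean a column family with a `LongType` comparator), so "data for a vehicle between t1 and t2" becomes a single column slice with inclusive start and finish. The Python class below is a hypothetical in-memory stand-in, not a Cassandra client.

```python
import bisect
from typing import Dict, List, Tuple

class VehicleLog:
    """Hypothetical in-memory model: one row per vehicle_id, with columns
    named by timestamp and kept sorted, as a LongType comparator would."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = {}   # sorted column names per row
        self._values: Dict[str, List[str]] = {}  # column values, same order

    def insert(self, vehicle_id: str, timestamp: int, value: str) -> None:
        times = self._times.setdefault(vehicle_id, [])
        values = self._values.setdefault(vehicle_id, [])
        i = bisect.bisect_left(times, timestamp)
        if i < len(times) and times[i] == timestamp:
            values[i] = value  # same column name: the new write replaces the old one
        else:
            times.insert(i, timestamp)
            values.insert(i, value)

    def get_slice(self, vehicle_id: str, start: int,
                  finish: int) -> List[Tuple[int, str]]:
        """Columns with start <= timestamp <= finish, in timestamp order."""
        times = self._times.get(vehicle_id, [])
        values = self._values.get(vehicle_id, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, finish)
        return list(zip(times[lo:hi], values[lo:hi]))
```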

Re: what/how do you guys monitor "slow" nodes?

2010-05-12 Thread Ran Tavory
There is a per-CF read and write latency metric in JMX. On May 12, 2010 12:55 AM, "Jordan Pittier - Rezel" wrote: For sure you have to pay particular attention to memory allocation on each node; especially, be sure your servers don't swap. Then you can monitor how load is balanced among your nodes (nodetoo

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi
A follow-up for anyone who may end up on this conversation again: I kept trying, and neither changing the number of concurrent map tasks nor the slice size helped. Finally, I found a screw-up in our logging system, which had prevented us from noticing a couple of recurring errors in the logs

Re: replication impact on write throughput

2010-05-12 Thread David Vanderfeesten
About this linear scaling of throughput (with keys perfectly distributed + requests balanced over all nodes): I would assume that this is not the case for a small number of nodes, because from 2 nodes onwards part of the requests have to be handled by a proxy node + the actual node responsib

Re: Human readable Cassandra limitations

2010-05-12 Thread David Vanderfeesten
On the scalability and performance side, I found Yahoo's paper about the YCSB project interesting (benchmarking some NoSQL solutions against MySQL). See research.yahoo.com/files/ycsb.pdf. My concern with the denormalization approach is that it shouldn't be managed by the client side because this

Re: key is sorted?

2010-05-12 Thread David Boxenhorn
You can do any kind of range slice, e.g. keys beginning with "abc"? But the results will not be ordered? Please answer one of the following: True/True, True/False, False/False. Explain? Thanks! On Sun, May 9, 2010 at 8:27 PM, Vijay wrote: > True, the range slice support was enabled in Random Partit