date:20100512

Re: key is sorted?

2010-05-12 Thread David Boxenhorn

Thank you. That is very good news. I can sort the results myself - what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay wrote: > If you use Random partitioner, You will *NOT* get RowKey's sorted. > (Columns are sorted always). > > Answer: If used Random partitioner > True

Re: how does cassandra compare with mongodb?

2010-05-12 Thread Jonathan Shook

You can choose to have keys ordered by using an OrderPreservingPartitioner with the trade-off that key ranges can get denser on certain nodes than others. On Wed, May 12, 2010 at 7:48 PM, philip andrew wrote: > > Hi, > From my understanding, Cassandra entities are indexed on only one key, so > this

Re: Apache Cassandra and Net::Cassandra::Easy

2010-05-12 Thread Scott Doty

Well, ain't that a kick in the tush. :-/ I grabbed the svn trunk, but build failed, probably because Fedora 11 is too old for this bleeding edge stuff. Server is upgrading itself now, but I wondered: is anyone using an rpm-based distro for Net::Cassandra::Easy and the svn Cassandra? Thanks.

Re: how does cassandra compare with mongodb?

2010-05-12 Thread philip andrew

Hi, From my understanding, Cassandra entities are indexed on only one key, so this can be a problem if you are searching, for example, by two values, such as if you are storing an entity with an x,y then wish to search for entities in a box, i.e. x>5 and x<10 and y>5 and y<10. MongoDB can do this, Cassa

Re: key is sorted?

2010-05-12 Thread Jonathan Shook

Although, if replication factor spans all nodes, then the disparity in row allocation should be a non-issue when using OrderPreservingPartitioner. On Wed, May 12, 2010 at 6:42 PM, Vijay wrote: > If you use Random partitioner, You will NOT get RowKey's sorted. (Columns > are sorted always). > Answ

Re: paging row keys

2010-05-12 Thread Nathan McCall

Oh, thanks to Andrey Panov for providing that example, btw. We are always looking for good usage examples to post on the Hector wiki if anyone else has them. -Nate On Wed, May 12, 2010 at 5:01 PM, Nathan McCall wrote: > Here is a basic example using get_range_slices to retrieve 500 rows via > he

Re: paging row keys

2010-05-12 Thread Nathan McCall

Here is a basic example using get_range_slices to retrieve 500 rows via hector: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java To page, use the last key you got back as the start key. -Nate On Wed, May 12, 2010 at 3:37 PM, Core
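The paging rule described above (use the last key you got back as the next start key) can be sketched as follows. This is a minimal Python sketch against an in-memory stand-in, not the Hector or Thrift API: `fake_get_range_slices` and `page_all_keys` are hypothetical names, and the sketch assumes the start key is inclusive, so the first row of every page after the first repeats the previous page's last key and is dropped.

```python
from bisect import bisect_left

def fake_get_range_slices(sorted_keys, start_key, count):
    """Hypothetical stand-in for a range query: returns up to `count`
    keys >= start_key (inclusive start), in the store's own order."""
    start = bisect_left(sorted_keys, start_key)
    return sorted_keys[start:start + count]

def page_all_keys(sorted_keys, page_size):
    """Collect every key by reusing the last key of a page as the next start."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would return the same row forever.
        raise ValueError("page_size must be at least 2")
    result = []
    start_key = ""
    while True:
        page = fake_get_range_slices(sorted_keys, start_key, page_size)
        # After the first page, the first row duplicates the previous last key.
        fresh = page[1:] if result else page
        result.extend(fresh)
        if len(page) < page_size:
            break  # a short page means the range is exhausted
        start_key = page[-1]
    return result
```

The stand-in returns keys in sorted order for simplicity; with a hash-based partitioner the server's order would be token order instead, but the same "last key becomes start key" loop applies.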

Re: key is sorted?

2010-05-12 Thread Vijay

If you use Random partitioner, You will *NOT* get RowKey's sorted. (Columns are sorted always). Answer: If used Random partitioner True True Regards, On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn wrote: > You do any kind of range slice, e.g. keys beginning with "abc"? But the > results w
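Why row keys come back unsorted under the random partitioner can be illustrated with a short Python sketch. It assumes, for illustration only, that a row's position is the MD5 digest of its key read as an integer; the exact token derivation in Cassandra may differ, and the function names here are hypothetical.

```python
import hashlib

def md5_token(key: str) -> int:
    """Illustrative token: the MD5 digest of the key read as a big-endian integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def random_partitioner_order(keys):
    """Order in which a hash-partitioned range scan would visit the keys."""
    return sorted(keys, key=md5_token)

def order_preserving_order(keys):
    """Order in which an order-preserving range scan would visit the keys."""
    return sorted(keys)

def client_side_sorted(keys):
    """What the client can do with hash-ordered results: sort them itself."""
    return sorted(random_partitioner_order(keys))
```

Both scans return the same set of rows; only the order differs, which is why sorting on the client recovers key order at the cost of fetching the whole result first.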

paging row keys

2010-05-12 Thread Corey Hulen

Can someone point me to a thrift sample (preferable java) to list all the rows in a ColumnFamily for my Cassandra server. I noticed some examples using SlicePredicate and SliceRange to perform a similar query against the columns with paging, but I was looking for something similar for rows with pa

how does cassandra compare with mongodb?

2010-05-12 Thread S Ahmed

I tried searching mail-archive, but the search feature is a bit wacky (or more probably I don't know how to use it). What are the key differences between Cassandra and Mongodb? Is there a particular use case where each solution shines?

Re: Is it possible to delete records based upon where condition

2010-05-12 Thread Joel Pitt

On Thu, May 13, 2010 at 12:34 AM, Moses Dinakaran wrote: > I wanted to remove the records based upon the value of the column ses_tstamp > ie (delete from sessions where ses_tstamp between XXX & YYY OR delete from > session where ses_tstamp < XXX ) > > Is it possible to achieve this in Cassandra If

Re: Human readable Cassandra limitations

2010-05-12 Thread Paul Prescod

On Wed, May 12, 2010 at 2:02 AM, David Vanderfeesten wrote: >... > > My concern with the denormalization approach is that it shouldn't be managed > by the client side because this has big impact on your throughput. Is the > map-reduce in that respect any better? > Wouldn't it be nice to support a

RE: Extremly slow inserts on LAN

2010-05-12 Thread Stephan Pfammatter

Hi, I read your post and noticed you are running Cassandra on win 2008. Do you run it in a production environment? I'm contacting you because there aren't that many windows installations. I need to provide a live Cassandra environment on win 2008 and was stumbling into some problems with node to

Re: Cache capacities set by nodetool

2010-05-12 Thread James Golick

Picked up out of config, I mean. On Wed, May 12, 2010 at 11:10 AM, James Golick wrote: > Hmmm that's definitely what we're seeing. Although, we aren't seeing > cache settings being picked up properly on a restart either. > > > On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote: > >> It's a bug

Re: Cache capacities set by nodetool

2010-05-12 Thread James Golick

Hmmm that's definitely what we're seeing. Although, we aren't seeing cache settings being picked up properly on a restart either. On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote: > It's a bug: > > https://issues.apache.org/jira/browse/CASSANDRA-1079 > > -ryan > > On Wed, May 12, 2010 at 8:1

Load Balancing Mapper Tasks

2010-05-12 Thread Joost Ouwerkerk

I've been trying to improve the time it takes to map 30 million rows using a hadoop / cassandra cluster with 30 nodes. I discovered that since CassandraInputFormat returns an ordered list of splits, when there are many splits (e.g. hundreds or more) the load on cassandra is horribly unbalanced. e

Re: Is it possible to delete records based upon where condition

2010-05-12 Thread Benjamin Black

The functionality of a WHERE clause usually means maintaining an inverted index, usually another CF, on the information of interest (ses_tstamp in your example). You then retrieve index rows from that CF to find the data rows. b On Wed, May 12, 2010 at 5:34 AM, Moses Dinakaran wrote: > Hi All,
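The inverted-index approach described above can be sketched in Python with two dictionaries standing in for the two column families. This is an illustration of the idea only, not Cassandra client code; `SessionStore` and its method names are hypothetical, and `ses_tstamp` is taken from the original question.

```python
from collections import defaultdict

class SessionStore:
    """Data rows plus a hand-maintained index on ses_tstamp."""

    def __init__(self):
        self.sessions = {}                 # data "CF": session id -> columns
        self.by_tstamp = defaultdict(set)  # index "CF": ses_tstamp -> session ids

    def put(self, session_id, ses_tstamp, data):
        old = self.sessions.get(session_id)
        if old is not None:
            # Keep the index consistent when a session's timestamp changes.
            self._unindex(session_id, old["ses_tstamp"])
        self.sessions[session_id] = {"ses_tstamp": ses_tstamp, "data": data}
        self.by_tstamp[ses_tstamp].add(session_id)

    def _unindex(self, session_id, ses_tstamp):
        ids = self.by_tstamp.get(ses_tstamp)
        if ids is not None:
            ids.discard(session_id)
            if not ids:
                del self.by_tstamp[ses_tstamp]

    def delete_older_than(self, cutoff):
        """Emulates: DELETE FROM sessions WHERE ses_tstamp < cutoff."""
        expired = [t for t in self.by_tstamp if t < cutoff]
        removed = 0
        for t in expired:
            # Read the index rows first, then delete the data rows they point to.
            for session_id in self.by_tstamp.pop(t):
                del self.sessions[session_id]
                removed += 1
        return removed
```

The cost of this design is that every write touches both structures, and the application is responsible for keeping them in step.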

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi

On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote: > Looking over the code this is in fact an issue in 0.6. > It's fixed in trunk/0.7. Connections will be reused and closed properly, see > https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. > > We can either backport that

Re: (Binary)Memtable flushing

2010-05-12 Thread Tobias Jungen

D'oh, forgot to search the JIRA on this one. Thanks Jonathan! On Wed, May 12, 2010 at 9:37 AM, Jonathan Ellis wrote: > https://issues.apache.org/jira/browse/CASSANDRA-856 > > On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen > wrote: > > Yet another BMT question, thought this may apply for regular

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Utku Can Topçu

What makes cassandra a poor choice is the fact that, you can't use a keyrange as input for the map phase for Hadoop. On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis wrote: > On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati > wrote: > > - First of all, my first thoughts is to have two CF o

Re: timeout while running simple hadoop job

2010-05-12 Thread Johan Oskarsson

Looking over the code this is in fact an issue in 0.6. It's fixed in trunk/0.7. Connections will be reused and closed properly, see https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details. We can either backport that patch or make at least close the connections properly in 0.6. Ca

Re: Cache capacities set by nodetool

2010-05-12 Thread Ryan King

It's a bug: https://issues.apache.org/jira/browse/CASSANDRA-1079 -ryan On Wed, May 12, 2010 at 8:16 AM, James Golick wrote: > When I first brought this cluster online, the storage-conf.xml file had a > few cache capacities set. Since then, we've completely changed how we use > cassandra's cachi

Re: timeout while running simple hadoop job

2010-05-12 Thread Héctor Izquierdo

Have you checked your open file handler limit? You can do that by using "ulimit" in the shell. If it's too low, you will encounter the "too many open files" error. You can also see how many open handlers an application has with "lsof". Héctor Izquierdo On 12/05/10 17:00, gabriele renzi wrote:
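The same limit that `ulimit -n` reports can also be read from inside a process. A minimal Python sketch, assuming a Unix-like system where the standard `resource` module is available; `open_file_limits` is a hypothetical helper name.

```python
import resource

def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard

if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one a process hits with "too many open files".
    print("soft limit:", soft, "hard limit:", hard)
```

A value equal to `resource.RLIM_INFINITY` means no limit is enforced at that level.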

Cache capacities set by nodetool

2010-05-12 Thread James Golick

When I first brought this cluster online, the storage-conf.xml file had a few cache capacities set. Since then, we've completely changed how we use cassandra's caching, and no longer use any of the caches I setup in the original configuration. I'm finding that cassandra doesn't want to keep my new

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi

On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote: > On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: >> - is it possible that such errors show up on the client side as >> timeoutErrors when they could be reported better? > > No, if the node the client is talking to doesn't get a reply

Re: timeout while running simple hadoop job

2010-05-12 Thread Jonathan Ellis

On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote: > - is it possible that such errors show up on the client side as > timeoutErrors when they could be reported better? No, if the node the client is talking to doesn't get a reply from the data node, there is no way for it to magically find ou

Re: Unexpected TType error on web traffic increase

2010-05-12 Thread Jonathan Ellis

Sounds like the sort of error you'd see if you were using thread-unsafe Thrift clients on multiple threads. On Tue, May 11, 2010 at 11:23 PM, Waqas Badar wrote: > Dear all, > > We are using Cassandra on website. Whenever website traffic increases, we > got the following error (Python): > > File "
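One common way to avoid sharing a non-thread-safe client between threads is to give each thread its own instance. The following Python sketch shows that pattern with `threading.local`; `FakeClient` is a hypothetical stand-in for a real connection object, not the Thrift API, and whether this matches the poster's setup is an assumption.

```python
import threading

class FakeClient:
    """Hypothetical stand-in for a client that must stay on one thread."""

    def __init__(self):
        self.owner = threading.get_ident()

    def call(self):
        # A real client would corrupt its wire state here; the stand-in just checks.
        if self.owner != threading.get_ident():
            raise RuntimeError("client shared across threads")
        return self.owner

_local = threading.local()

def get_client(factory=FakeClient):
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = factory()
    return client
```

Each worker thread calls `get_client()` instead of using a module-level client, so no two threads ever interleave reads and writes on the same connection.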

Re: Timed out reads still in queue

2010-05-12 Thread Jonathan Ellis

This is a slightly different way of describing https://issues.apache.org/jira/browse/CASSANDRA-685 On Tue, May 11, 2010 at 9:01 PM, Jeremy Dunck wrote: > Reddit posted a blog entry about some recent downtime, partially due > to issues with Cassandra. > http://blog.reddit.com/2010/05/reddits-may-2

Re: nodetool drain disables writes?

2010-05-12 Thread Jonathan Ellis

On Tue, May 11, 2010 at 4:18 PM, Anthony Molinaro wrote: > Hi, > > I thought that 'nodetool drain' was supposed to flush the commit logs > through the system, which it appears to do (verified by running ls in > the commit log directory and seeing no files). > > However, it also appears to disable

Re: (Binary)Memtable flushing

2010-05-12 Thread Jonathan Ellis

https://issues.apache.org/jira/browse/CASSANDRA-856 On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen wrote: > Yet another BMT question, thought this may apply for regular memtables as > well... > > After doing a batch insert, I accidentally submitted the flush command > twice. To my surprise, the t

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Jonathan Ellis

On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati wrote: > - First of all, my first thoughts is to have two CF one for raw client > request (~10 millions++ per day) and other for aggregated metrics in some > defined interval time like 1min, 5min, 15min... Is this a good approach ? Sure. > - I

Re: what is DCQUORUM

2010-05-12 Thread vd

Thanks Eben On Wed, May 12, 2010 at 7:33 PM, Eben Hewitt wrote: > QUORUM is a high consistency level. It refers to the number of nodes that > have to acknowledge read or write operations in order to be assured that > Cassandra is in a consistent state. It uses <replication factor> / 2 + 1. > > DCQUORUM means "Data

Re: what is DCQUORUM

2010-05-12 Thread Eben Hewitt

QUORUM is a high consistency level. It refers to the number of nodes that have to acknowledge read or write operations in order to be assured that Cassandra is in a consistent state. It uses <replication factor> / 2 + 1. DCQUORUM means "Data Center Quorum", and balances consistency with performance. It puts multiple
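The quorum size is the replication factor divided by two, rounded down, plus one. A small Python sketch of that arithmetic; `quorum` and `tolerated_failures` are hypothetical helper names.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1  # integer division, then add one

def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)
```

For example, a replication factor of 3 gives a quorum of 2, so one replica can be down; a replication factor of 2 gives a quorum of 2, so none can.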

what is DCQUORUM

2010-05-12 Thread vd

Hi, I have read about QUORUM but lately came across DCQUORUM. What is it, and what's the difference between the two?

RE: Extremly slow inserts on LAN

2010-05-12 Thread Stephan Pfammatter

I don't understand all the problems yet you guys are facing. Just wanted to let you know that I'm getting my feet wet with Cassandra. In a few days/weeks I'll be re-reading all your notes again :) I'm bound to provide a production Cassandra environment based on win 2008 (I know, I know...It's b

Re: replication impact on write throughput

2010-05-12 Thread Mark Greene

>>If the replication factor is 2, then everything is written twice. So >>your throughput is cut in half. throughput of new inserts is cut in half right? I think I was thinking about capacity in more general terms from the node's perspective. The node has the ability to write so many operations per

Is it possible to delete records based upon where condition

2010-05-12 Thread Moses Dinakaran

Hi All, In Cassandra is it possible to remove records based upon a where condition? We are planning to move the session and cache tables from MySQL to Cassandra and were doing the feasibility study. Everything seems to be OK other than garbage collection of the session table. Was not able to remove super

Terminology, getting slices by time

2010-05-12 Thread Leslie Viljoen

Hi! I am having trouble understanding the "column" terminology Cassandra uses. I am developing in Ruby. I need to store data for vehicles which will come in at different times and retrieve data for a specific vehicle for specific slices of time. So each record could look like: vehicle_id, { time

Re: what/how do you guys monitor "slow" nodes?

2010-05-12 Thread Ran Tavory

There is a per cf read and write latency jmx. On May 12, 2010 12:55 AM, "Jordan Pittier - Rezel" wrote: For sure you have to pay particular attention to memory allocation on each node, especially be sure your servers dont swap. Then you can monitor how load are balanced among your nodes (nodetoo

Re: timeout while running simple hadoop job

2010-05-12 Thread gabriele renzi

a follow up for anyone that may end up on this conversation again: I kept trying and neither changing the number of concurrent map tasks, nor the slice size helped. Finally, I found out a screw up in our logging system, which had forbidden us from noticing a couple of recurring errors in the logs

Re: replication impact on write throughput

2010-05-12 Thread David Vanderfeesten

About this linear scaling of throughput(with keys perfectly distributed + requests balanced over all nodes): I would assume that this is not the case for small number of nodes because starting from 2 nodes onwards a part of the requests have to be handled by a proxy node + the actual node responsib

Re: Human readable Cassandra limitations

2010-05-12 Thread David Vanderfeesten

On the scalability and performance side, I found Yahoo's paper about the YCSB project interesting (benchmarking some NoSQL solutions with MySQL). See research.yahoo.com/files/ycsb.pdf. My concern with the denormalization approach is that it shouldn't be managed by the client side because this

Re: key is sorted?

2010-05-12 Thread David Boxenhorn

You do any kind of range slice, e.g. keys beginning with "abc"? But the results will not be ordered? Please answer one of the following: True True True False False False Explain? Thanks! On Sun, May 9, 2010 at 8:27 PM, Vijay wrote: > True, The Range slice support was enabled in Random Partit

43 matches
