Re: 3 node installation

2010-02-24 Thread Masood Mortazavi
Besides what I just said below, I should have also added that in the scenario discussed here: While RackUnawareStrategy is used ... Node B which seems to have a copy of all data at all times, has an IP address whose 3rd octet is different from IP addresses of both node A and C, which have the sam
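As a rough illustration of the placement discussed in this thread: to my understanding, RackUnawareStrategy ignores IP addresses and places replicas on consecutive nodes around the ring. The sketch below models that with made-up integer tokens. The function name `replicas_for`, the ring layout, and the token values are assumptions for illustration only, not Cassandra code.

```python
from bisect import bisect_left


def replicas_for(key_token, ring, replication_factor):
    """Return the nodes holding a key in a simplified ring model.

    ring is a list of (token, node) pairs sorted by token. The primary
    replica is the first node whose token is >= the key's token (wrapping
    around), and the remaining replicas are the next nodes on the ring.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical three-node ring with replication factor 2.
ring = [(10, "A"), (20, "B"), (30, "C")]
for key_token in (5, 15, 25, 35):
    print(key_token, replicas_for(key_token, ring, 2))
```

In this model every node holds two of the three ranges, so no single node would be expected to hold all of the data unless the tokens are badly skewed.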

Re: A configuration and step-by-step procedure for production deployment ...

2010-02-24 Thread Masood Mortazavi
On Wed, Feb 24, 2010 at 8:29 PM, Jonathan Ellis wrote: > On Wed, Feb 24, 2010 at 9:29 PM, Masood Mortazavi > wrote: > > Is there a configuration and step-by-step *procedure* for production > > deployments of Cassandra? > > Not really. As w/ any cluster deployment, some basic sysadmin kung fu >

Re: 3 node installation

2010-02-24 Thread Masood Mortazavi
Yes. Identical with replication factor of 2. m. On Wed, Feb 24, 2010 at 8:33 PM, Jonathan Ellis wrote: > Is the configuration identical on all nodes? Specifically, is > ReplicationFactor set to 2 on all nodes? > > On Wed, Feb 24, 2010 at 10:07 PM, Masood Mortazavi > wrote: > > I wonder if anyo

RE: Strategy to delete/expire keys in cassandra

2010-02-24 Thread Weijun Li
Hi Sylvain, I just noticed that you are the one that implemented the Expiring Column feature. Could you please help on my questions? Should I just run command (in Cassandra 0.5 source folder?) like: patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch for all of the five patches in yo

Re: 3 node installation

2010-02-24 Thread Jonathan Ellis
Is the configuration identical on all nodes? Specifically, is ReplicationFactor set to 2 on all nodes? On Wed, Feb 24, 2010 at 10:07 PM, Masood Mortazavi wrote: > I wonder if anyone can provide an explanation for the following behavior > observed in a three-node cluster: > > 1. In a three-node (

Re: A configuration and step-by-step procedure for production deployment ...

2010-02-24 Thread Jonathan Ellis
On Wed, Feb 24, 2010 at 9:29 PM, Masood Mortazavi wrote: > Is there a configuration and step-by-step *procedure* for production > deployments of Cassandra? Not really. As w/ any cluster deployment, some basic sysadmin kung fu is required, and we don't go into that (although I suppose maybe we sh

Re: Understanding Bootstrapping

2010-02-24 Thread Jonathan Ellis
Bootstrap files are streamed directly to data locations as .tmp files and renamed when complete. One of the problems w/ 0.5's bootstrap is indeed that it doesn't give you any visibility into what is going on. This is addressed in 0.6 w/ additional JMX reporting. On Wed, Feb 24, 2010 at 5:06 PM,

Re: Adjusting Token Spaces and Rebalancing Data

2010-02-24 Thread Jonathan Ellis
nodeprobe loadbalance and/or nodeprobe move http://wiki.apache.org/cassandra/Operations On Wed, Feb 24, 2010 at 6:17 PM, Jon Graham wrote: > Hello, > > I have 6 node Cassandra 0.5.0 cluster > using org.apache.cassandra.dht.OrderPreservingPartitioner with replication > factor 3. > > I mistakenly

Re: cassandra freezes

2010-02-24 Thread Jonathan Ellis
On Wed, Feb 24, 2010 at 8:46 PM, Santal Li wrote: > BTW: Somebody in my team told me, that if the cassandra managed data was too > huge( >15x than heap space) , will cause performance issues, is this true? It really has more to do with what your hot data set is, than absolute size. Once any syst

3 node installation

2010-02-24 Thread Masood Mortazavi
I wonder if anyone can provide an explanation for the following behavior observed in a three-node cluster: 1. In a three-node (A, B and C) installation, I use the cli, connected to node A, to set 10 data items. 2. On cli connected to node A, I do get, and can see all 10 data items. 3. I take nod

A configuration and step-by-step procedure for production deployment ...

2010-02-24 Thread Masood Mortazavi
Is there a configuration and step-by-step *procedure* for production deployments of Cassandra? By the way, I've noticed that not all potentially configurable settings may actually be included in the -- storage-config.xml -- that's distributed with the releases. [For example, there seems to be some

Re: cassandra freezes

2010-02-24 Thread Santal Li
Thank you, that helps. Because I have about 150 GB of data on each node, I set the heap to 8 GB, just to make sure Cassandra has enough space to cache the key index. I think reducing the heap size is worth trying. Try to split one Cassandra instance into 2 sub-nodes, contained in one physical server,

Re: full text search

2010-02-24 Thread Mohammad Abed
You might want to keep an eye on the thread http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg02674.html Also somebody wrote Lucandra powers http://sparse.ly On Wed, Feb 24, 2010 at 5:00 PM, Brandon Williams wrote: > On Wed, Feb 24, 2010 at 6:45 PM, Mohammad Abed wrote: > >>

Re: full text search

2010-02-24 Thread Brandon Williams
On Wed, Feb 24, 2010 at 6:45 PM, Mohammad Abed wrote: > Either of these solutions used in any production environment? Lucandra powers http://sparse.ly -Brandon

Re: full text search

2010-02-24 Thread Matt Corgan
Quick question about Facebook's indexing strategy... based on the fact that all of the columns within a supercolumn must be serialized/deserialized together, and therefore fit in memory, is there a point at which individual Facebook users could start causing problems if they have a lot of messages?

Re: full text search

2010-02-24 Thread Nathan McCall
The following paper on the Articles and Presentations section of the Cassandra wiki describes Facebook's inbox search implementation: http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf -Nate On Wed, Feb 24, 2010 at 4:45 PM, Mohammad Abed wrote: > Either of these solutions

Re: full text search

2010-02-24 Thread Mohammad Abed
Either of these solutions used in any production environment? On Wed, Feb 24, 2010 at 3:54 PM, Brandon Williams wrote: > On Wed, Feb 24, 2010 at 5:41 PM, Mohammad Abed wrote: > >> Any suggestions on how to pursue full text search with Cassandra, what >> options are out there? >> > > Also: http

Adjusting Token Spaces and Rebalancing Data

2010-02-24 Thread Jon Graham
Hello, I have a 6-node Cassandra 0.5.0 cluster using org.apache.cassandra.dht.OrderPreservingPartitioner with replication factor 3. I mistakenly set my tokens to the wrong values, and have all the data being stored on the first node (with replicas on the second and third nodes). Does Cassandra hav
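One way to think about choosing better tokens under an order-preserving partitioner is to pick them at even quantiles of a sample of real keys, so each node owns roughly the same number of rows. The sketch below is only that idea in miniature: `balanced_tokens` and the sample keys are hypothetical, and it assumes that tokens are key strings and that a node owns the keys up to and including its token. Applying the tokens to a live cluster is a separate step (the reply in this thread points to nodeprobe move and nodeprobe loadbalance).

```python
def balanced_tokens(sample_keys, node_count):
    """Pick node_count tokens at even quantiles of a sorted key sample."""
    keys = sorted(sample_keys)
    if node_count < 1 or len(keys) < node_count:
        raise ValueError("need at least one sample key per node")
    tokens = []
    for i in range(1, node_count + 1):
        # Index of the last key in the i-th equal-sized slice of the sample.
        index = (i * len(keys)) // node_count - 1
        tokens.append(keys[index])
    return tokens


# Hypothetical key sample; a real sample would come from the actual data.
sample = ["apple", "banana", "cherry", "date", "elder", "fig"]
print(balanced_tokens(sample, 3))
```

The quality of the result depends entirely on how representative the sample is; a skewed sample would give skewed tokens.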

Re: full text search

2010-02-24 Thread Brandon Williams
On Wed, Feb 24, 2010 at 5:41 PM, Mohammad Abed wrote: > Any suggestions on how to pursue full text search with Cassandra, what > options are out there? > Also: http://github.com/tjake/Lucandra -Brandon

Re: full text search

2010-02-24 Thread K Wong
http://nicklothian.com/blog/2009/10/27/solr-cassandra-solandra/ On Wed, Feb 24, 2010 at 3:41 PM, Mohammad Abed wrote: > Any suggestions on how to pursue full text search with Cassandra, what > options are out there? > > Thanks. >

full text search

2010-02-24 Thread Mohammad Abed
Any suggestions on how to pursue full text search with Cassandra, what options are out there? Thanks.

Understanding Bootstrapping

2010-02-24 Thread Anthony Molinaro
Hi, I had to add a few more nodes to my cluster yesterday; so far 2 of the 3 have "finished" bootstrapping (at least as far as I can tell: they show up via a ring command in the UP state, while the 3rd does not show up at all in the ring command). I'm curious when the 3rd will finish, so was wondering

Re: Wiki permission denied

2010-02-24 Thread Jonathan Ellis
pinged #asfinfra. looks like they fixed it. On Wed, Feb 24, 2010 at 11:09 AM, Mark Robson wrote: > Hiya, > > I'm looking at > > http://wiki.apache.org/cassandra/RecentChanges > > And there's an error. > > Can someone look into it please? > > Ta > > Mark >

Re: Deleted rows showing up when doing a get_range_slice query

2010-02-24 Thread Jonathan Ellis
On Wed, Feb 24, 2010 at 3:21 PM, Erik Holstad wrote: > When deleting rows from a table and then using a get_range_slice query, the > keys of the > deleted rows show up, with no name/value pairs. What is the reasoning behind > this? Cassandra doesn't know that there are no other columns associated

Deleted rows showing up when doing a get_range_slice query

2010-02-24 Thread Erik Holstad
When deleting rows from a table and then using a get_range_slice query, the keys of the deleted rows show up, with no name/value pairs. What is the reasoning behind this? I have also seen a weird issue when using an md5-generated byte[] as a column name; it doesn't seem to actually work. I can't
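Since deleted rows can come back from a range query as keys with no columns, one client-side workaround is simply to skip the empty ones. The sketch below assumes the query result has already been converted into a plain dict of key to column dict; that shape, and the name `live_rows`, are assumptions for illustration rather than what any particular client library returns.

```python
def live_rows(range_slice_result):
    """Drop rows that came back with no columns (likely deleted rows)."""
    return {key: columns
            for key, columns in range_slice_result.items()
            if columns}


# Hypothetical result: "row2" was deleted and comes back empty.
result = {
    "row1": {"name": "alice"},
    "row2": {},
    "row3": {"name": "carol"},
}
print(sorted(live_rows(result)))
```

Note that this also hides rows that are legitimately empty for the requested column predicate, which may or may not matter for a given application.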

Re: import data into cassandra

2010-02-24 Thread Eric Evans
On Wed, 2010-02-24 at 18:43 +0100, Martin Probst wrote: > host:/opt/cassandra# bin/json2sstable -K Keyspace1 -c > col1 ../utf8_cassandra.json data/Keyspace1/col1-2-Data.db > Exception in thread "main" java.lang.NumberFormatException: For input > string: "PR" > at > java.lang.NumberFormatEx

Re: Bulk Ingestion Issues

2010-02-24 Thread Jonathan Ellis
the exception is unrelated, it's from the network layer (and is gone in 0.6) On Wed, Feb 24, 2010 at 11:54 AM, Sonny Heer wrote: > Sorry for being unclear.  Yes, I have flushed and compacted the data > in that keyspace.  I'm still not getting all the results expected. > Any idea where that except

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-02-24 Thread Jonathan Ellis
as you noticed, "nodeprobe move" first unloads the data, then moves to the new position. so that won't help you here. If you are using replicationfactor=1, scp the data to the previous node on the ring, then reduce the original node's token so it isn't responsible for so much, and run cleanup. (

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-02-24 Thread shiv shivaji
According to the stack trace I get in the log, it makes it look like the patch was for anti-compaction but I did not look at the source code in detail yet. java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException: disk full at java.util.concurrent.FutureTask$Sync.

Re: Bulk Ingestion Issues

2010-02-24 Thread Sonny Heer
Sorry for being unclear. Yes, I have flushed and compacted the data in that keyspace. I'm still not getting all the results expected. Any idea what that exception is about? On Wed, Feb 24, 2010 at 9:50 AM, Jonathan Ellis wrote: > Okay, so you are using binarymemtable, that wasn't 100% clear. >

Re: import data into cassandra

2010-02-24 Thread Jonathan Ellis
I suggest getting it working via plain thrift calls before trying anything fancy. Otherwise it's probably premature optimization. On Wed, Feb 24, 2010 at 11:43 AM, Martin Probst wrote: > Hi, > > i'm playing around a little bit with cassandra and trying to load some data > into it. I've found th

Re: Bulk Ingestion Issues

2010-02-24 Thread Jonathan Ellis
Okay, so you are using binarymemtable, that wasn't 100% clear. With BMT you need to manually flush when you are done loading, the data isn't live until it's been converted to sstable. On Wed, Feb 24, 2010 at 11:45 AM, Sonny Heer wrote: >> >> On what symptom are you basing that conclusion? >> > >

Re: Bulk Ingestion Issues

2010-02-24 Thread Sonny Heer
> > On what symptom are you basing that conclusion? > I've ingested the same data using the java thrift API, ran queries against that set, and I'm getting different results when I ingest it using the StorageService (CassandraBulkLoader without Hadoop) method. The size of results is much less. The

import data into cassandra

2010-02-24 Thread Martin Probst
Hi, I'm playing around a little bit with Cassandra and trying to load some data into it. I've found the sstable2json and json2sstable scripts inside the /bin dir and tried to work with these scripts. I wrote a wrapper which transforms CSVs into a JSON file, and the json-validator throws no fai

Re: Bulk Ingestion Issues

2010-02-24 Thread Jonathan Ellis
On Wed, Feb 24, 2010 at 11:30 AM, Sonny Heer wrote: > I have a single box, and trying to ingest some data into a single > keyspace and 5 CFs.  Basically it reads from a directory text files, > and inserts into Cassandra.  I've set the BinaryMemtableSizeInMB to > 64. For some reason I'm not getting

Bulk Ingestion Issues

2010-02-24 Thread Sonny Heer
I have a single box, and am trying to ingest some data into a single keyspace and 5 CFs. Basically it reads text files from a directory and inserts them into Cassandra. I've set the BinaryMemtableSizeInMB to 64. For some reason I'm not getting all my data into Cassandra. I get some ingested, but very l

Wiki permission denied

2010-02-24 Thread Mark Robson
Hiya, I'm looking at http://wiki.apache.org/cassandra/RecentChanges And there's an error. Can someone look into it please? Ta Mark

Re: Cassandra paging, gathering stats

2010-02-24 Thread Jonathan Ellis
It does not. Someone would need it badly enough to code it first. :) On Wed, Feb 24, 2010 at 10:26 AM, Wojciech Kaczmarek wrote: > Btw, > > does get_range_slice support reversed=true for keys (not column > predicates) ? In 0.5 seems not > > On Tue, Feb 23, 2010 at 21:28, Jonathan Ellis wrote: >

Re: Cassandra paging, gathering stats

2010-02-24 Thread Wojciech Kaczmarek
Btw, does get_range_slice support reversed=true for keys (not column predicates) ? In 0.5 seems not On Tue, Feb 23, 2010 at 21:28, Jonathan Ellis wrote: > you'd actually use first column as start, empty finish, > count=pagesize, and reversed=True, unless I'm misunderstanding > something. > > On
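The paging tip quoted above (start at the first column of the current page, empty finish, count=pagesize, reversed=True) can be tried out on an in-memory model. The `get_slice` below is a stand-in written for this note, not the Thrift call; it assumes the start column is inclusive, which is why the helper asks for one extra column and drops the boundary. Both function names and the column names are hypothetical.

```python
def get_slice(columns, start="", finish="", count=100, reversed_order=False):
    """In-memory stand-in for a column slice (NOT the Thrift API)."""
    ordered = sorted(columns, reverse=reversed_order)
    result = []
    for name in ordered:
        if len(result) >= count:
            break
        if start != "":
            # Skip columns that come before the start in iteration order.
            if (name > start) if reversed_order else (name < start):
                continue
        if finish != "":
            # Stop once we pass the finish in iteration order.
            if (name < finish) if reversed_order else (name > finish):
                break
        result.append(name)
    return result


def previous_page(columns, first_on_current_page, page_size):
    """Fetch the page before the current one, in ascending order."""
    rows = get_slice(columns, start=first_on_current_page,
                     count=page_size + 1, reversed_order=True)
    # rows[0] is the boundary column itself, so drop it.
    return list(reversed(rows[1:]))


columns = ["c%02d" % i for i in range(1, 11)]  # c01 .. c10
print(previous_page(columns, "c07", 3))
```

Whether to request `page_size` or `page_size + 1` depends on whether the boundary column should be repeated on the previous page; the extra column here is a design choice of this sketch.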

Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Haha! Yeah, fortunately we are only in the testing phase so this is not that big of a deal. Thanks a lot! -- Regards Erik

Re: Getting the keys in your system?

2010-02-24 Thread Jonathan Ellis
Other than "you'll have to completely reload all your data when changing partitioners," no, not much to think about. :) On Wed, Feb 24, 2010 at 9:38 AM, Erik Holstad wrote: > Thanks Jonathan! > We are thinking about moving over to the OPP to be able to do > this > and to use an md5 for

Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Thanks Jonathan! We are thinking about moving over to the OPP to be able to do this and to use an md5 for some of the data just to get the data written to different nodes for some of the cases where order is not really needed. Is there anything we need to think about when making the swi
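The idea of using an md5 to scatter some rows while staying on an order-preserving partitioner could look roughly like this. The helper name `scattered_key` and the "digest:key" layout are assumptions made for this sketch; the thread does not say how the keys would actually be built. Keeping the natural key as a suffix is just one option, chosen here so the original key can still be read back.

```python
import hashlib


def scattered_key(natural_key):
    """Prefix a key with its md5 hex digest so that sorted order is
    effectively random, spreading rows across an ordered ring."""
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return digest + ":" + natural_key


for key in ("user1", "user2", "user3"):
    print(scattered_key(key))
```

Keys built this way lose meaningful range scans, so they only suit the cases where order is not needed.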

Re: Getting the keys in your system?

2010-02-24 Thread Jonathan Ellis
0.6 adds hadoop support for exactly this scenario (among others). You can also use get_range_slice to iterate all keys against RP in 0.6, but it will be slow since it is difficult to parallelize manually. -Jonathan On Wed, Feb 24, 2010 at 9:23 AM, Erik Holstad wrote: > If you have a system setu

Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Suppose you have a system set up using the RandomPartitioner with a couple of indexes set up for your data, but realize that you need to add another index. How do you get the keys for your data, so that you can know where to point your indexes? I guess what I'm really asking is, is there a way to get y

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-02-24 Thread Jonathan Ellis
The patch you refer to was to help *compaction*, not *anticompaction*. If the space is mostly hints for other machines (is that what you meant by "due to past problems with others?") you should run nodeprobe cleanup on it to remove data that doesn't actually belong on that node. -Jonathan On Wed

Re: reads are slow

2010-02-24 Thread Jonathan Ellis
only the total row size limit (must fit in memory during compaction) On Wed, Feb 24, 2010 at 7:47 AM, kevin wrote: > On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis wrote: >> >> the standard workaround is to change your data model to use non-super >> columns instead. >> >> supercolumns are real

Re: Help for choice

2010-02-24 Thread alex kamil
Cemal, I've found the following analysis very helpful, it compares various noSQL options and gives pros/cons of RDBMS vs noSQL: "No Relation: The Mixed Blessings of Non-Relational Databases" by Ian Varley http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf -Alex

Re: reads are slow

2010-02-24 Thread kevin
On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis wrote: > the standard workaround is to change your data model to use non-super > columns instead. > > supercolumns are really only for relatively small numbers of > subcolumns until 598 is addressed. > is there any limit on the number of supercolum

Re: Help for choice

2010-02-24 Thread Francois Orsini
Chris's answer of MySQL does make a lot of sense, indeed. Based on the data you provided - 5-6 million rows is not considered a very large database. - 1,000 row updates per minute (even with 4 indexes) should not be a problem for sure. You can actually achieve 1.5-2k updates "per sec" easily

Re: Help for choice

2010-02-24 Thread Nathan McCall
I found the following helpful: http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/ http://00f.net/2009/an-overview-of-modern-sql-free-databases/comments/507 http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext There is enough variation in th

Anti-compaction Diskspace issue even when latest patch applied

2010-02-24 Thread shiv shivaji
For about 6TB of total data size with a replication factor of 2 (6TB x 2) on a five node cluster, I see about 4.6 TB on one machine (due to potential past problems with other machines). The machine has a disk of 6TB. The data folder on this machine has 59,289 files totaling 4.6 TB. The files ar
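For a sense of scale, the figures in this message can be turned into an expected per-node load, assuming (and this is only an assumption) that the ring is evenly balanced. The function name below is made up for this note.

```python
def expected_load_per_node_tb(unique_data_tb, replication_factor, node_count):
    """Expected data per node on an evenly balanced ring, in TB."""
    return unique_data_tb * replication_factor / node_count


expected = expected_load_per_node_tb(6, 2, 5)   # figures from the message
observed = 4.6                                  # TB reported on one machine
print("expected per node: %.1f TB" % expected)
print("observed / expected: %.2f" % (observed / expected))
```

Under that assumption the machine is carrying a little under twice its even share, which is consistent with the poster's suspicion that it picked up data belonging to other nodes.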

Re: Help for choice

2010-02-24 Thread Cemal
Hi, Maybe I should mention that we are very eager to evaluate NoSQL approaches, and for a simple case we want to evaluate and compare each approach. In our case our data has not actually been denormalized yet and we are suffering from a lot of joins. And because of very many updates in joined tables

Re: Help for choice

2010-02-24 Thread Nathan McCall
The workload you originally described does not sound like a difficult job for a relational database. Do you have any more information on the specifics of your access patterns and where you feel that an RDBMS might fall short? -Nate On Tue, Feb 23, 2010 at 11:27 PM, Cemal wrote: > I was not reall