Re: Non-latin implementation

2011-02-25 Thread Sasha Dolgy
Hi AJ, I am storing Simplified Chinese data in columns without any issues at the moment: 萨莎. I can retrieve the data, but haven't tried secondary indexes or anything more advanced yet. -sd On Thu, Feb 24, 2011 at 5:21 PM, A J s5a...@gmail.com wrote: Hello, Have there been Cassandra

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Oleg Anastasyev
Sylvain Lebresne sylvain at datastax.com writes: However, if that simple conflict detection/resolution mechanism is not good enough for some of your use cases and you need to keep two concurrent updates, it is easy enough. Just make sure that the updates don't end up in the same column. This
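
As a rough illustration of the point in this snippet: Cassandra reconciles concurrent writes to the same column by keeping the value with the highest timestamp, so two updates to one column collapse into one, while updates written to distinct columns both survive. The sketch below models that with plain dicts rather than any Cassandra client; the `reconcile` helper and the column names are hypothetical, and timestamp ties are not handled.

```python
def reconcile(replica_a, replica_b):
    """Merge two views of a row; each maps column name -> (value, timestamp)."""
    merged = dict(replica_a)
    for name, (value, ts) in replica_b.items():
        # Keep whichever version of the column carries the newer timestamp.
        if name not in merged or ts > merged[name][1]:
            merged[name] = (value, ts)
    return merged


# Two clients write the same column: only the later timestamp survives.
same_column = reconcile({"email": ("a@example.com", 100)},
                        {"email": ("b@example.com", 101)})

# Two clients write different columns: both updates are kept.
separate_columns = reconcile({"email:client1": ("a@example.com", 100)},
                             {"email:client2": ("b@example.com", 101)})
```

Giving each writer its own column name (for example by appending a client id, as assumed above) leaves the application to decide how to combine the surviving values at read time.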

Running multiple compactions concurrently

2011-02-25 Thread Daniel Josefsson
We experienced the java.lang.NegativeArraySizeException when upgrading to 0.7.2 in staging. The proposed solution (running compaction) seems to have solved this. However it took a lot of time to run. Is it safe to invoke a major compaction on all of the machines concurrently? I can't see a

Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
I am suggesting that you probably want to rethink your schema design, since partitioning by year is going to perform badly: the old servers are going to be nothing more than expensive tape drives. You fail to see the obvious. It is just the fact that most of the data is stale

Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
@Thibaut Britz Caveat: Using simple strategy. This works because Cassandra scans data at startup and then serves what it finds. For a join, for example, you can rsync all the data from the node below/to the right of where the new node is joining. Then join without bootstrap, then cleanup both

Re: Running multiple compactions concurrently

2011-02-25 Thread Gary Dusbabek
If your cluster has the overall IO capacity to perform a simultaneous compaction on every node and still adequately service reads and writes, then yes. If you're concerned about availability, your best bet will be to stagger the compactions. Gary. On Fri, Feb 25, 2011 at 04:24, Daniel
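
A minimal sketch of the staggering idea, assuming you drive compaction from an external script: split the hosts into batches and only start the next batch once the previous one has finished. The host addresses and helper names are hypothetical, and the code only builds `nodetool -h <host> compact` command lines; it does not run them.

```python
def compaction_batches(hosts, max_concurrent):
    """Split hosts into batches so at most max_concurrent nodes compact at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [hosts[i:i + max_concurrent]
            for i in range(0, len(hosts), max_concurrent)]


def nodetool_commands(batch):
    """Build (but do not run) one major-compaction command line per host."""
    return [["nodetool", "-h", host, "compact"] for host in batch]


# Hypothetical five-node cluster, compacting two nodes at a time.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
plan = [nodetool_commands(batch) for batch in compaction_batches(hosts, 2)]
```

Each inner list could be handed to `subprocess.run`; how you detect that a batch has finished depends on your Cassandra version and tooling.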

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread A J
He has a product to sell, so you can expect some advertising. But in general, Stonebraker's articles are very deep (another one that challenges general conceptions is http://voltdb.com/voltdb-webinar-sql-urban-myths ). He is the creator of Postgres and considered a guru in databases by many. And

Re: Changing comparators

2011-02-25 Thread Jonathan Ellis
Compaction assumes that the sstables it has as input are ordered correctly (otherwise it would have to read the full row into memory to re-sort). So it would have to be a new operation, and not feasible in general for larger-than-memory rows. I don't think we'll ever add this. On Wed, Feb 23,

Re: A simple script that creates multi node clusters on a single machine.

2011-02-25 Thread Jonathan Ellis
Nice! On Wed, Feb 23, 2011 at 9:06 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On the mailing list and IRC there are many questions about Cassandra internals. I understand where the questions are coming from because it took me a while to get a grip on it. However if you have a laptop

Re: losing connection to Cassandra

2011-02-25 Thread Jonathan Ellis
You should upgrade before wasting time troubleshooting such an old install. On Thu, Feb 24, 2011 at 8:45 AM, Tomer B tomer...@gmail.com wrote: Hi, I'm using a 3-node cluster of Cassandra 0.6.1 together with Hector as the API for a Java client. Every few days I get a situation where I cannot connect

Re: Exception in thread main java.lang.NoClassDefFoundError

2011-02-25 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/RunningCassandra may be useful, but really you should be using the Debian package: http://wiki.apache.org/cassandra/DebianPackaging 2011/2/24 ko...@vivinavi.com ko...@vivinavi.com: Hi everyone, I am new to Java and Cassandra. I just get started to install

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jonathan Ellis
That article is heavily biased by "I am selling a competitor to Cassandra." First, read Coda's original piece if you haven't: http://codahale.com/you-cant-sacrifice-partition-tolerance/ Then, Jeff Darcy's response: http://pl.atyp.us/wordpress/?p=3110 On Thu, Feb 24, 2011 at 2:56 PM, A J

Re: Fill disks more than 50%

2011-02-25 Thread Edward Capriolo
On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: @Thibaut Britz Caveat: Using simple strategy. This works because Cassandra scans data at startup and then serves what it finds. For a join, for example, you can rsync all the data from the node below/to the right

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread A J
Though you are not really implying that, I am not selling anything. I don't work for VoltDB. I had other issues for my use case with the software when I was evaluating it (their claim of durability is weak in my opinion. Though it does not matter, I'd rather they call themselves NoSQL. They just

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
Yeah - no worries - I don't think anyone was thinking you were trying to drink kool-aid or selling anything. Jonathan was just pointing out thoughtful replies to his claims. This past year, Michael Stonebraker with voltdb and other things seems to have tried to take advantage of momentum

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
And everyone has a bias - and I think most people working with any of these solutions realize that. I think it's interesting how many organizations use multiple data storage solutions versus just using one as they have different capabilities - like the recent Netflix news about using

2x storage

2011-02-25 Thread A J
I read in some Cassandra notes that each node should be allocated twice the storage capacity you wish it to contain. I think the reason was that during compaction another copy of the SSTables has to be made before the original ones are discarded. Can someone confirm whether that is actually true? During

Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 9:22 AM, A J s5a...@gmail.com wrote: I read in some Cassandra notes that each node should be allocated twice the storage capacity you wish it to contain. I think the reason was that during compaction another copy of the SSTables has to be made before the original ones are

Re: cassandra as user-profile data store

2011-02-25 Thread Tyler Hobbs
I'm wondering if anyone has used Cassandra as a datastore for a user-profile service. I'm thinking of applications like behavioral targeting, where there are lots and lots of users (10s to 100s of millions), and lots and lots of data about them intermixed in, say, weblogs (probably TBs worth).

Acunu beta

2011-02-25 Thread Tim Moreton
I wanted to let everyone know that we're expanding our beta for the Acunu Storage Platform, which comprises a modified version of Cassandra that interfaces directly on to a storage stack reengineered for Big Data workloads. Acunu runs Cassandra applications unmodified, but provides (as we'll be

Re: 2x storage

2011-02-25 Thread A J
OK. Is it also driven by the type of compaction? Does a minor compaction require less working space than a major compaction? On Fri, Feb 25, 2011 at 12:40 PM, Robert Coli rc...@digg.com wrote: On Fri, Feb 25, 2011 at 9:22 AM, A J s5a...@gmail.com wrote: I read in some cassandra notes that each node

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ron Siemens
I updated the Cassandra version in the Hector package from 0.7.0 to 0.7.2. The occasional slow-down in the CF-index went away. I then upped the heap to 512MB, and the secondary indexing then works. Seems awfully memory hungry for my small dataset. Even the CF-index was faster with more heap.

Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 10:14 AM, A J s5a...@gmail.com wrote: OK. Is it also driven by the type of compaction? Does a minor compaction require less working space than a major compaction? Yes, unless that minor compaction happens to involve all SSTables due to compaction thresholds, at which time it

Re: 2x storage

2011-02-25 Thread Tyler Hobbs
On Fri, Feb 25, 2011 at 12:14 PM, A J s5a...@gmail.com wrote: OK. Is it also driven by the type of compaction? Does a minor compaction require less working space than a major compaction? No, every so often a minor compaction ends up compacting all SSTables, so it's effectively the same as a major
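
The arithmetic behind the 2x rule of thumb discussed in this thread can be sketched as follows, assuming the input SSTables stay on disk until the merged output is fully written. The function name, parameters, and the 500 GB figure are illustrative assumptions, not Cassandra settings.

```python
def peak_disk_usage_gb(live_data_gb, fraction_compacted=1.0, shrink_ratio=1.0):
    """Estimate peak disk use while a compaction is in flight.

    The SSTables being compacted remain on disk until the merged output is
    complete, so both copies coexist at the peak.
    """
    inputs = live_data_gb * fraction_compacted
    # shrink_ratio of 1.0 means nothing was overwritten or deleted.
    output = inputs * shrink_ratio
    return live_data_gb + output


# Major compaction (or a minor one that happens to touch every SSTable).
worst_case = peak_disk_usage_gb(500)
# A minor compaction touching a quarter of the data.
minor = peak_disk_usage_gb(500, fraction_compacted=0.25)
```

With 500 GB of live data the worst case is 1000 GB, which is where the "keep disks under 50%" guidance comes from; because a minor compaction can occasionally include all SSTables, planning for the smaller figure is risky.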

Re: 2x storage

2011-02-25 Thread A J
Thanks. What happens when my compaction fails for space reasons? Is no compaction possible till I add more space? I would assume writes are not impacted though the latency of reads would increase, right? Also, though writes are not seek-intensive, compactions are seek-intensive, no? On Fri,

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
It's nice to see some testing in this regard, however, it's worth pointing out something that gets lost in CF index vs secondary index discussions. What you're really proving is that get_slice (across columns) is faster than get_indexed_slices (across keys). For up to a certain size (and it would

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Mohit Anchlia
Does it mean that we should design the data model such that row keys actually become columns (and create a secondary index) so that data retrieval is faster? I am soon setting up big test instances to test all this. On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff e...@anuff.com wrote: It's nice to see

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
At the risk of recapitulating a conversation that seems to happen with some frequency on this list, the answer is going to boil down to "depends on your data model", but using rows as indexes is one of the core usage patterns of Cassandra, whether to store the list of keys to rows in another column

Re: My responses to this mailing list interpreted as SPAM

2011-02-25 Thread Aaron Morton
If you search the list there is some discussion about this. Best advice is to send in plain text. https://issues.apache.org/jira/browse/INFRA-3356 Personally I prefer the emails to have the whole discussion. Aaron On 25/02/2011, at 4:55 AM, Anthony John chirayit...@gmail.com wrote: Do not

RE: memtable_flush_after_mins setting not working

2011-02-25 Thread Jeffrey Wang
I just noticed this thread. Does this mean that (assuming the same setup of an empty keyspace and CFs added later) if I have a CF that I write to for some time, but not enough to hit the flush limits, it will never get flushed until the server is restarted? I believe this is causing commit logs

Re: 2x storage

2011-02-25 Thread A J
Another related question: Can the minor compactions across nodes be staggered so that I can control how many nodes are compacting at any given point? On Fri, Feb 25, 2011 at 2:01 PM, A J s5a...@gmail.com wrote: Thanks. What happens when my compaction fails for space reasons ? Is no compaction

CassandraForums.com

2011-02-25 Thread kh jo
Hi guys, for all of those who prefer forums over mailing lists, I set up a forum for Cassandra. Please have a look: http://www.cassandraforums.com/ Thanks, Jo

Re: memtable_flush_after_mins setting not working

2011-02-25 Thread Jonathan Ellis
Yes. On Fri, Feb 25, 2011 at 4:29 PM, Jeffrey Wang jw...@palantir.com wrote: I just noticed this thread. Does this mean that (assuming the same setup of an empty keyspace and CFs added later) if I have a CF that I write to for some time, but not enough to hit the flush limits, it will never

How does node failure detection work in Cassandra?

2011-02-25 Thread tijoriwala.ritesh
Hi, I would like to know the internals of how node failure detection works in Cassandra. And in the absence of any network partition, do all nodes see the same view of live nodes? Is there a concept of Coordinator/Election? If yes, how is the merge handled after a network partition heals? Thanks, Ritesh

Re: How does node failure detection work in Cassandra?

2011-02-25 Thread Brandon Williams
On Fri, Feb 25, 2011 at 5:32 PM, tijoriwala.ritesh tijoriwala.rit...@gmail.com wrote: Hi, I would like to know the internals of how node failure detection works in Cassandra. http://bit.ly/phi_accrual Is there a concept of Coordinator/Election? No. -Brandon
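
The link above appears to point at the phi accrual failure detector. As a simplified sketch of that idea, and not Cassandra's actual code: each node tracks how long ago it last heard from a peer and converts that gap into a suspicion level phi, here under the assumption that heartbeat gaps are exponentially distributed. The function names and the threshold of 8 are illustrative assumptions.

```python
import math


def phi(now, last_heartbeat, mean_interval):
    """Suspicion level for a peer, assuming exponentially distributed gaps."""
    elapsed = now - last_heartbeat
    # P(gap > elapsed) = exp(-elapsed / mean); phi is -log10 of that probability.
    return elapsed / (mean_interval * math.log(10))


def is_suspected(now, last_heartbeat, mean_interval, threshold=8.0):
    """Treat the peer as down once phi exceeds the (assumed) threshold."""
    return phi(now, last_heartbeat, mean_interval) > threshold


# Heartbeats normally arrive about once per second (hypothetical numbers).
on_time = phi(now=101.0, last_heartbeat=100.0, mean_interval=1.0)
overdue = phi(now=130.0, last_heartbeat=100.0, mean_interval=1.0)
```

Because every node computes phi locally from the heartbeats it observes, no coordinator or election is needed, which fits the "No" in the reply above.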

Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 2:41 PM, A J s5a...@gmail.com wrote: Can the minor compactions across nodes be staggered so that I can control how many nodes are compacting at any given point ? Not without some crazy scheme where you control the compaction thresholds dynamically via some external

Re: 2x storage

2011-02-25 Thread Terje Marthinussen
Cassandra never compacts more than one column family at a time? Regards, Terje On 26 Feb 2011, at 02:40, Robert Coli rc...@digg.com wrote: On Fri, Feb 25, 2011 at 9:22 AM, A J s5a...@gmail.com wrote: I read in some cassandra notes that each node should be allocated twice the storage

Re: 2x storage

2011-02-25 Thread Robert Coli
On Fri, Feb 25, 2011 at 4:55 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Cassandra never compacts more than one column family at a time? Nope, compaction is single-threaded currently. https://issues.apache.org/jira/browse/CASSANDRA-2191