Re: Deletes, null values

2013-04-29 Thread aaron morton
> I thought that C* had no null values... I use a lot of CFs in which only the column names are filled in, and I request a range of columns to see which references (like 1228#16866) exist. So I would like those columns to simply disappear from the table.
Cassandra does not store null values.

Kundera 2.5 released

2013-04-29 Thread Vivek Mishra
Hi All, We are happy to announce the release of Kundera 2.5. Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, Redis,

Re: Adding nodes in 1.2 with vnodes requires huge disks

2013-04-29 Thread John Watson
Opened a ticket: https://issues.apache.org/jira/browse/CASSANDRA-5525 On Mon, Apr 29, 2013 at 2:24 AM, aaron morton wrote: > is this understanding correct "we had a 12 node cluster with 256 vnodes on > each node (upgraded from 1.1), we added two additional nodes that streamed > so much data (60

Re: How to use Write Consistency 'ANY' with SSTABLELOADER - DSE Cassandra 1.1.9

2013-04-29 Thread Robert Coli
On Mon, Apr 29, 2013 at 1:17 PM, aaron morton wrote:
> Bulk Loader does not use CL, it's more like a repair / bootstrap.
> If you have to skip a node then use repair.
The bulk loader ("sstableloader") can ignore replica nodes via the -i option: ./src/java/org/apache/cassandra/tools/BulkLoader.java

Re: setcompactionthroughput and setstreamthroughput have no effect

2013-04-29 Thread Robert Coli
On Mon, Apr 29, 2013 at 3:52 PM, John Watson wrote: > Same behavior on 1.1.3, 1.1.5 and 1.1.9. > Currently: 1.2.3 (below snippets are from trunk) ./src/java/org/apache/cassandra/tools/NodeCmd.java " case SETCOMPACTIONTHROUGHPUT : if (arguments.length != 1) { badUs

Re: setcompactionthroughput and setstreamthroughput have no effect

2013-04-29 Thread John Watson
Same behavior on 1.1.3, 1.1.5 and 1.1.9. Currently: 1.2.3 On Mon, Apr 29, 2013 at 11:43 AM, Robert Coli wrote: > On Sun, Apr 28, 2013 at 2:28 PM, John Watson wrote: > > Running these 2 commands are noop IO wise: > > nodetool setcompactionthroughput 0 > > nodetool setstreamtrhoughput 0 > >

Re: normal thread counts?

2013-04-29 Thread aaron morton
> I used JMX to check current number of threads in a production cassandra machine, and it was ~27,000.
That does not sound too good. My first guess would be lots of client connections. What client are you using, does it do connection pooling? See the comments in cassandra.yaml around rpc_se

Re: Cass 1.1.1 and 1.1.11 Exception during compactions

2013-04-29 Thread aaron morton
nodetool scrub will repair out-of-order rows in the source SSTables for the compaction process. Or you can stop the node and use the offline bin/sstablescrub tool. Not sure how they got there; there was a ticket for similar problems in 1.1.1. Cheers - Aaron Morton Freelance Cas

Re: How to use Write Consistency 'ANY' with SSTABLELOADER - DSE Cassandra 1.1.9

2013-04-29 Thread aaron morton
Is this a one-off data load or something you need to do regularly? One option you have with RF 3 and 3 nodes is to place a copy of all the SSTables on each node and use nodetool refresh to load the SSTables directly into the node without any streaming.
> 1. Please can anyone suggest how we can

Re: Inter-DC communication optimization

2013-04-29 Thread aaron morton
Messages are sent to all replicas involved in the request at the same time. All nodes in the cluster must be able to communicate with all other nodes. The coordinator the client is talking to, the local coordinator, groups messages (for one read/mutation) to be sent to remote data centres and on
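The grouping step described above can be illustrated with a toy Python model. This is only a sketch, not Cassandra's code: the function name group_replicas_by_dc, the (host, data centre) tuples, and the host and DC names are all hypothetical, and what the coordinator does with each group afterwards is cut off in the message above and is not modelled here.

```python
from collections import defaultdict


def group_replicas_by_dc(replicas, local_dc):
    """Split replicas into local hosts and per-remote-DC groups.

    replicas: iterable of (host, dc) pairs (hypothetical representation).
    local_dc: name of the data centre the coordinator lives in.
    """
    local_hosts = []
    remote_groups = defaultdict(list)
    for host, dc in replicas:
        if dc == local_dc:
            local_hosts.append(host)
        else:
            # One group per remote data centre.
            remote_groups[dc].append(host)
    return local_hosts, dict(remote_groups)


# Hypothetical hosts and data centre names, for illustration only.
replicas = [("10.0.0.1", "DC1"), ("10.0.0.2", "DC1"),
            ("10.1.0.1", "DC2"), ("10.1.0.2", "DC2"),
            ("10.2.0.1", "DC3")]
local, remote = group_replicas_by_dc(replicas, "DC1")
print(local)
print(remote)
```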

Re: Understanding the source code

2013-04-29 Thread aaron morton
I did a talk on the internals at ApacheCon this year; it goes through the architecture and the code: http://www.slideshare.net/aaronmorton/apachecon-nafeb2013 Not sure if/when the videos are going to be put up. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand

Re: Many creation/inserts in parallel

2013-04-29 Thread aaron morton
> About 80% of these CFs should be truncated every day and if we decrease many CF by creating one key field in one CF, a huge amount of tombstones will appear.
Truncation requires that all nodes be available, so if you are doing it each day you may run into trouble if a node is down.

Re: setcompactionthroughput and setstreamthroughput have no effect

2013-04-29 Thread Robert Coli
On Sun, Apr 28, 2013 at 2:28 PM, John Watson wrote:
> Running these 2 commands are noop IO wise:
> nodetool setcompactionthroughput 0
> nodetool setstreamtrhoughput 0
What version of Cassandra? =Rob

Re: Many creation/inserts in parallel

2013-04-29 Thread Robert Coli
On Mon, Apr 29, 2013 at 12:33 AM, Sasha Yanushkevich wrote: > 1) We’ve tested 100 threads in parallel and each thread created 10 tables. With this pattern of creating CFs, you are begging for schema desynch. If this actually works any meaningful percentage of the time in modern cassandra, I would

Re: Adding nodes in 1.2 with vnodes requires huge disks

2013-04-29 Thread John Watson
They were all restarted a couple times after adding 'num_tokens: 256' to cassandra.yaml. Yes and nodetool ring became 'unusable' due to all the new tokens. On Mon, Apr 29, 2013 at 10:24 AM, Sam Overton wrote: > Did you update num_tokens on the existing hosts and restart them, before > you trie

Compaction, Slow Ring, and bad behavior

2013-04-29 Thread Drew from Zhrodague
Hi, we have a 9-node ring on m1.xlarge AWS hosts. We started having some trouble a while ago, and it's making me pull out all of my hair. The host in position #3 has been replaced 4 times. Each time, the host joins the ring, I do a nodetool repair -pr, and she seems fine for about a day. The

Re: Adding nodes in 1.2 with vnodes requires huge disks

2013-04-29 Thread Sam Overton
Did you update num_tokens on the existing hosts and restart them, before you tried bootstrapping in the new node? If the new node tried to stream all the data in the cluster then this would be consistent with you having missed that step. You should see "Calculating new tokens" in the logs of the e

Re: cassandra-shuffle time to completion and required disk space

2013-04-29 Thread John Watson
That's what we tried first, before the shuffle, and we ran into the space issue. That's detailed in another thread titled "Adding nodes in 1.2 with vnodes requires huge disks". On Mon, Apr 29, 2013 at 4:08 AM, Sam Overton wrote: > An alternative to running shuffle is to do a rolling > bootstrap/dec

Fwd: error casandra ring an hadoop connection ¿?

2013-04-29 Thread Miguel Angel Martin junquera
Hi all: I can run Pig with Cassandra and Hadoop in EC2. I'm trying to run Pig with a Cassandra ring and Hadoop. The Cassandra ring hosts the tasktrackers and datanodes, too, and I am running Pig from another machine where I have installed the namenode-jobtracker. I have a

RE: Exception when setting tokens for the cassandra nodes

2013-04-29 Thread moshe.kranc
For starters: If you are using the Murmur3 partitioner, which is the default in cassandra.yaml, then you need to calculate the tokens using: python -c 'print [str(((2**64 / 2) * i) - 2**63) for i in range(2)]' which gives the following values: ['-9223372036854775808', '0'] From: Rahul [mailto:r
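The one-liner above uses Python 2 print syntax. A minimal Python 3 sketch of the same calculation follows; the function name murmur3_tokens and the generalization from 2 nodes to any number of evenly spaced nodes are assumptions made for illustration, not part of the original message.

```python
def murmur3_tokens(node_count):
    """Return evenly spaced initial tokens for the Murmur3 partitioner.

    The Murmur3 token range is [-2**63, 2**63 - 1], so the ring has
    2**64 positions; each node gets an equal slice starting at -2**63.
    """
    step = 2**64 // node_count  # integer division, as in the Python 2 one-liner
    return [str(step * i - 2**63) for i in range(node_count)]


print(murmur3_tokens(2))  # ['-9223372036854775808', '0']
```

For two nodes this reproduces the values quoted in the message.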

Exception when setting tokens for the cassandra nodes

2013-04-29 Thread Rahul
Hi, I am testing out Cassandra 1.2 on two of my local servers, but I face problems with assigning tokens to my nodes. When I use nodetool to set the token, I end up getting a Java exception. My test setup is as follows: Node1: (seed) Node2: (seed) Since I have two nodes, I calculated the tokens as

Re: Deletes, null values

2013-04-29 Thread Alain RODRIGUEZ
I created it almost a year ago with cassandra-cli. Now show_schema returns: create column family myCF with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_

normal thread counts?

2013-04-29 Thread William Oberman
Hi, I'm having some issues. I keep getting: ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: unable to create new native thread -- after a day or two of runti

Cass 1.1.1 and 1.1.11 Exception during compactions

2013-04-29 Thread Oleg Dulin
We saw this exception with 1.1.1 and also with 1.1.11 (we upgraded for unrelated reasons, to fix the FD leak during slice queries) -- name of the CF replaced with "*" for confidentiality: 10419 ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Except

How to use Write Consistency 'ANY' with SSTABLELOADER - DSE Cassandra 1.1.9

2013-04-29 Thread praveen.akunuru
Hi All, We have a requirement to load approximately 10 million records, each record with approximately 100 columns. We are planning to use the Bulk-loader program to convert the data into SSTables and then load them using SSTABLELOADER. Everything is working fine when all nodes are up and runni

Fwd: Inter-DC communication optimization

2013-04-29 Thread Sergey Naumov
Hello. I would like to know whether updates are propagated from local DC to remote DCs simultaneously (so All-to-All network connections are preferable) or Cassandra can somehow determine nearest DCs and send updates only to them (so these nearest DCs have to propagate updates further)? Is there s

Re: cassandra-shuffle time to completion and required disk space

2013-04-29 Thread Sam Overton
An alternative to running shuffle is to do a rolling bootstrap/decommission. You would set num_tokens on the existing hosts (and restart them) so that they split their ranges, then bootstrap in N new hosts, then decommission the old ones. On 28 April 2013 22:21, John Watson wrote: > The amount

Re: CQL Clarification

2013-04-29 Thread aaron morton
Not really, I've passed on the comments to the doc teams. The column timestamp is just a 64 bit int like I said. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 29/04/2013, at 10:06 AM, Michael Theroux wrote: > Y

Re: Adding nodes in 1.2 with vnodes requires huge disks

2013-04-29 Thread aaron morton
Is this understanding correct: "we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1), we added two additional nodes that streamed so much data (600+GB when other nodes had 150-200GB) during the joining phase that they filled their local disks and had to be killed"? Can you

Understanding the source code

2013-04-29 Thread Mahmood Naderan
Dear all, I am trying to understand and analyze the source code of Cassandra. What I expect (and see in other codes) is that there should be three sections in a code. 1) Initialization and input reading, 2) Core computation and 3) Finalizing and gathering the output. However I can not find such

Re: Many creation/inserts in parallel

2013-04-29 Thread Sasha Yanushkevich
1) We’ve tested 100 threads in parallel and each thread created 10 tables. I think we will change our data model, but another problem may occur. About 80% of these CFs should be truncated every day and if we decrease many CF by creating one key field in one CF, a huge amount of tombstones will appe