Re: problem with bootstrap

2011-03-14 Thread Patrik Modesto
On Fri, Mar 11, 2011 at 22:31, Aaron Morton aa...@thelastpickle.com wrote: The assertion is interesting. Can you reproduce it with logging at debug and post the results? Could you try to reproduce it with a clean cluster? It was on a clean cluster last time. Anyway I started clean cluster

secondary indexes on data imported by json2sstable

2011-03-14 Thread Terje Marthinussen
Hi, Should it be expected that secondary indexes are automatically regenerated when importing data using json2sstable? Or is there some manual procedure that needs to be done to generate them? Regards, Terje

Re: secondary indexes on data imported by json2sstable

2011-03-14 Thread Norman Maurer
I would expect they get created on the fly while importing. If not I think it's a bug... Bye, Norman 2011/3/14 Terje Marthinussen tmarthinus...@gmail.com Hi, Should it be expected that secondary indexes are automatically regenerated when importing data using json2sstable? Or is there some

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread Chris Burroughs
On 03/11/2011 03:46 PM, Jonathan Ellis wrote: Repairs is not yet WAN-optimized but is still cheap if your replicas are close to consistent since only merkle trees + inconsistent ranges are sent over the network. What is the ticket number for WAN optimized repair?

Conflict resolution in Cassandra

2011-03-14 Thread Milind Parikh
https://docs.google.com/document/d/13Yc2t4d07290TdiRmSTchuAk9sbp4BeqOpqeYhbcDFM/edit?hl=en There was an excellent session on vector clocks and synchronous writes in cassandra. Here are my gleanings out of it. /*** sent from my android...please pardon occasional typos as I

Re: secondary indexes on data imported by json2sstable

2011-03-14 Thread Jonathan Ellis
You'd need to drop and recreate the index (but see https://issues.apache.org/jira/browse/CASSANDRA-2320 when doing this). On Mon, Mar 14, 2011 at 6:07 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Should it be expected that secondary indexes are automatically regenerated when

Re: Double ColumnType and comparing

2011-03-14 Thread Jonathan Ellis
We'd be happy to commit a patch contributing a DoubleType. On Sun, Mar 13, 2011 at 7:36 PM, Paul Teasdale teasda...@gmail.com wrote: I am quite new to Cassandra and am trying to model a simple Column Family which uses Doubles as column names: Datalines: { // ColumnFamily dataline-1:{ // row

Re: Double ColumnType and comparing

2011-03-14 Thread David Boxenhorn
If you do it, I'd recommend BigDecimal. It's an exact type, and usually what you want. On Mon, Mar 14, 2011 at 3:40 PM, Jonathan Ellis jbel...@gmail.com wrote: We'd be happy to commit a patch contributing a DoubleType. On Sun, Mar 13, 2011 at 7:36 PM, Paul Teasdale teasda...@gmail.com wrote:

Re: reducing disk usage advice

2011-03-14 Thread Sylvain Lebresne
On Sun, Mar 13, 2011 at 7:10 PM, Karl Hiramoto k...@hiramoto.org wrote: Hi, I'm looking for advice on reducing disk usage. I've run out of disk space two days in a row while running a nightly scheduled nodetool repair nodetool compact cronjob. I have 6 nodes RF=3 each with 300 GB

Map-Reduce on top of cassandra

2011-03-14 Thread Or Yanay
Hi All, I am trying to write some map-reduce tasks so I can find out stuff like - how many records have X status? I am using 0.7.0 and have 5 nodes with ~100G of data on each node. I have written the code based on the word_count example and the map-reduce is running successfully BUT is

Out of Memory every 2 weeks

2011-03-14 Thread Jean-Yves LEBLEU
Sorry to create a new thread about Out of Memory problem, but I checked all other threads and did not find the answer. We have a running cluster of 2 cassandra nodes replication factor = 2 on red hat 4.8 32 bits with 4 G of memory where we run periodically out of memory (every 2 weeks) and both

Re: Map-Reduce on top of cassandra

2011-03-14 Thread Jeremy Hanna
Can you go into the #cassandra channel and ask your question? See if jeromatron or driftx are around. That way there can be a back and forth about settings and things. http://webchat.freenode.net/?channels=#cassandra On Mar 14, 2011, at 10:06 AM, Or Yanay wrote: Hi All, I am trying to

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread Jedd Rashbrooke
Jonathan, thank you for your answers here. To explain this bit ... On 11 March 2011 20:46, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke j...@visualdna.com wrote: Copying a cluster between AWS DC's: We have ~ 150-250GB per node, with a

Re: Out of Memory every 2 weeks

2011-03-14 Thread Robert Coli
On Mon, Mar 14, 2011 at 8:27 AM, Jean-Yves LEBLEU jleb...@gmail.com wrote: Sorry to create a new thread about Out of Memory problem, but I checked all other threads and did not find the answer. [...] The question is I don't really understand the configuration problem, if some body have any

Re: Out of Memory every 2 weeks

2011-03-14 Thread Jean-Yves LEBLEU
Thank you, I am going to try that.

From scribe to cassandra

2011-03-14 Thread salidu andrea
Hi all, On the wiki pages I read an example (written in Java) to store data from scribe. I tried to write the same code in PHP without success. Has anyone here tried to do the same? Thanks in advance, javasilk

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread Robert Coli
On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke j...@visualdna.com wrote:  But more importantly for us it would mean we'd have just the  one major outage, rather than two (relocation and 0.6 - 0.7) Take zero major outages instead? :D a) Set up new cluster on new version. b) Fork application

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread David Boxenhorn
How do you write to two versions of Cassandra from the same client? Two versions of Hector? On Mon, Mar 14, 2011 at 6:46 PM, Robert Coli rc...@digg.com wrote: On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke j...@visualdna.com wrote: But more importantly for us it would mean we'd have just

Re: Seed

2011-03-14 Thread mcasandra
Tyler Hobbs-2 wrote: Seeds: Never use a node's own address as a seed if you are bootstrapping it by setting autobootstrap to true! I came across this on the wiki. Can someone please help me understand this with some example? -- View this message in context:

Linux HugePages and mmap

2011-03-14 Thread mcasandra
Currently, in cassandra.yaml disk_access_mode is set to auto but the recommendation seems to be to use 'mmap_index_only'. If we use HugePages then do we still need to worry about setting disk_access_mode to mmap? I am planning to enable HugePages and use -XX:+UseLargePages option in JVM. I had a

Re: reducing disk usage advice

2011-03-14 Thread Karl Hiramoto
On 03/14/11 15:33, Sylvain Lebresne wrote: CASSANDRA-1537 is probably also a partial but possibly sufficient solution. That's also probably easier than CASSANDRA-1610 and I'll try to give it a shot asap, that had been on my todo list way too long. Thanks, eager to see CASSANDRA-1610 someday.

Re: Strange behaivour

2011-03-14 Thread ruslan usifov
I detect that this was after change schema and it hung on waitpid syscall. What can i do with this?

Re: Linux HugePages and mmap

2011-03-14 Thread Jonathan Ellis
On Mon, Mar 14, 2011 at 1:59 PM, mcasandra mohitanch...@gmail.com wrote: Currently, in cassandra.yaml disk_access_mode is set to auto but the recommendation seems to be to use 'mmap_index_only'. Wrong. The recommendation is to leave it on auto. If we use HugePages then do we still need to

Re: Linux HugePages and mmap

2011-03-14 Thread mcasandra
Jonathan Ellis-3 wrote: Wrong. The recommendation is to leave it on auto. this is where I see mmap recommended for index. http://wiki.apache.org/cassandra/StorageConfiguration Jonathan Ellis-3 wrote: HugePages has nothing to do

Re: Increase flush writer queue

2011-03-14 Thread Brandon Williams
On Mon, Mar 14, 2011 at 1:03 PM, Daniel Doubleday daniel.double...@gmx.net wrote: I was thinking of setting the work queue in CFS.flushWriterPool to new LinkedBlockingQueue<Runnable>(3) // because 3 is my favorite number instead of new

calculating initial_token

2011-03-14 Thread Sasha Dolgy
Sorry for being a bit daft ... Wanted a bit of validation or rejection ... If I have a 6 node cluster, replication factor 2 (don't think this is applicable to the token decision) is the following sufficient and correct for determining the tokens: #!/bin/bash for nodes in {0..5}; do echo
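The even-spacing calculation asked about above can be sketched in Python. This is a minimal sketch, assuming RandomPartitioner and a token space of 0 to 2**127; the name `initial_tokens` is hypothetical and is not part of Cassandra or nodetool. The replication factor is not an input, which matches the poster's guess that it does not affect token selection.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def initial_tokens(node_count):
    """Return one evenly spaced initial_token value per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the very large tokens exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(6)):
        print(node, token)
```

Each printed value would go into the initial_token setting of one node before its first start.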

Calculate memory used for keycache

2011-03-14 Thread ruslan usifov
Hello How is it possible calculate this value? I think that key size, if we use RandomPartitioner will 16 bytes so keycache will took 16*(num of keycache elements) bytes ??

Re: Write speed roughly 1/10 of expected.

2011-03-14 Thread Steven Liu
Re: Mr. Schuller, The test documents are very small (a few lines of text each). Test data model is standard CF with each document corresponding to a row containing 9-12 columns. We are using a single client for sequential batch_insert (probably maps to batch mutate in phpcassa), so it is very

Re: Out of Memory every 2 weeks

2011-03-14 Thread Peter Schuller
I am going to try that. Also, you may want to augment your VM options with: -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps That way there should hopefully be some corroborating evidence as to the nature of the heap growth over time. -- / Peter Schuller

Re: Strange behaivour

2011-03-14 Thread Peter Schuller
Can you try a 'strace -fp PID' when it's in the state of spinning with system CPU time? I'm wondering whether it's stuck in a single syscall or just spinning around one or a set of syscalls. I have very vague recollections of a discussion on the list a few months ago about triggering a kernel bug

Re: Calculate memory used for keycache

2011-03-14 Thread Peter Schuller
How is it possible calculate this value? I think that key size, if we use RandomPartitioner will 16 bytes so keycache will took 16*(num of keycache elements) bytes ?? The easiest way right now is probably empirical testing. The issue is that the memory use must include overhead associated with

Re: Calculate memory used for keycache

2011-03-14 Thread Narendra Sharma
Sometime back I looked at the code to find that out. Following is the result. There will be some additional overhead for internal DS for ConcurrentLinkedHashMap. Keycache size * (8 bytes for position i.e. value + X bytes for key + 16 bytes for token (RP) + 8 byte reference for DecoratedKey + 8
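The per-entry terms listed above can be turned into a rough estimator. This is only a sketch: the archived message is cut off after the fourth term, so the remaining terms and the ConcurrentLinkedHashMap overhead are represented by a hypothetical `other_overhead_bytes` parameter rather than guessed. The function name is also hypothetical. As noted elsewhere in the thread, empirical testing is likely to be more reliable than any such formula.

```python
POSITION_BYTES = 8        # value: position, per the message above
TOKEN_BYTES = 16          # token under RandomPartitioner, per the message
DECORATED_KEY_REF = 8     # reference for DecoratedKey, per the message


def estimate_keycache_bytes(num_keys, avg_key_bytes, other_overhead_bytes=0):
    """Rough key cache size estimate in bytes.

    other_overhead_bytes is a placeholder for the per-entry terms that
    are truncated in the archive and for internal map overhead.
    """
    per_entry = (POSITION_BYTES + avg_key_bytes + TOKEN_BYTES
                 + DECORATED_KEY_REF + other_overhead_bytes)
    return num_keys * per_entry


if __name__ == "__main__":
    # Example: 200,000 cached keys averaging 20 bytes each.
    print(estimate_keycache_bytes(200_000, 20))
```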

Re: calculating initial_token

2011-03-14 Thread Narendra Sharma
On the same page there is a section on Load Balance that talks about python script to compute tokens. I believe your question is more about assigning new tokens and not compute tokens. 1. nodetool loadbalance will result in recomputation of tokens. It will pick tokens based on the load and not

Re: calculating initial_token

2011-03-14 Thread Sasha Dolgy
ah, you know ... i have been reading it wrong. the output shows a nice fancy column called Owns but i've only ever seen the percentage ... the amount of data or load is even ... doh. thanks for the reply. cheers -sd On Mon, Mar 14, 2011 at 10:47 PM, Narendra Sharma narendra.sha...@gmail.com

Re: Linux HugePages and mmap

2011-03-14 Thread Jonathan Ellis
On Mon, Mar 14, 2011 at 3:01 PM, mcasandra mohitanch...@gmail.com wrote: Jonathan Ellis-3 wrote: Wrong.  The recommendation is to leave it on auto. this is where I see mmap recommended for index. http://wiki.apache.org/cassandra/StorageConfiguration FTFY. HugePages has nothing to do with

Re: nodetool loadbalance

2011-03-14 Thread Sasha Dolgy
With my six node cluster ... nodetool loadbalance should be run on one node or all six? I run it on one and the ownership percentage gets even more unbalanced. So... in the spirit of the evening, I run it on another node . as you see, the ownership % keeps increasing and the token numbers

Re: Linux HugePages and mmap

2011-03-14 Thread mcasandra
Thanks! I think it still is a good idea to enable HugePages and use UseLargePages option in JVM. What do you think? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Linux-HugePages-and-mmap-tp6170193p6171008.html Sent from the

Re: nodetool loadbalance

2011-03-14 Thread Jonathan Ellis
You should read http://wiki.apache.org/cassandra/Operations before running loadbalance. On Mon, Mar 14, 2011 at 5:27 PM, Sasha Dolgy sdo...@gmail.com wrote: With my six node cluster ... nodetool loadbalance should be run on one node or all six?  I run it on one and the ownership percentage gets

Re: nodetool loadbalance

2011-03-14 Thread Sasha Dolgy
Using the tokens I generated earlier, I ran nodetool move <new token> on each node and things look much better for the Owns % ... Address Status State Load Owns Token 170141183460469231731687303715884105725 10.0.0.1 Up Normal 234.51 KB 0.00% 0 10.0.0.2 Up

Re: nodetool loadbalance

2011-03-14 Thread Sasha Dolgy
Yes, a lot of what is on the wiki makes perfect sense when read the right way. Suppose there aren't enough pictures or before/after info online to help the knowledge flow. On Mar 14, 2011 11:52 PM, Jonathan Ellis jbel...@gmail.com wrote: You should read http://wiki.apache.org/cassandra/Operations

Re: Write speed roughly 1/10 of expected.

2011-03-14 Thread Tyler Hobbs
Re: Mr. Hobbs, Did you mean which has the benefit of THRIFT-638, while 0.7.a.2 does not (instead of 0.7.a.3)? 0.7.a.3 was the latest version of phpcassa we could find on github. We installed 0.7.a.3 with its C extension and didn't see an improvement. Is there a newer version with THRIFT-638

Re: calculating initial_token

2011-03-14 Thread Narendra Sharma
The %age (owns) is just the arc length in terms of %age of tokens a node owns out of the total token space. It doesn't reflect the actual data. The size (load) is the real current load. -Naren On Mon, Mar 14, 2011 at 2:59 PM, Sasha Dolgy sdo...@gmail.com wrote: ah, you know ... i have been
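The arc-length idea described above can be sketched as follows. This is a minimal sketch, assuming RandomPartitioner with a ring of 2**127 tokens and assuming each node owns the arc ending at its own token and starting at the previous node's token; the function name `ownership_percent` is hypothetical. It works from tokens alone, so like the Owns column it says nothing about how much data each node actually holds.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def ownership_percent(tokens):
    """Map each token to the percentage of the ring its node owns."""
    ordered = sorted(tokens)
    result = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps around to the last token
        # A zero arc only occurs with a single node, which owns everything.
        arc = (token - previous) % RING_SIZE or RING_SIZE
        result[token] = 100.0 * arc / RING_SIZE
    return result


if __name__ == "__main__":
    for token, share in ownership_percent([0, 2 ** 125, 2 ** 126]).items():
        print(token, share)
```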

Re: problem with bootstrap

2011-03-14 Thread aaron morton
Thanks, will try to look into it. Aaron On 14 Mar 2011, at 20:43, Patrik Modesto wrote: On Fri, Mar 11, 2011 at 22:31, Aaron Morton aa...@thelastpickle.com wrote: The assertion is interesting. Can you reproduce it with logging at debug and post the results? Could you try to reproduce it

Re: Seed

2011-03-14 Thread aaron morton
What page is that from ? Aaron On 15 Mar 2011, at 06:20, mcasandra wrote: Tyler Hobbs-2 wrote: Seeds: Never use a node's own address as a seed if you are bootstrapping it by setting autobootstrap to true! I came across this on the wiki. Can someone please help me understand this

Re: problems while TimeUUIDType-index-querying with two expressions

2011-03-14 Thread aaron morton
It's failing when comparing two TimeUUID values because one of them is not properly formatted. In this case it's comparing a stored value with the value passed in the get_indexed_slice() query expression. I'm going to assume it's the value passed for the expression. When you create the
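A client-side sanity check for the expression value might look like the sketch below. It assumes the value is sent as the raw 16-byte form of a version 1 (time-based) UUID; the helper name `is_time_uuid` is hypothetical, and how a particular client library encodes the value is not stated in the thread.

```python
import uuid


def is_time_uuid(raw):
    """Return True if raw is the 16-byte form of a version 1 UUID."""
    if not isinstance(raw, (bytes, bytearray)) or len(raw) != 16:
        return False
    return uuid.UUID(bytes=bytes(raw)).version == 1


if __name__ == "__main__":
    print(is_time_uuid(uuid.uuid1().bytes))     # time-based UUID
    print(is_time_uuid(uuid.uuid4().bytes))     # random UUID, wrong version
    print(is_time_uuid(str(uuid.uuid1()).encode()))  # 36-byte text form
```

Running such a check before building the index expression would show whether the value is being passed in text form or as a non-time-based UUID.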

Re: calculating initial_token

2011-03-14 Thread aaron morton
Once the node has started once, it will not use the value for initial_token in cassandra.yaml. Use nodetool move to assign a new token to the node. nodetool loadbalance is generally a bad idea www.spidertracks.com Aaron On 15 Mar 2011, at 13:04, Narendra Sharma wrote: The %age (owns) is

Re: Calculate memory used for keycache

2011-03-14 Thread Robert Coli
On Mon, Mar 14, 2011 at 1:19 PM, Peter Schuller peter.schul...@infidyne.com wrote: How is it possible calculate this value? I think that key size, if we use RandomPartitioner will 16 bytes so keycache will took 16*(num of keycache elements) bytes ?? The easiest way right now is probably

Re: problems while TimeUUIDType-index-querying with two expressions

2011-03-14 Thread Jonathan Ellis
Sounds like we should send an InvalidRequestException then. On Mon, Mar 14, 2011 at 8:06 PM, aaron morton aa...@thelastpickle.com wrote: It's failing when comparing two TimeUUID values because one of them is not properly formatted. In this case it's comparing a stored value with the value

running all unit tests

2011-03-14 Thread Jeffrey Wang
Hey all, We're applying some patches to our own branch of Cassandra, and we are wondering if there is a good way to run all the unit tests. Just having JUnit run all the test classes seems to result in a lot of errors that are hard to fix, so I'm hoping there's an easy way to do this. Thanks!