Re: Digest Query Seems to be corrupt on certain cases

2013-03-31 Thread aaron morton
When I manually inspected this byte array, it seems to hold all details correctly, except the super-column name, causing it to fetch the entire wide row. What is the CF definition and what is the exact query you are sending? There does not appear to be anything obvious in the QueryPath serde

Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )

2013-03-31 Thread aaron morton
but yesterday one of 600 mappers failed :) From what I can understand by looking into the C* source, it seems to me that the problem is caused by an empty (or surprisingly finished?) input buffer (?) causing token to be set to -1 which is improper for RandomPartitioner: Yes, there is a

Re: Reading data in bulk from cassandra for indexing in Elastic search

2013-03-31 Thread aaron morton
Approach 1: 1. Get chunks of 10,000 keys (which is configurable, but when I increase it to more than 15,000, I get a thrift frame size error from Cassandra. To fix it, I will need to increase that frame size via cassandra.yaml) and their columns (around 15 columns/key). You can model this on

Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread aaron morton
Ok, if you're going to look into it, please keep me/us posted. It's not on my radar. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Ok, if you're

Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-31 Thread aaron morton
But what if the gc_grace was changed to a lower value as part of a schema migration after the hints have been marked with TTLs equal to the lower gc_grace before the migration? There would be a chance then if the tombstones had been purged. Want to raise a ticket ? Cheers

Re: CQL3 And Map Literals

2013-03-31 Thread aaron morton
I am curious. Was there a specific reason why it was decided to use single-quotes? ANSI SQL compatible. (Am offline now and cannot confirm, but years of writing SQL with single quotes makes me think of that. ) Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand

Re: Timeseries data

2013-03-31 Thread aaron morton
I think if you use Level compaction, the number of sstables you will touch will be less because sstables in each level are non-overlapping except L0. You will want to do some testing because LCS uses extra IO to make those guarantees. You will also want to look at the SSTable size with LCS if

Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread aaron morton
First thought is the new nodes were marked as seeds. Next thought is check the logs for errors. You can always run a nodetool repair if you are concerned data is not where you think it should be. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton

Re: Cassandra/MapReduce ‘Data Locality’

2013-03-31 Thread aaron morton
ColumnFamilySplit((85070591730234615865843651857942052864, '127605887595351923798765477786913079296] @[d2t0053g]) Can you provide some more information on where these log lines are from and what you did to get them ? Cheers - Aaron Morton Freelance Cassandra Consultant New

Re: CQL queries timing out (and had worked)

2013-03-31 Thread aaron morton
So that mismatch can break rpc across the cluster, apparently. mmm, that ain't right. Anything in the logs? Can you reproduce this on a small cluster or using ccm https://github.com/pcmanus/ccm ? Can you raise a ticket ? Thanks - Aaron Morton Freelance Cassandra Consultant

Re: Insert v/s Update performance

2013-03-31 Thread aaron morton
How does this parameter work? I have 3 nodes with 2 CPU cores each and I have higher writes. It slows down the rate that compaction reads from disk. It reads a bit then has to take a break and wait until it can read again. With only 2 cores you will be running into issues when compaction or repair

Re: Cassandra/MapReduce ‘Data Locality’

2013-03-31 Thread Alicia Leong
I have 4 Cassandra nodes that also run a Datanode and TaskTracker. This log is printed on the console when I execute hadoop jar TokenRange (1) 127605887595351923798765477786913079296 = 0 TokenRange (2) 85070591730234615865843651857942052864 = 127605887595351923798765477786913079296

Re: CQL queries timing out (and had worked)

2013-03-31 Thread Edward Capriolo
Technically a mix of hsha and the other option should work. I tried a mix/match and I noticed some clients were not happy and some other odd stuff, but I could not tie it down to the setting because thrift from the cli was working for me. On Sun, Mar 31, 2013 at 6:30 AM, aaron morton

Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread Kais Ahmed
Hi aaron, Thanks for the reply, I will try to explain what happened exactly. I had a 4-node C* cluster, with nodes called [A,B,C,D] (version 1.2.3-1), started with the EC2 AMI (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with this config: --clustername myDSCcluster --totalnodes 4 --version community Two days

Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread Alexis Lê-Quôc
Alain, Can you post your mdadm --detail /dev/md0 output here as well as your iostat -x -d when that happens. A bad ephemeral drive on EC2 is not unheard of. Alexis | @alq | http://datadog.com P.S. also, disk utilization is not a reliable metric, iostat's await and svctm are more useful imho.

Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread Rudolf van der Leeden
I've seen the same behaviour (SLOW ephemeral disk) a few times. You can't do anything with a single slow disk except not using it. Our solution was always: Replace the m1.xlarge instance asap and everything is good. -Rudolf. On 31.03.2013, at 18:58, Alexis Lê-Quôc wrote: Alain, Can you

Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread aaron morton
Please do not rely on colour in your emails, the best way to get your emails accepted by the Apache mail servers is to use plain text. At this moment the errors started, we see that members and other data are gone, at this moment the nodetool status returns (the 3 new nodes in red) What

Re: MultiInput/MultiGet CF in MapReduce

2013-03-31 Thread aaron morton
If I would use client.get_slice(key). My rowkey is '20130314' from Index Table. Q1) How do I know which Token Range EndPoint rowkey '20130314' is in? Calculate the MD5 hash of the key and find the token range that contains it. This is what is used internally
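
The lookup described in that reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the Cassandra implementation: it assumes RandomPartitioner, assumes the token is the absolute value of the MD5 digest read as a signed big-endian integer, assumes the row key is stored as the UTF-8 bytes of '20130314', and assumes an evenly spaced 4-node ring. Only two of the non-zero tokens below appear in this thread listing; the fourth is filled in for illustration. The function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Assumed 4-node ring. 0, 8507... and 1276... appear in the thread snippets;
# 4253... is an assumption (even spacing) added for illustration only.
RING = [
    0,
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
]


def random_partitioner_token(key: bytes) -> int:
    """MD5 the key bytes and take the absolute value of the signed integer.

    Assumption: this mirrors how RandomPartitioner derives a token.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owning_range(ring_tokens, token):
    """Return the (start, end] range that contains `token`.

    Ranges are start-exclusive and end-inclusive. A token larger than the
    highest ring token falls into the wrapping range (last, first].
    """
    ring = sorted(ring_tokens)
    i = bisect_left(ring, token)  # first ring token >= token
    if i == len(ring):
        i = 0  # wrap around to the first token
    # For i == 0, ring[i - 1] is the last token, giving the wrapping range.
    return ring[i - 1], ring[i]


if __name__ == "__main__":
    token = random_partitioner_token("20130314".encode("utf-8"))
    start, end = owning_range(RING, token)
    print(f"token={token} falls in ({start}, {end}]")
```

The node whose token equals the end of the returned range is the first replica for the key; which further nodes hold copies depends on the replication strategy and factor, which this sketch does not model.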