nodetool cfstats and compression

2012-09-14 Thread Jim Ancona
Do the row size stats reported by 'nodetool cfstats' include the effect of compression? Thanks, Jim

minor compaction and delete expired column-tombstones

2012-09-14 Thread Rene Kochen
Hi all, Does minor compaction delete expired column-tombstones when the row is also present in another SSTable which is not subject to the minor compaction? Example: Say there are five SSTables: - Customers_0 (10 MB) - Customers_1 (10 MB) - Customers_2 (10 MB) - Customers_3 (10 MB) - Customers_4 (30

cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Brian Jeltema
I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class. It appears that the reducers are generating the SSTables, but are failing to load them into the cluster: 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_04_0, Status : FA

Disk configuration in new cluster node

2012-09-14 Thread Casey Deccio
I'm building a new "cluster" (to replace the broken setup I've written about in previous posts) that will consist of only two nodes. I understand that I'll be sacrificing high availability of writes if one of the nodes goes down, and I'm okay with that. I'm more interested in maintaining high con

Re: Cassandra node going down

2012-09-14 Thread rohit reddy
Thanks for the inputs. The disk on the EC2 node failed. This led to the problem. Now I have created a new Cassandra node and added it to the cluster. Do I need to do anything to delete the old node from the cluster, or will the cluster balance itself? Asking this since in Datastax ops center its

Re: Cassandra node going down

2012-09-14 Thread Tyler Hobbs
You will need to run nodetool removetoken with the old node's token to permanently remove it from the cluster. On Fri, Sep 14, 2012 at 3:06 PM, rohit reddy wrote: > Thanks for the inputs. > The disk on the EC2 node failed. This led to the problem. Now i have > created a new cassandra node and add

Differences in row iteration behavior

2012-09-14 Thread Todd Fast
Hi-- We are iterating rows in a column family two different ways and are seeing radically different row counts. We are using 1.0.8 and RandomPartitioner on a 3-node cluster. In the first case, we have a trivial Hadoop job that counts 29M rows using the standard MR pattern for counting (mappe

Re: Differences in row iteration behavior

2012-09-14 Thread Jeremy Hanna
Are there any deletions in your data? The Hadoop support doesn't filter out tombstones, though you may not be filtering them out in your code either. I've used the hadoop support for doing a lot of data validation in the past and as long as you're sure that the code is sound, I'm pretty confid

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Jeremy Hanna
A couple of guesses: - are you mixing versions of Cassandra? Streaming differences between versions might throw this error. That is, are you bulk loading with one version of Cassandra into a cluster that's a different version? - (shot in the dark) is your cluster overwhelmed for some reason? I

Re: nodetool connection refused

2012-09-14 Thread Manu Zhang
should we update the wiki? On Fri, Sep 14, 2012 at 1:18 PM, aaron morton wrote: > Yes. > If your IDE is starting cassandra the settings from cassandra-env.sh will > not be used. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > >

Re: Schema consistently not propagating to a node.

2012-09-14 Thread aaron morton
Out of interest, how out of sync were they? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/09/2012, at 6:53 AM, Ben Frank wrote: > Hi Sergey, > That was exactly it, thank you! > > -Ben > > On Thu, Sep 13, 2012 at 12:07 AM, Serge

Re: secondery indexes TTL - strange issues

2012-09-14 Thread aaron morton
> INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,]. 78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/

Re: Composite Column Query Modeling

2012-09-14 Thread aaron morton
You _could_ use one wide row and do a multiget against the same row for different column slices. Would be less efficient than a single get against the row. But you could still do big contiguous column slices. You may get some benefit from the collections in CQL 3 http://www.datastax.com/dev/bl

Re: Data Model

2012-09-14 Thread aaron morton
> Consider a course_students col family which gives a list of students for a > course I would use two CF's: Course CF: * Each row is one course * Columns are the properties and values of the course CourseEnrolements CF * Each row is one course * Column name is th

Reading column names only

2012-09-14 Thread Robin Verlangen
Hi there, Would it be possible to read only the column names, instead of the names and values? I would like to store some in the value, but without the cost of slowing down the reads of the column names (primary task). Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverla

Re: Changing bloom filter false positive ratio

2012-09-14 Thread aaron morton
I have a hunch that the SSTable selection based on the Min and Max keys in ColumnFamilyStore.markReferenced() means that a higher false positive ratio has less of an impact. It's just a hunch, I've not tested it. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.the

Re: hadoop inserts blow out heap

2012-09-14 Thread aaron morton
Hi Brian, did you see my follow-up questions here: http://www.mail-archive.com/user@cassandra.apache.org/msg24840.html Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/09/2012, at 11:52 PM, Brian Jeltema wrote: > I'm a fairly novice Cass

Re: Reading column names only

2012-09-14 Thread aaron morton
It's not possible to read just the column names. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/09/2012, at 9:05 PM, Robin Verlangen wrote: > Hi there, > > Would it be possible to read only the column names, instead of the names and

Re: Reading column names only

2012-09-14 Thread Robin Verlangen
Hi Aaron, Is this something that's worth becoming a feature in the future? Or should I rework my data model? If so, do you have any suggestions? Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E ro...@us2.nl Disclaimer: The information contained in this mess

AW: secondery indexes TTL - strange issues

2012-09-14 Thread Roland Gude
I am not sure it is compacting an old file: the same thing happens every time I rebuild the index. New files appear, get compacted and vanish. We have set up a new smaller cluster with fresh data. The same thing happens here as well. Data gets inserted and is accessible via index query for some time. A

Re: Many ParNew collections

2012-09-14 Thread Rene Kochen
Thanks Aaron, At another production site the exact same problems occur (also after ~6 months). Here I have a very small cluster of three nodes with replication factor = 3. One of the three nodes begins to have many long ParNew collections and high CPU load. I upgraded to Cassandra 1.0.11, but the GC problem

Query advice to prevent node overload

2012-09-14 Thread André Cruz
Hello. I have a schema that represents a filesystem and one example of a Super CF is: CF FilesPerDir: (DIRNAME -> (FILENAME -> (attribute1: value1, attribute2: value2))) And in cases of directory moves, I have to fetch all files of that directory and subdirectories. This implies one cassandra q

Re: Data Model

2012-09-14 Thread Hiller, Dean
playOrm uses EXACTLY that pattern where @OneToMany becomes student.rowkeyStudent1 student.rowkeyStudent2 and the other fields are fixed. It is a common pattern in NoSQL. Dean From: aaron morton <aa...@thelastpickle.com> Reply-To: "user@cassandra.apache.org
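
The pattern Dean describes, where a to-many relation is stored as column names of the form student.<rowkey> next to the fixed fields of the row, can be illustrated with a toy model. This is only a sketch: a plain Python dict stands in for one wide row, the helper names enroll and enrolled_students and the sample field values are hypothetical, and none of it is playOrm's or Cassandra's actual API.

```python
# Toy model of one "course" row: fixed fields plus one column per enrolled student.
# The dict stands in for a wide row; this is NOT playOrm or Cassandra client code.
STUDENT_PREFIX = "student."


def enroll(course_row, student_key):
    """Record the relation as a column name; the column value is left empty."""
    course_row[STUDENT_PREFIX + student_key] = ""


def enrolled_students(course_row):
    """Recover the student row keys by scanning the prefixed column names."""
    return sorted(
        name[len(STUDENT_PREFIX):]
        for name in course_row
        if name.startswith(STUDENT_PREFIX)
    )


# Fixed fields (hypothetical values) live next to the relation columns.
course = {"name": "Databases 101", "room": "B12"}
enroll(course, "rowkeyStudent1")
enroll(course, "rowkeyStudent2")
students = enrolled_students(course)
```

Because the relation lives in the column names, listing the students of a course is a read of a single row rather than a query across many rows.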

Cassandra node going down

2012-09-14 Thread rohit reddy
Hi, I'm facing a problem in a Cassandra cluster deployed on EC2 where a node is going down under write load. I have configured a cluster of 4 Large EC2 nodes with RF of 2. All nodes are instance storage backed. Disk is RAID0 with 800GB. I'm pumping in write requests at about 4000 writes/sec. One

Re: Cassandra node going down

2012-09-14 Thread Robin Verlangen
Hi Robbit, I think it's running out of disk space, please verify that (on Linux: df -h ). Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E ro...@us2.nl Disclaimer: The information contained in this message and attachments is intended solely for the attentio

Re: Cassandra node going down

2012-09-14 Thread Robin Verlangen
Robbit = Rohit of course, excuse me. Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E ro...@us2.nl Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidenti

Re: Cassandra node going down

2012-09-14 Thread rohit reddy
Hi Robin, I had checked that. Our disk size is about 800GB, and the total data size is not more than 40GB. Even if all the data is stored in one node, this won't happen. I'll try to see if the disk failed. Does this have anything to do with VM memory? I ask because the logs suggest that: Heap is 0.7515559

Is it possible to create a schema before a Cassandra node starts up ?

2012-09-14 Thread Xu, Zaili
Guys, I am pretty new to Cassandra. I have a script that needs to set up a schema first before starting up the Cassandra node. Is this possible? Can I create the schema directly on Cassandra storage so that the node picks it up when it starts? Zaili From: rohit reddy [mailto:

Re: Cassandra node going down

2012-09-14 Thread Robin Verlangen
Cassandra writes to memtables, which get flushed to disk when it's time. That might be because of running out of memory (the log message you just posted), on a shutdown, or at other times. That's why you're using memory while writing. You seem to be running on AWS, are you sure your data locat
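
The write path Robin describes, where writes accumulate in memory and are flushed to disk later, can be pictured with a deliberately simplified model. This is a sketch under loud assumptions: the class ToyStore, its entry-count threshold, and the in-memory list standing in for on-disk files are all invented for illustration and do not reflect Cassandra's real flush triggers or data structures.

```python
class ToyStore:
    """Toy write path: buffer writes in a memtable, flush to immutable sorted runs."""

    def __init__(self, flush_threshold):
        # Assumption: flush after a fixed number of entries. Real flush
        # triggers (heap pressure, shutdown, and others) are more involved.
        self.flush_threshold = flush_threshold
        self.memtable = {}
        self.sstables = []  # each flushed run is a sorted list of (key, value)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move the buffered writes into a new sorted run and clear the buffer."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}


store = ToyStore(flush_threshold=3)
for i in range(7):
    store.write(f"key{i}", i)
flushed_before_shutdown = len(store.sstables)
buffered_before_shutdown = len(store.memtable)
store.flush()  # e.g. what a clean shutdown would do with the remaining writes
```

The point of the model is only that memory use grows with unflushed writes, which is why a write-heavy workload shows up in heap usage before it shows up on disk.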

cassandra does not close the stdout console on startup

2012-09-14 Thread Xu, Zaili
Hi, Another newbie question. My script needs to start up a Cassandra node. However, cassandra doesn't close the stdout console and therefore never returns: cassandra -p c.pid Is there any way to have cassandra close stdout? Zaili

Re: Composite Column Query Modeling

2012-09-14 Thread Adam Holmberg
I think what you're describing might give me what I'm after, but I don't see how I can pass different column slices in a multiget call. I may be missing something, but it looks like you pass multiple keys but only a singular SlicePredicate. Please let me know if that's not what you meant. I'm awar

Re: Cassandra, AWS and EBS Optimized Instances/Provisioned IOPs

2012-09-14 Thread Chris Dodge
Michael Theroux writes: > > Hello, > A number of weeks ago, Amazon announced the availability of EBS Optimized instances and Provisioned IOPs for Amazon EC2. Historically, I've read EBS is not recommended for Cassandra due to the network contention that can quickly result (http://ww

Re: Composite Column Query Modeling

2012-09-14 Thread Hiller, Dean
There is another trick here. On the playOrm open source project, we need to do a sparse query for a join and so we send out 100 async requests and cache up the Java "Future" objects and return the first needed result back without waiting for the others. With the S-SQL in playOrm, we have the IN
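
The trick Dean describes (send all requests at once, keep the future objects, and block only on the result that is needed first) can be sketched as follows. playOrm is a Java project; this is a Python analogue using concurrent.futures, and fetch_row is a hypothetical stand-in for a remote read, not a real client call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key):
    """Hypothetical stand-in for one remote read; sleeps to simulate latency."""
    time.sleep(0.01 * (key % 5))
    return f"row-{key}"


def submit_all(executor, keys):
    """Send every request up front and cache the Future objects by key."""
    return {key: executor.submit(fetch_row, key) for key in keys}


def result_for(pending, key):
    """Block only on the one result the caller needs right now."""
    return pending[key].result()


with ThreadPoolExecutor(max_workers=16) as pool:
    pending = submit_all(pool, range(100))
    # The first needed result is returned without waiting for the other 99.
    first = result_for(pending, 0)
    # Later results are usually ready by the time the caller asks for them.
    rest = [result_for(pending, key) for key in range(1, 100)]
```

The benefit is that the latency of the later requests overlaps with the caller's processing of the earlier results instead of adding up serially.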

Astyanax - build

2012-09-14 Thread A J
Hi, I am new to Java and trying to get the Astyanax client running for Cassandra. Downloaded astyanax from https://github.com/Netflix/astyanax. How do I compile the source code from there in a very simple fashion from the Linux command line? Thanks.

Re: Changing bloom filter false positive ratio

2012-09-14 Thread Peter Schuller
> I have a hunch that the SSTable selection based on the Min and Max keys in > ColumnFamilyStore.markReferenced() means that a higher false positive has > less of an impact. > > it's just a hunch, i've not tested it. For leveled compaction, yes. For non-leveled, I can't see how it would since each
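
The min/max key selection discussed in this thread can be pictured with a toy example. Everything here is an assumption made for illustration: the table names and integer key ranges are invented, the helper candidates is hypothetical, and the overlapping layout in the second list is only one plausible picture of non-leveled SSTables, not something the truncated message states.

```python
def candidates(sstables, key):
    """Keep only the SSTables whose [min_key, max_key] range could contain key."""
    return [name for name, min_key, max_key in sstables if min_key <= key <= max_key]


# Assumed layout 1: disjoint key ranges, as within one level of leveled compaction.
disjoint = [("L1-a", 0, 99), ("L1-b", 100, 199), ("L1-c", 200, 299)]

# Assumed layout 2: every table spans nearly the whole key space.
overlapping = [("st-1", 3, 295), ("st-2", 0, 299), ("st-3", 10, 280)]

disjoint_hits = candidates(disjoint, 150)
overlapping_hits = candidates(overlapping, 150)
```

With disjoint ranges the range check removes most tables before any bloom filter is consulted, so the filter's false positive ratio matters less; with overlapping ranges every table remains a candidate and the bloom filters do all the work.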

Re: Astyanax - build

2012-09-14 Thread Philip O'Toole
On Fri, Sep 14, 2012 at 12:28:08PM -0400, A J wrote: > Hi, > I am new to java and trying to get the Astyanax client running for Cassandra. > > Downloaded astyanax from https://github.com/Netflix/astyanax. How do I > compile the source code from here it in a very simple fashion from > linux command

Re: Astyanax - build

2012-09-14 Thread Hiller, Dean
I didn't need to compile it. It is up in the Maven repositories as we http://mvnrepository.com/artifact/com.netflix.astyanax/astyanax Or are you trying to see how it works? (We use the same client on playORM open source project... it works like a charm). Dean On 9/14/12 10:28 AM, "A J" wrote:

Re: Astyanax - build

2012-09-14 Thread Philip O'Toole
On Fri, Sep 14, 2012 at 10:49 AM, Hiller, Dean wrote: > I didn't need to compile it. It is up in the maven repositories as we > > http://mvnrepository.com/artifact/com.netflix.astyanax/astyanax Actually, yeah, that's what I ended up doing with my ghetto set up too, but I did compile my examples