Re: Is this the correct data model thinking?

2012-02-28 Thread aaron morton
> A.) store ALL the data associated with the user onto a single users row-key. > Some user keys may be small, others may get larger over time depending upon > activity. I would go with this. The important thing is supporting the read queries. Cheers Aaron - Aaron Morton Freelan

Re: sstable image/pic ?

2012-02-28 Thread aaron morton
The on-disk layout is described here, not sure how correct it is nowadays. http://wiki.apache.org/cassandra/ArchitectureSSTable There are multiple files involved, this will give you an idea of the read and write path http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ Hope that helps.

Re: TimeUUID

2012-02-28 Thread R. Verlangen
For querying purposes it would be better to use readable strings because you can really get information out of that. TimeUUID is just a unique value based on time; but not only the time. 2012/2/28 Tamar Fraenkel > Hi! > I have a column family where I use rows as "time buckets". > What I do is t

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-28 Thread aaron morton
Have you tried lowering the batch size and increasing the time out? Even just to get it to work. If you get a TimedOutException it means CL number of servers did not respond in time. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/0

Re: TimeUUID

2012-02-28 Thread aaron morton
not a great deal of difference, personally I would stick with seconds since epoch (it is probably slightly faster). Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/02/2012, at 7:24 PM, Tamar Fraenkel wrote: > Hi! > I have a column fami

Re: sstable image/pic ?

2012-02-28 Thread Franc Carter
On Tue, Feb 28, 2012 at 7:18 PM, aaron morton wrote: > On disk layout is described here, not sure how correct it is nowadays. > http://wiki.apache.org/cassandra/ArchitectureSSTable > > There are multiple files involved, this will give you an idea of the read > and write path > http://thelastpickl

Impact of old data on performance

2012-02-28 Thread Stefan Reek
Hi All, We are running a 3-node cluster with Cassandra 0.6.13. We are in the process of upgrading to 1.x, but can't do so for a while because we can't take the cluster offline. Until now 0.6.13 has run without problems, but lately we are getting some performance issues. We are getting timeouts

Re: sstable image/pic ?

2012-02-28 Thread Hontvári József Levente
* Does the column name get stored for every col/val for every key (which sort of worries me for long column names) Yes, the column name is stored with each value for every key, but it may not matter if you switch on compression, which AFAIK has only advantages and will be the default. I a

Re: sstable image/pic ?

2012-02-28 Thread Franc Carter
2012/2/28 Hontvári József Levente > > >> * Does the column name get stored for every col/val for every key (which >> sort of worries me for long column names) >> > > Yes, the column name is stored with each value for every key, but it may > not matter if you switch on compression, which AFAIK has

Re: TimeUUID

2012-02-28 Thread Tamar Fraenkel
Thanks, makes my life easier. Tamar On Tue, Feb 28, 2012 at 10:23 AM, aaron morton wrote: > not a great deal of difference, personally I would stick with seconds > since epoch (it is probably slightly faster). > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton >

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-28 Thread Patrik Modesto
I'll alter these settings and will let you know. Regards, P. On Tue, Feb 28, 2012 at 09:23, aaron morton wrote: > Have you tried lowering the  batch size and increasing the time out? Even > just to get it to work. > > If you get a TimedOutException it means CL number of servers did not respond >

GCInspector causing slow writes?

2012-02-28 Thread Neil Dolling
When writing to Cassandra (v 1.0.7) I'm seeing occasional delays of up to 4 seconds. Below is from the system.log where we are seeing the delays. Is this a result of GC, and is it worth tuning these settings in order to fix it? If so, any suggestions? Adjusting memtable_total_space_in_mb? *DEBUG [Sc

Failed to join ring (NAT)

2012-02-28 Thread Richard Evans
I have a small ring of two nodes running successfully on aws. In order to understand cassandra support for NAT I have tried to add another node outside aws on a machine behind NAT. When I try to join the ring, there is a 30s pause after starting the messaging service and then it fails, unable to

Re: Server crashed due to "OutOfMemoryError: Java heap space"

2012-02-28 Thread Vitalii Tymchyshyn
Hello. Any messages about GC earlier in the logs? Cassandra server monitors memory and starts complaining in advance if memory gets full. Any chance you've got a full key delete-only scenario for some column families? Cassandra has a bug not being able to flush such memtables. I've filled a bu

Re: TimeUUID

2012-02-28 Thread Paul Loy
In a multi-server env, to avoid key collisions timeuuid may be the better choice. On Monday, February 27, 2012, Tamar Fraenkel wrote: > Hi! > > I have a column family where I use rows as "time buckets". > What I do is take epoch time in seconds, and round it to 1 hour (taking the > result of time_

Re: TimeUUID

2012-02-28 Thread Dave Brosius
Given that these rows are wanted to be time buckets, you would want collisions, in fact that would be the standard way of working, so IMO, the uuid just removes the ability to bucket data and would not be wanted. On 02/28/2012 10:30 AM, Paul Loy wrote:
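
The hour-bucket row keys discussed in this thread can be sketched as below. This is only an illustration, assuming "round it to 1 hour" means rounding down to the start of the hour; the names `bucket_key` and `buckets_between` are hypothetical and not taken from the original posts.

```python
BUCKET_SECONDS = 3600  # one hour, the bucket size mentioned in the thread


def bucket_key(epoch_seconds: int, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Round an epoch timestamp (in seconds) down to the start of its bucket."""
    return epoch_seconds - (epoch_seconds % bucket_seconds)


def buckets_between(start: int, end: int, bucket_seconds: int = BUCKET_SECONDS) -> list:
    """Return every bucket key that covers the inclusive range [start, end]."""
    first = bucket_key(start, bucket_seconds)
    return list(range(first, end + 1, bucket_seconds))
```

Two events in the same hour map to the same key, which is the intended "collision" Dave describes: every writer lands in the same row for that hour, and a reader can list the row keys for a time range without scanning.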

CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Philip Shon
I have not found any examples of utilizing a CompositeType or DynamicCompositeType as a row key. Is doing this frowned upon? All the examples I've seen have been using a CompositeType only for Column names (or values). My particular use case involves having the two components in the key being a D

Re: CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Chris Gerken
Phil, That's the problem with examples :) Row keys can be composite values. That works just fine. Was there something in particular you were trying to do? - Chris Chris Gerken chrisger...@mindspring.com 512.587.5261 http://www.linkedin.com/in/chgerken On Feb 28, 2012, at 10:25 AM, Philip

Re: CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Philip Shon
My row keys will actually be a date range, where the first value is the starting date and the second value will be the length of that date range. Each column inside each row will represent metadata related to a specific product during that date range. My question would be do I get any advanta

Re: GCInspector causing slow writes?

2012-02-28 Thread Jonathan Ellis
No, a gc pause of 100ms isn't going to cause 4s delays. My first guess would be backed-up requests (visible in nodetool tpstats) due to too much random i/o (visible in iostat). On Tue, Feb 28, 2012 at 5:44 AM, Neil Dolling wrote: > When writing to Cassandra (v 1.0.7) I'm seeing occasional delays

Re: CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Dave Brosius
With keys, you get validation if you use composite types. You also get prettier output in the CLI, but the value is less than with composite columns. On 02/28/2012 11:59 AM, Philip Shon wrote: My row keys will actually be a date range. Where the first value is the starting date and the secon

Re: Impact of old data on performance

2012-02-28 Thread Dan Retzlaff
Hi Stefan. Can you share the output of nodetool cfstats? On Tue, Feb 28, 2012 at 1:50 AM, Stefan Reek wrote: > Hi All, > > We are running a 3-node cluster with Cassandra 0.6.13. > We are in the process of upgrading to 1.x, but can't do so for a while > because we can't take the cluster offline.

Re: Failed to join ring (NAT)

2012-02-28 Thread aaron morton
What is the broadcast address on the nodes inside aws ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/02/2012, at 1:41 AM, Richard Evans wrote: > I have a small ring of two nodes running successfully on aws. > > In order to understan

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
@Aaron: Are you suggesting 3 nodes (rather than 2) to allow quorum operations even with the temporary loss of 1 node from the cluster's reach? I understand this, but another question just popped up in my mind, probably since I'm not very experienced in managing cassandra, so I'm unaware whether it may be

Re: Implications of length of column names

2012-02-28 Thread Maxim Potekhin
When I migrated data from our RDBMS, I hashed column names to integers. This makes for some footwork, but the space gain is clearly there so it's worth it. I de-hash on read. Maxim On 2/10/2012 5:15 PM, Narendra Sharma wrote: It is good to have short column names. They save space all the way
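
One way the "hash on write, de-hash on read" idea might look is sketched below. This is not Maxim's actual code: the class name `ColumnNameCodec`, the choice of CRC32 as the hash, and the in-memory reverse lookup table are all assumptions made for illustration. Since a hash cannot be reversed by itself, some mapping from integer back to name has to be kept somewhere.

```python
import zlib


class ColumnNameCodec:
    """Hypothetical helper: maps long column names to 32-bit integers and back."""

    def __init__(self):
        # Reverse lookup table (integer id -> original name). In a real system
        # this mapping would have to be persisted; here it only lives in memory.
        self._names_by_id = {}

    def encode(self, name: str) -> int:
        """Hash a column name to an unsigned 32-bit integer, remembering the mapping."""
        col_id = zlib.crc32(name.encode("utf-8"))
        existing = self._names_by_id.setdefault(col_id, name)
        if existing != name:
            # Two different names hashed to the same integer.
            raise ValueError(f"hash collision between {existing!r} and {name!r}")
        return col_id

    def decode(self, col_id: int) -> str:
        """Look up the original column name for a stored integer id."""
        return self._names_by_id[col_id]
```

The integer takes 4 bytes per stored column regardless of how long the readable name is, which is where the space gain comes from; the cost is the extra lookup and the need to guard against collisions.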

Re: Frequency of Flushing in 1.0

2012-02-28 Thread Xaero S
Thank you Aaron and others. That helped and we were able to limit the commitlog disk usage. We will be doing some tests by changing the memtable_total_space_in_mb param and see how that goes. On Mon, Feb 27, 2012 at 12:51 PM, aaron morton wrote: > yes, reducing commitlog_total_space_in_mb will re

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
If you have 3 nodes with RF=3, you can keep the service running on cassandra even if one of the nodes fails (through hardware or software failure). Another benefit is that you can shut down one node for maintenance or patching without service interruption. If you run your service with 2 nodes and RF=2, your

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
> If you run your service with 2 node and RF=2, your data will be replicated but > your service will not be redundant. ( You can't stop both of nodes ) If your service doesn't need strong consistency (that is, it can tolerate cassandra returning "old" data after a write, and possibly losing a write), you can use CL=ONE fo
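
The arithmetic behind the 2-node versus 3-node advice can be written out as a small sketch. It assumes the usual definition of QUORUM as a majority of replicas, floor(RF / 2) + 1; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """How many replicas can be down while requests at this level still succeed."""
    return replication_factor - required_replicas


# RF=3: QUORUM needs 2 replicas, so one node can be down.
rf3_quorum_failures = tolerated_failures(3, quorum(3))
# RF=2: QUORUM needs both replicas, so no node can be down.
rf2_quorum_failures = tolerated_failures(2, quorum(2))
# RF=2 with CL=ONE: one replica is enough, so one node can be down,
# at the cost of possibly reading stale data.
rf2_one_failures = tolerated_failures(2, 1)
```

This is why a 2-node, RF=2 cluster only stays available through a node outage when the application can accept CL=ONE.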

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
Thanks, I think I don't need high consistency (as per my app requirements), so I might be fine with CL.ONE instead of quorum, and I think I'm probably going to be ok with a 2 node cluster initially. Could you guys also recommend some minimum memory to start with? Of course that would depend on my