Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-05 Thread Alexandru Sicoe
l disk space (50% max for > sized tier compaction vs about 80% for leveled compaction) > > > > "Our usage pattern is write once, read once (export) and delete once! " > > > > In this case, I think that leveled compaction fits your needs. > > > > "Can anyone suggest which (if any) is better? Are there better > solutions?" > > > > Are your sstable compressed ? You have 2 types of built-in compression > and you may use them depending on the model of each of your CF. > > > > see: > http://www.datastax.com/docs/1.1/operations/tuning#configure-compression > > > > Alain > > > > 2012/11/22 Alexandru Sicoe > > We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk > per node for the data dir and separate disk for the commitlog, 12 cores, 24 > GB RAM (12GB to Cassandra heap). > > > >

Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-11-22 Thread Alexandru Sicoe
Hello everyone, We are running a 3 node Cassandra 1.1.5 cluster with a 3 TB RAID 0 disk per node for the data dir and a separate disk for the commitlog, 12 cores, 24 GB RAM (12 GB to Cassandra heap). We now have 1.1 TB worth of data per node (RF = 2). Our data input is between 20 and 30 GB per day, d
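A rough headroom estimate for the numbers in this message can be sketched as below. This is only back-of-the-envelope arithmetic, not a capacity plan. It assumes that the 20 to 30 GB per day is cluster-wide input before replication, that data is spread evenly over the 3 nodes, that 1 TB = 1000 GB, and that nothing is deleted or compressed in the meantime. The 50% and 80% fill limits are the guideline figures quoted in the reply to this thread; the function name and constants are hypothetical.

```python
# Back-of-the-envelope disk headroom estimate (all inputs are assumptions
# taken from, or layered on top of, the figures quoted in the thread).
DISK_GB = 3000.0   # 3 TB data disk per node, taking 1 TB = 1000 GB
USED_GB = 1100.0   # 1.1 TB currently stored per node
NODES = 3          # cluster size
RF = 2             # replication factor


def days_until_limit(daily_input_gb, max_fill=0.5):
    """Days until a node reaches max_fill of its disk.

    daily_input_gb is assumed to be cluster-wide input before replication;
    max_fill is the fraction of the disk considered safe to use.
    """
    per_node_growth_gb = daily_input_gb * RF / NODES
    headroom_gb = DISK_GB * max_fill - USED_GB
    return headroom_gb / per_node_growth_gb


if __name__ == "__main__":
    for rate in (20, 30):
        for label, fill in (("size-tiered, 50%", 0.5), ("leveled, 80%", 0.8)):
            days = days_until_limit(rate, fill)
            print(f"{rate} GB/day, {label}: about {days:.0f} days")
```

If the daily figure is actually per node, or if exported data is deleted regularly, the result changes accordingly; the point is only to show how the fill limit, the replication factor and the input rate interact.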

what's the most 1.1 stable version?

2012-10-05 Thread Alexandru Sicoe
Hello, We are planning to upgrade from version 1.0.7 to the 1.1 branch. Which is the stable version that people are using? I see the latest release is 1.1.5 but maybe it's not fully wise to use this. Is 1.1.4 the one to use? Cheers, Alex

Re: repair never finishing 1.0.7

2012-06-25 Thread Alexandru Sicoe
the local network, or the nodes broadcast their > internal IP, in which case the "outside" nodes are helpless in trying to > connect to a local net. On DC2 nodes/the node you issue the repair on, > check for any sockets being opened to the internal addresses of the nodes > in

repair never finishing 1.0.7

2012-06-25 Thread Alexandru Sicoe
Hello everyone, I have a 2 DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup. I have about 300 GB/node in DC2. The DCs are communicating over a gateway where I do NAT for ports 7000, 9160 and 7199. I did a "nodetool repair" on a node in DC2 without any external load on the system. It took 5 hrs

Re: composite query performance depends on component ordering

2012-04-03 Thread Alexandru Sicoe
ethodology of the tests ? > > (Here is the methodology I used to time queries previously > http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) > > Cheers > > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpic

2 questions DataStax Enterprise

2012-04-03 Thread Alexandru Sicoe
Hi guys, I'm trying out DSE and looking for the best way to arrange the cluster. I have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6 outside the gateway that are supposed to take replicas from the other 3 and serve reads and analytics jobs. 1. Is it ok to run the 3 nodes

composite query performance depends on component ordering

2012-03-30 Thread Alexandru Sicoe
Hi guys, I am consistently seeing a 20% improvement in query retrieval times if I use the composite comparator "Timestamp:ID" instead of "ID:Timestamp" where Timestamp=Long and ID=~100 character strings. I am retrieving all columns (~1 million) from a single row. Why is this happening? Cheers, Al

Re: another DataStax OpsCenter question

2012-03-30 Thread Alexandru Sicoe
t on. In the meantime though OpsCenter > will need to be able to hit the listen_address for each node. > > On Thu, Mar 29, 2012 at 12:47 PM, Alexandru Sicoe > wrote: > > Hello, > > I am planning on testing OpsCenter to see how it can monitor a multi DC > > cluster. Th

another DataStax OpsCenter question

2012-03-29 Thread Alexandru Sicoe
Hello, I am planning on testing OpsCenter to see how it can monitor a multi DC cluster. There are 2 DCs each on a different side of a firewall. I've configured NAT on the firewall to allow the communication between all Cassandra nodes on ports 7000, 7199 and 9160. The cluster works fine. However w

Cassandra multi DC

2012-03-29 Thread Alexandru Sicoe
Hello everyone, How are people running multi DC Cassandra across remote locations? Are VPNs used? Or some dedicated application proxies? What is the norm here? Any advice is much appreciated, Alex

Re: single row key continues to grow, should I be concerned?

2012-03-26 Thread Alexandru Sicoe
single counter column family, where the column name is the row key and the > value is the counter.) A naive solution would require reading the directory > before every read and the counter before every write--caching could > probably help with that. So this approach would probably le

Re: single row key continues to grow, should I be concerned?

2012-03-22 Thread Alexandru Sicoe
't think it would make too much difference. > range slice used by map-reduce will find the first row in the batch and > then step through them. > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com

Re: single row key continues to grow, should I be concerned?

2012-03-22 Thread Alexandru Sicoe
Hi guys, Based on what you are saying there seems to be a tradeoff that developers have to handle between: "keep your rows under a certain size" vs "keep data that's queried together, on disk together" How would you handle this tradeoff in my case: I monitor about

replication in a 3 data center setup

2012-03-19 Thread Alexandru Sicoe
Hi everyone, Suppose you have 3 data centers (DC1, DC2 and DC3) with 3 nodes each and a keyspace whose strategy options give each DC 2 replicas. If you only write to the nodes in DC1, what is the path the replicas take? Assuming you've correctly interleaved the tokens of all th

Re: Datastax Enterprise mixed workload cluster configuration

2012-03-16 Thread Alexandru Sicoe
d DC3? Cheers, Alex On Thu, Mar 15, 2012 at 11:26 PM, Alexandru Sicoe wrote: > Sorry for that last message, I was confused because I thought I needed to > use the DseSimpleSnitch but of course I can use the PropertyFileSnitch and > that allows me to get the configuration with 3 data cen

Re: Datastax Enterprise mixed workload cluster configuration

2012-03-15 Thread Alexandru Sicoe
Sorry for that last message, I was confused because I thought I needed to use the DseSimpleSnitch but of course I can use the PropertyFileSnitch and that allows me to get the configuration with 3 data centers explained. Cheers, Alex On Thu, Mar 15, 2012 at 10:56 AM, Alexandru Sicoe wrote

Re: Datastax Enterprise mixed workload cluster configuration

2012-03-15 Thread Alexandru Sicoe
> for m/r jobs is ONE, which would work. > > As far as tokens go, interleaving all three DCs and evenly spacing the > tokens will work. For example, the ordering of your nodes might be [1, 4, > 7, 2, 5, 8, 3, 6, 9]. > > > On Wed, Mar 14, 2012 at 12:05 PM, Alexandru Sicoe wrot
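The token layout described in the reply above can be sketched as follows. This is a minimal illustration, assuming the RandomPartitioner token space of 0 to 2**127 and three data centers of three nodes each (the grouping [1, 2, 3], [4, 5, 6], [7, 8, 9] is inferred from the example ordering, not stated in the message). The function names are hypothetical.

```python
# Sketch: interleave nodes from several data centers around the ring and
# give them evenly spaced tokens.
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def interleave(datacenters):
    """Round-robin across data centers: first node of each DC, then the
    second node of each DC, and so on. Assumes equally sized DCs."""
    return [node for group in zip(*datacenters) for node in group]


def evenly_spaced_tokens(ring_order):
    """Map each node to a token, spacing the tokens evenly in ring order."""
    step = RING_SIZE // len(ring_order)
    return {node: i * step for i, node in enumerate(ring_order)}


datacenters = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical grouping
ring_order = interleave(datacenters)
tokens = evenly_spaced_tokens(ring_order)

if __name__ == "__main__":
    for node in ring_order:
        print(f"node {node}: initial_token = {tokens[node]}")
```

With this grouping the ring order comes out as [1, 4, 7, 2, 5, 8, 3, 6, 9], matching the example in the reply, so walking the ring from any node reaches a node in each of the other data centers within two steps.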

Datastax Enterprise mixed workload cluster configuration

2012-03-14 Thread Alexandru Sicoe
Hi everyone, I want to test out the Datastax Enterprise software to have a mixed workload setup with an analytics and a real time part. However I am not sure how to configure it to achieve what I want: I will have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on another (4,5,6). 1,2

Re: unidirectional communication/replication

2012-02-29 Thread Alexandru Sicoe
ce I will want to do a semi real time replication of just the latest data added this won't work because I will be copying over all the data in the CF. Cheers, A > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thela

unidirectional communication/replication

2012-02-24 Thread Alexandru Sicoe
Hello everyone, I'm battling with this constraint: I need to regularly ship time series data out of a Cassandra cluster that sits within an enclosed network. I tried selecting all the data within a certain time window, writing it to a file, and then copying the fi

Re: Querying all keys in a column family

2012-02-24 Thread Alexandru Sicoe
Hi Aaron and Martin, Sorry about my previous reply, I thought you wanted to process only all the row keys in CF. I have a similar issue as Martin because I see myself being forced to hit more than a million rows with a query (I only get a few columns from every row). Aaron, we've talked about thi

Re: Querying all keys in a column family

2012-02-14 Thread Alexandru Sicoe
Hey Martin, Have you tried CQL query: "SELECT FIRST 0 * FROM cfName" ? Cheers, Alex On Mon, Feb 13, 2012 at 11:00 PM, Martin Arrowsmith < arrowsmith.mar...@gmail.com> wrote: > Hi Experts, > > My program is such that it queries all keys on Cassandra. I want to do > this as quick as possible, in o

Re: How does Cassandra decide when to do a minor compaction?

2012-01-07 Thread Alexandru Sicoe
Hi Maxim, Why do you need to know this? Cheers, Alex On Sat, Jan 7, 2012 at 10:03 AM, aaron morton wrote: > > http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 7/

Re: emptying my cluster

2012-01-05 Thread Alexandru Sicoe
> Ok, thanks for these suggestions, I will have to investigate further. > Also considering talking to DataStax about DSE. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 5/01/2012, at 1:

Re: emptying my cluster

2012-01-04 Thread Alexandru Sicoe
can back up and empty the node in DC2 before the TTLs expire in the other 2 nodes. Cheers, Alex > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 3/01/2012, at 11:41 PM, Alexandru Sicoe wrote: > > Hi,

Re: emptying my cluster

2012-01-03 Thread Alexandru Sicoe
ant to get the data out for an off node backup or is it for > processing in another system ? > > You may get by using: > > * TTL to expire data via compaction > * snapshots for backups > > Cheers > > - > Aaron Morton > Freelance Developer > @aaron

emptying my cluster

2012-01-02 Thread Alexandru Sicoe
Hi everyone and Happy New Year! I need advice for organizing data flow outside of my 3 node Cassandra 0.8.6 cluster. I am configuring my keyspace to use the NetworkTopologyStrategy. I have 2 data centers each with a replication factor 1 (i.e. DC1:1; DC2:1) the configuration of the PropertyFileSnit

Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-25 Thread Alexandru Sicoe
Hi everyone, I am currently in the process of writing a hardware proposal for a Cassandra cluster for storing a lot of monitoring time series data. My workload is write intensive and my data set is extremely varied in types of variables and insertion rate for these variables (I will have to handle

Re: CQL select not working for CF defined programmatically with Hector API

2011-10-05 Thread Alexandru Sicoe
Perfectly right. Sorry for not paying attention! Thanks Eric, Alex On Tue, Oct 4, 2011 at 4:19 AM, Eric Evans wrote: > On Mon, Oct 3, 2011 at 12:02 PM, Alexandru Sicoe > wrote: > > Hi, > > I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I > > def

CQL select not working for CF defined programmatically with Hector API

2011-10-03 Thread Alexandru Sicoe
Hi, I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I define a CF with comparator LongType like this: BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(); columnFamilyDefinition.setKeyspaceName("XXX"); columnFamilyDef