RE: Thrift Server Implementations

2014-03-05 Thread Christopher Wirt
mentations On 02/13/2014 01:37 PM, Christopher Wirt wrote: > Anyway, today I moved the old HsHa implementation and the new > TThreadSelectorServer into a 2.0.5 checkout, hooked them in, built, > did a bit of testing and I'm now running live. > > > > We found the TThr

Commit logs building up

2014-02-26 Thread Christopher Wirt
We're running 2.0.5, recently upgraded from 1.2.14. Sometimes we are seeing CommitLogs starting to build up. Is this a potential bug? Or a symptom of something else we can easily address? We have commitlog_sync: periodic commitlog_sync_period_in_ms:1 commitlog_segment_size_in_m

Thrift Server Implementations

2014-02-13 Thread Christopher Wirt
TL;DR: Has anyone ever tried using the new thrift 0.9 TThreadSelectorServer for their thrift server? I did today and have found it performs pretty well. Is this something people would like to see in the C* trunk? Background: Yesterday we upgraded from Cass 1.2.14 to Cass 2.0.5. We

WRITETIME question

2013-12-18 Thread Christopher Wirt
Is there any reason to use the WRITETIME function on non-counter columns? I'm using CQL statements via the thrift protocol and get a Timestamp returned with each column. I'm pretty sure select a, writetime(a) from b where u = 1 is unnecessary for me. Unless a is a counter. I guess my re

RE: Counters question - is there a better way to count

2013-12-05 Thread Christopher Wirt
On Thu, Dec 5, 2013 at 4:44 PM, Christopher Wirt wrote: I want to build a really simple column family which counts the occurrence of a single event X. Once we reach Y occurrences of X the counter resets to 0 The obvious way to do this is with a counter CF. CREATE

Counters question - is there a better way to count

2013-12-05 Thread Christopher Wirt
I want to build a really simple column family which counts the occurrence of a single event X. Once we reach Y occurrences of X the counter resets to 0 The obvious way to do this is with a counter CF. CREATE TABLE xcounter1 ( id uuid, someid int,

RE: How to configure linux service for Cassandra?

2013-11-12 Thread Christopher Wirt
Starting multiple Cassandra nodes on the same machine involves setting up loopback aliases and some configuration fiddling. Luckily for you, Sylvain Lebresne made this handy tool in Python which does the job for you: https://github.com/pcmanus/ccm To run as a service you need a script like thi

RE: java.io.FileNotFoundException when setting up internode_compression

2013-11-11 Thread Christopher Wirt
I had this the other day when we were accidentally provisioned a centos5 machine (instead of 6). Think it relates to the version of glibc. Notice it wants the native binary .so not the .jar So maybe update to a newer version of glibc? Or possibly make sure the .so exists at /usr/tmp/snappy-1.0.

RE: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?

2013-10-08 Thread Christopher Wirt
In CQL there is a token() function you can use to find the result of your partitioning scheme's hash function for any value, e.g. select token(value) from column_family1 where partition_column = value; You then need to find out which nodes are responsible for that value using nodetool ring o
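The lookup described above (take the token, then find the node responsible for it) can be sketched roughly as follows. This is only an illustration under stated assumptions: the ring contents, node names, and the helper names primary_owner and replicas are hypothetical, the token-to-node pairs would in practice be copied from nodetool ring output, and the replica walk imitates SimpleStrategy only (it ignores the rack and datacenter placement rules of NetworkTopologyStrategy).

```python
from bisect import bisect_left

# Hypothetical ring: (token, node) pairs sorted by token, e.g. transcribed
# from `nodetool ring`. With vnodes each node appears many times.
EXAMPLE_RING = [
    (-6000, "10.0.0.1"),
    (-2000, "10.0.0.2"),
    (1500, "10.0.0.3"),
    (4000, "10.0.0.1"),
    (8000, "10.0.0.2"),
]


def primary_owner(ring, token):
    """Return the node owning `token`.

    A node with token T owns the range (previous_token, T], so the owner is
    the first ring entry whose token is >= the key's token, wrapping around
    to the first entry when the key's token is past the last one.
    """
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


def replicas(ring, token, rf):
    """Walk the ring clockwise from the owner, collecting `rf` distinct nodes.

    Assumption: SimpleStrategy-like placement; not what
    NetworkTopologyStrategy does.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    owners = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


if __name__ == "__main__":
    key_token = 2500  # value returned by select token(...) for some key
    print(primary_owner(EXAMPLE_RING, key_token))
    print(replicas(EXAMPLE_RING, key_token, 2))
```

The token itself still has to come from the token() query, since reproducing the partitioner's hash client side is out of scope for this sketch.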

RE: Rollback question regarding system metadata change

2013-10-02 Thread Christopher Wirt
I went with deleting the extra rows created in schema_columns and I've now successfully bootstrapped three nodes back on 1.2.10. No sour side effects to report yet. Thanks for your help From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 02 October 2013 01:00 To: user@cassandra.apac

Rollback question regarding system metadata change

2013-10-01 Thread Christopher Wirt
Moving back to 1.2.10. What is the procedure to roll back from 2.0.1? Changes in the system schema seem to make this quite difficult. We have: DC1 - 10 x 1.2.10 DC2 - 4 x 1.2.10 DC3 - 3 x 2.0.1 -> ran this for a couple of days and have decided to roll back In my efforts I've now completel

RE: 2.0.1 counter replicate on write error

2013-09-30 Thread Christopher Wirt
om On 27/09/2013, at 10:50 PM, Christopher Wirt wrote: Hello, I've started to see a slightly worrying error appear in our logs occasionally. We're writing at 400qps per machine and I only see this appear every 5-10minutes. Seems to have started when I switched us to us

2.0.1 counter replicate on write error

2013-09-27 Thread Christopher Wirt
Hello, I've started to see a slightly worrying error appear in our logs occasionally. We're writing at 400qps per machine and I only see this appear every 5-10minutes. Seems to have started when I switched us to using the hsha thrift server this morning. We've been running 2.0.1 ran off the

RE: 1.2.10 -> 2.0.1 migration issue

2013-09-26 Thread Christopher Wirt
AM, Christopher Wirt wrote: Should also say. I have managed to move one node from 1.2.10 to 2.0.0. I'm seeing this error on the machine I tried to migrate earlier to 2.0.1 I'm confused... for the record : 1) you tried to upgrade from 1.2.10 to 2.0.1 2) the NEWS.txt snippet you

RE: 1.2.10 -> 2.0.1 migration issue

2013-09-25 Thread Christopher Wirt
Should also say. I have managed to move one node from 1.2.10 to 2.0.0. I'm seeing this error on the machine I tried to migrate earlier to 2.0.1 Thanks From: Christopher Wirt [mailto:chris.w...@struq.com] Sent: 25 September 2013 14:04 To: 'user@cassandra.apache.org' Subj

RE: 1.2.10 -> 2.0.1 migration issue

2013-09-25 Thread Christopher Wirt
r 2013 13:11 To: user@cassandra.apache.org Subject: Re: 1.2.10 -> 2.0.1 migration issue you are probably reading trunk NEWS.txt read the ticket for explanation of what the issue was (it is a proper bug) On Wed, Sep 25, 2013 at 12:59 PM, Christopher Wirt wrote: Hi Marcus, Thanks fo

RE: 1.2.10 -> 2.0.1 migration issue

2013-09-25 Thread Christopher Wirt
.org/jira/browse/CASSANDRA-6093 and will try to have a look today. On Wed, Sep 25, 2013 at 1:48 AM, Christopher Wirt wrote: Hi, Just had a go at upgrading a node to the latest stable c* 2 release and think I ran into some issues with manifest migration. On initial start up I hit this er

1.2.10 -> 2.0.1 migration issue

2013-09-24 Thread Christopher Wirt
Hi, Just had a go at upgrading a node to the latest stable c* 2 release and think I ran into some issues with manifest migration. On initial start up I hit this error as it starts to load the first of my CF. INFO [main] 2013-09-24 22:56:01,018 LegacyLeveledManifest.java (line 89) Migra

RE: cass 1.2.8 -> 1.2.9

2013-09-24 Thread Christopher Wirt
Yes. Sorry. It was me being a fool. I didn't update the rackdc.properties file on the new version From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 24 September 2013 01:52 To: user@cassandra.apache.org Subject: Re: cass 1.2.8 -> 1.2.9 On Wed, Sep 11, 2013 at 7:42 AM, Christop

TTL and gc_grace_Seconds

2013-09-18 Thread Christopher Wirt
I have a column family that contains time series events, all columns have a 24 hour TTL and gc_grace_seconds is currently 20 days. There is a TimeUUID in part of the key. It takes 15 days to repair the entire ring. Consistency is not my main worry. Speed is. We currently write to this CF at LOCA

sstable compression

2013-09-12 Thread Christopher Wirt
I currently use Snappy for my SSTable compression on Cassandra 1.2.8. I would like to switch to using LZ4 compression for my SSTables. Would simply altering the table definition mean that all newly written tables are LZ4 and can live in harmony with the existing Snappy SSTables? Then naturall

cass 1.2.8 -> 1.2.9

2013-09-11 Thread Christopher Wirt
Anyone had issues upgrading to 1.2.9? I tried upgrading one server in a three node DC. The server appeared to come online fine without any errors, handshaking, etc. looking at tpstats the machine was serving very few reads. Looking from the server side we were getting a lot of Unavailable

Cassandra 2 Upgrade

2013-09-11 Thread Christopher Wirt
Hello, I'm keen on moving to 2.0. The new thrift server implementation and other performance improvements are getting me excited. I'm currently running 1.2.8 in 3 DCs with 3-3-9 nodes 64GB RAM, 3x200GB SSDs, thrift, LCS, Snappy, Vnodes. Is anyone using 2.0 in production yet? Had any is

HsHa

2013-08-13 Thread Christopher Wirt
Hello, I was trying out the hsha thrift server implementation and found that I get a fair amount of these appearing in the server logs. ERROR [Selector-Thread-9] 2013-08-13 15:39:10,433 TNonblockingServer.java (line 468) Read an invalid frame size of 0. Are you using TFramedTransport on the

RE: lots of small nodes vs fewer big nodes

2013-08-08 Thread Christopher Wirt
I found using a JBOD SSD setup (one per data directory) to be faster than RAID. JBOD configuration will also allow a disk to fail and the remaining disks to continue serving reads if you set disk_failure_policy: best_effort. If you do go for a RAID controller watch out for any special read/writ

RE: Counters and replication

2013-08-06 Thread Christopher Wirt
ugust 2013 20:30 To: user@cassandra.apache.org Subject: Re: Counters and replication On 5 August 2013 20:04, Christopher Wirt wrote: Hello, Question about counters, replication and the ReplicateOnWriteStage I've recently turned on a new CF which uses a counter column. We have a three DC setu

Counters and replication

2013-08-05 Thread Christopher Wirt
Hello, Question about counters, replication and the ReplicateOnWriteStage I've recently turned on a new CF which uses a counter column. We have a three DC setup running Cassandra 1.2.4 with vNodes, hex core processors, 32Gb memory. DC 1 - 9 nodes with RF 3 DC 2 - 3 nodes with RF 2

RE: Reducing the number of vnodes

2013-08-05 Thread Christopher Wirt
, at 12:30, Christopher Wirt wrote: Hi, I'm thinking about reducing the number of vnodes per server. We have a 3 DC setup - one with 9 nodes, two with 3 nodes each. Each node has 256 vnodes. We've found that repair operations are beginning to take too long. Is reducing the

Reducing the number of vnodes

2013-08-05 Thread Christopher Wirt
Hi, I'm thinking about reducing the number of vnodes per server. We have a 3 DC setup - one with 9 nodes, two with 3 nodes each. Each node has 256 vnodes. We've found that repair operations are beginning to take too long. Is reducing the number of vnodes to 64/32 likely to help our si

RE: disappointed

2013-07-24 Thread Christopher Wirt
off till later? Yeah, I have run into problems dropping schemas before as well. I was careful this time to start with an empty db folder. Glad you were successful in your transition.:) Paul On Jul 24, 2013, at 4:12 AM, "Christopher Wirt" wrote: Hi Paul, Sorry to

RE: disappointed

2013-07-24 Thread Christopher Wirt
Hi Paul, Sorry to hear you're having a low point. We ended up not using the collection features of 1.2. Instead storing a compressed string containing the map and handling client side. We only have fixed schema short rows so no experience with large row compaction. File descriptor

listen_address and rpc_address address on different interface

2013-07-11 Thread Christopher Wirt
Hello, I was wondering if anyone has measured the performance improvements of having the listen address and client address bound to different interfaces? We have a 2gbit connection serving both at the moment and this doesn't come close to being saturated. But being very keen on fast reads at

JMX Latency stats

2013-07-10 Thread Christopher Wirt
I was wondering if anyone knows the difference between the JMX latency stats and could enlighten me. We've been looking at the column family specific stats and see really lovely < 3ms 99th percentile stats for all our families. org.apache.cassandra.metrics:type=ColumnFamily,keyspace=mykeyspace,sc

RE: Multiple JBOD data directory

2013-06-05 Thread Christopher Wirt
newbie but just had a thought regarding your question 'How will it handle requests for data which unavailable?', wouldn't the data be served in that case from other nodes where it has been replicated? Regards, Shahab On Wed, Jun 5, 2013 at 5:32 AM, Christopher Wirt wrote: He

Multiple JBOD data directory

2013-06-05 Thread Christopher Wirt
Hello, We're thinking about using multiple data directories each with its own disk and are currently testing this against a RAID0 config. I've seen that there is failure handling with multiple JBOD. e.g. We have two data directories mounted to separate drives /disk1 /disk2 One of

RE: High performance disk io

2013-05-24 Thread Christopher Wirt
fast disks the average, 95th, and 99th percentile can get very far apart. I am currently trying to really study the effect of the width of a row (being in multiple sstables) vs its 95th percentile read time. On Thu, May 23, 2013 at 10:43 AM, Christopher Wirt wrote: Hi Igor, I was tal

RE: High performance disk io

2013-05-23 Thread Christopher Wirt
stograms for CF on cassandra side? Thanks! On 05/22/2013 05:41 PM, Christopher Wirt wrote: Hi Igor, Yea same here, 15ms for 99th percentile is our max. Currently getting one or two ms for most CF. It goes up at peak times which is what we want to avoid. We're using Cass 1.2.4

RE: High performance disk io

2013-05-22 Thread Christopher Wirt
her - to 10ms, this depends on the data volume you read in each query. Tuning read performance involved cleaning up data model, tuning cassandra.yaml, switching from Hector to astyanax, tuning OS parameters. On 05/22/2013 04:40 PM, Christopher Wirt wrote: Hello, We're looking at deploying

RE: High performance disk io

2013-05-22 Thread Christopher Wirt
from Hector to astyanax, tuning OS parameters. On 05/22/2013 04:40 PM, Christopher Wirt wrote: Hello, We're looking at deploying a new ring where we want the best possible read performance. We've set up a cluster with 6 nodes, replication level 3, 32Gb of memory, 8Gb Heap, 800Mb

High performance disk io

2013-05-22 Thread Christopher Wirt
Hello, We're looking at deploying a new ring where we want the best possible read performance. We've set up a cluster with 6 nodes, replication level 3, 32Gb of memory, 8Gb Heap, 800Mb keycache, each holding 40/50Gb of data on a 200Gb SSD and 500Gb SATA for OS and commitlog Three column fam

RE: Repair session failed

2013-05-03 Thread Christopher Wirt
Hi Aaron, We're running 1.2.4, so with vNodes We ran scrub but saw the issue again when repairing nodetool status - Datacenter: DC01 = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.70.48.23