TimedOutException in CqlRecordWriter

2013-10-08 Thread Renat Gilfanov
Hello, I run Hadoop jobs which read data from Cassandra 1.2.8 and write results back to other tables. One of my reduce tasks was killed twice by the job tracker because it had not responded for more than 10 minutes; the third attempt was successful. The error message for killed reduce tasks

data schema for hourly running analytics

2013-09-26 Thread Renat Gilfanov
Hello, We have a column family which stores incoming requests, and we would like to perform some analytics on that data using Hadoop. The analytic results should be available fairly soon: not in real time, but within an hour or so. So we store the current hour number (calculated from timestamp)

Cassandra input paging for Hadoop

2013-09-10 Thread Renat Gilfanov
Hi, We have Hadoop jobs that read data from our Cassandra column families and write some data back to other column families. The input column families are fairly simple CQL3 tables without wide rows. In Hadoop jobs we set up the corresponding WHERE clause in

Recommended way of data migration

2013-09-07 Thread Renat Gilfanov
Hello, Let's say we have a simple CQL3 table
    CREATE TABLE example (
        id UUID PRIMARY KEY,
        timestamp TIMESTAMP,
        data ASCII
    );
And I need to mutate (for example, encrypt) the values in the data column for all rows. What's the recommended approach to perform such migration

How to fix host ID collision?

2013-09-03 Thread Renat Gilfanov
Hello, We have a Cassandra cluster with 5 nodes hosted in Amazon EC2, and I had to restart two of them, so their IPs changed. We use NetworkTopologyStrategy, so I simply updated the IPs in the cassandra-topology.properties file. However, as I understand it, the old IPs remained somewhere in the

Re[2]: How to fix host ID collision?

2013-09-03 Thread Renat Gilfanov
seeds then? Tuesday, 3 September 2013, 14:08 -07:00, from Robert Coli rc...@eventbrite.com: On Tue, Sep 3, 2013 at 2:01 PM, Renat Gilfanov gren...@mail.ru wrote: We have a Cassandra cluster with 5 nodes hosted in Amazon EC2, and I had to restart two of them, so their IPs changed. We use

Cassandra cluster migration in Amazon EC2

2013-09-02 Thread Renat Gilfanov
Hello, Currently we have a Cassandra cluster in Amazon EC2, and we are planning to upgrade our deployment configuration to achieve better performance and stability. However, a lot of open questions arise when planning this migration. I would be very thankful if somebody could answer my

Re[2]: Cassandra cluster migration in Amazon EC2

2013-09-02 Thread Renat Gilfanov
the old ones, you'll be able to do it without downtime. It'll also have the effect of randomizing the tokens, I believe. On Sep 2, 2013, at 4:21 PM, Renat Gilfanov gren...@mail.ru wrote: Hello, Currently we have a Cassandra cluster in Amazon EC2, and we are planning to upgrade our

Recommended storage choice for Cassandra on Amazon m1.xlarge instance

2013-09-02 Thread Renat Gilfanov
Hello, I'd like to ask what the best options are for separating the commit log and data on an Amazon m1.xlarge instance, given 4x420 GB of attached storage and an EBS volume? As far as I understand, EBS is not a good choice and it's recommended to use the attached storage instead. Is it better to combine 4