Re: Grouping and storing unordered time series data stream to HDFS

2015-05-16 Thread Nisrina Luthfiyati
Hi Ayan and Helena, I considered using Cassandra/HBase but ended up opting to save to HDFS on the workers, because I want to take advantage of data locality: the data will often be loaded into Spark for further processing. I was also under the impression that saving to the filesystem (instead of a db)

Re: Grouping and storing unordered time series data stream to HDFS

2015-05-16 Thread Helena Edelson
Consider using Cassandra with Spark Streaming for time series; Cassandra has been doing time series for years. Here are some snippets with Kafka streaming and writing/reading the data back: https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrwea

Re: Grouping and storing unordered time series data stream to HDFS

2015-05-15 Thread ayan guha
Hi, do you have a cut-off time, i.e. how "late" an event can be? Otherwise, you may consider a different persistent store like Cassandra/HBase and delegate the "update" part to it. On Fri, May 15, 2015 at 8:10 PM, Nisrina Luthfiyati <nisrina.luthfiy...@gmail.com> wrote: > > Hi all, > I have a stream
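
The cut-off idea can be sketched as a lateness check: track the largest event time seen so far and accept a record only if it trails that time by no more than an allowed lateness. This is a minimal sketch in plain Python, not Spark code; the one-hour ALLOWED_LATENESS and all function names are assumptions for illustration and do not come from the thread.

```python
from datetime import datetime, timedelta

ALLOWED_LATENESS = timedelta(hours=1)  # assumed cut-off, not from the thread


def split_by_lateness(event_times, allowed=ALLOWED_LATENESS):
    """Return (on_time, too_late), judged against the max event time seen so far."""
    on_time, too_late = [], []
    max_seen = None
    for t in event_times:
        # Advance the high-water mark when a newer event time shows up.
        if max_seen is None or t > max_seen:
            max_seen = t
        if max_seen - t <= allowed:
            on_time.append(t)
        else:
            too_late.append(t)
    return on_time, too_late


events = [
    datetime(2015, 5, 15, 20, 0),
    datetime(2015, 5, 15, 19, 30),  # 30 minutes behind the latest: accepted
    datetime(2015, 5, 15, 21, 0),
    datetime(2015, 5, 15, 19, 45),  # 75 minutes behind the latest: too late
]
on_time, too_late = split_by_lateness(events)
```

Records that end up in too_late could be dropped, or routed to a store that handles updates, such as the Cassandra/HBase option mentioned above.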

Grouping and storing unordered time series data stream to HDFS

2015-05-15 Thread Nisrina Luthfiyati
Hi all, I have a stream of data from Kafka that I want to process and store in HDFS using Spark Streaming. Each record has a date/time dimension, and I want to write records with the same time dimension to the same HDFS directory. The data stream might be unordered (by time dimension). I'm wondering w
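
One way to picture the grouping step, independent of Spark: derive the target directory from each record's own timestamp rather than from its arrival time, so out-of-order records still land in the right place. This is a minimal sketch in plain Python, assuming hourly buckets and a hypothetical base path /data/events; the (event_time, payload) record shape and the helper names are illustrative and not taken from the thread.

```python
from collections import defaultdict
from datetime import datetime

BASE_DIR = "/data/events"  # hypothetical HDFS base path, not from the thread


def bucket_dir(event_time, base_dir=BASE_DIR):
    """Map an event time to an hourly directory, e.g. /data/events/2015/05/15/20."""
    return base_dir + "/" + event_time.strftime("%Y/%m/%d/%H")


def group_by_bucket(records):
    """Group (event_time, payload) pairs by target directory, ignoring arrival order."""
    groups = defaultdict(list)
    for event_time, payload in records:
        groups[bucket_dir(event_time)].append(payload)
    return dict(groups)


# Records arrive out of order with respect to their event time.
batch = [
    (datetime(2015, 5, 15, 20, 10), "a"),
    (datetime(2015, 5, 15, 19, 55), "b"),
    (datetime(2015, 5, 15, 20, 1), "c"),
]
grouped = group_by_bucket(batch)
```

In a streaming job, the same function could serve as the grouping key for each micro-batch. A record that arrives after its bucket was first written would then add another file to an existing directory rather than require rewriting it.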