Hi Ayan and Helena,
I considered using Cassandra/HBase but ended up opting to save to HDFS on the
workers because I want to take advantage of data locality, since the data
will often be loaded into Spark for further processing. I was also under the
impression that saving to a filesystem (instead of a db)
Consider using Cassandra with Spark Streaming for time series; Cassandra has
been doing time series for years.
Here are some snippets with Kafka streaming and writing/reading the data back:
https://github.com/killrweather/killrweather/blob/master/killrweather-app/src/main/scala/com/datastax/killrwea
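Not from killrweather itself, but a minimal end-to-end sketch of that pattern could look like the following. The topic, ZooKeeper quorum, keyspace, table, and Event schema are placeholders, and it assumes spark-streaming-kafka and the spark-cassandra-connector are on the classpath:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.streaming._

// Placeholder record type; the Cassandra table is assumed to have
// matching columns (sensor_id text, ts bigint, value double).
case class Event(sensorId: String, ts: Long, value: Double)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // adjust to your cluster
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka stream; topic name and ZK quorum are placeholders.
    val lines = KafkaUtils
      .createStream(ssc, "zkhost:2181", "consumer-group", Map("events" -> 1))
      .map(_._2)

    // Parse CSV lines into Events (simplified; no error handling).
    val events = lines.map { line =>
      val Array(id, ts, v) = line.split(",")
      Event(id, ts.toLong, v.toDouble)
    }

    // Append each batch to Cassandra via the spark-cassandra-connector.
    events.saveToCassandra("my_keyspace", "events")

    ssc.start()
    ssc.awaitTermination()
  }
}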
Hi
Do you have a cut-off time, i.e. a limit on how "late" an event can be? If
not, you may consider a different persistent store like Cassandra/HBase and
delegate the "update" part to them.
On Fri, May 15, 2015 at 8:10 PM, Nisrina Luthfiyati <
nisrina.luthfiy...@gmail.com> wrote:
Hi all,
I have a stream of data from Kafka that I want to process and store in HDFS
using Spark Streaming.
Each record has a date/time dimension, and I want to write records within the
same time dimension to the same HDFS directory. The data stream might be
unordered (by time dimension).
I'm wondering w
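One possible sketch of writing each time bucket to its own HDFS directory is to group each batch by day inside foreachRDD and save each group under a date-partitioned path. The Event type, CSV format, and path layout below are illustrative placeholders, not code from the thread:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.streaming.dstream.DStream

// Hypothetical record type with an event-time timestamp in millis.
case class Event(sensorId: String, ts: Long, value: Double)

val dayFormat = new SimpleDateFormat("yyyy-MM-dd")

// For every batch, find the days present in it, then append each day's
// records to a date-specific directory; the per-batch suffix keeps
// saveAsTextFile from failing on an existing output path.
def writeByDay(events: DStream[Event], baseDir: String): Unit =
  events.foreachRDD { rdd =>
    val days = rdd.map(e => dayFormat.format(new Date(e.ts))).distinct().collect()
    days.foreach { day =>
      rdd.filter(e => dayFormat.format(new Date(e.ts)) == day)
         .map(e => s"${e.sensorId},${e.ts},${e.value}")
         .saveAsTextFile(s"$baseDir/date=$day/batch-${System.currentTimeMillis()}")
    }
  }

Because late records simply land in a new batch-* subdirectory of the same date=... directory, downstream Spark jobs that read the whole date directory would still pick them up.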