You can also look at Amazon Kinesis if you don't want the pain of maintaining Kafka/Flume infrastructure.
Thanks
Best Regards

On Thu, Nov 20, 2014 at 3:32 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

> Thank you for your answer. I don't know if I typed the question
> correctly, but your answer helps me.
>
> I'm going to ask the question again to check whether you understood me.
>
> I have this topology:
>
> DataSource1, .... , DataSourceN --> Kafka --> SparkStreaming --> HDFS
>                                     Kafka --> HDFS (raw data)
>
> DataSource1, .... , DataSourceN --> Flume --> SparkStreaming --> HDFS
>                                     Flume --> HDFS (raw data)
>
> All data are going to be processed and written to HDFS as both raw and
> processed data. I don't know if it makes sense to use Kafka in this
> case if the data are just going to HDFS. I guess that before this, the
> Flume Spark sink makes more sense for feeding Spark Streaming with a
> real-time flow of data. It doesn't seem to make much sense to have
> Spark Streaming get the data from HDFS.
>
> 2014-11-19 22:55 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
> > Thank you for your answer. I don't know if I typed the question
> > correctly, but your answer helps me.
> >
> > I'm going to ask the question again to check whether you understood me.
> >
> > I have this topology:
> >
> > DataSource1, .... , DataSourceN --> Kafka --> SparkStreaming --> HDFS
> >
> > DataSource1, .... , DataSourceN --> Flume --> SparkStreaming --> HDFS
> >
> > All data are going to be pro
> >
> >
> > 2014-11-19 21:50 GMT+01:00 Hari Shreedharan <hshreedha...@cloudera.com>:
> >> Btw, if you want to write to Spark Streaming from Flume -- there is a sink
> >> (it is a part of Spark, not Flume). See Approach 2 here:
> >> http://spark.apache.org/docs/latest/streaming-flume-integration.html
> >>
> >>
> >>
> >> On Wed, Nov 19, 2014 at 12:41 PM, Hari Shreedharan
> >> <hshreedha...@cloudera.com> wrote:
> >>>
> >>> As of now, you can feed Spark Streaming from both Kafka and Flume.
> >>> Currently though there is no API to write data back to either of the two
> >>> directly.
> >>>
> >>> I sent a PR which should eventually add something like this:
> >>> https://github.com/harishreedharan/spark/blob/Kafka-output/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala
> >>> that would allow Spark Streaming to write back to Kafka. This will likely be
> >>> reviewed and committed after 1.2.
> >>>
> >>> I would consider writing something similar to push data to Flume as well,
> >>> if there is a sufficient use-case for it. I have seen people talk about
> >>> writing back to Kafka quite a bit - hence the above patch.
> >>>
> >>> Which one is better is up to your use-case, existing infrastructure, and
> >>> preference. Both would work as is, but writing back to Flume would usually
> >>> be if you want to write to HDFS/HBase/Solr etc -- which you could write back
> >>> directly from Spark Streaming (of course, there are benefits of writing back
> >>> using Flume like the additional buffering etc Flume gives), but it is still
> >>> possible to do so from Spark Streaming itself.
> >>>
> >>> But for Kafka, the usual use-case is a variety of custom applications
> >>> reading the same data -- for which it makes a whole lot of sense to write
> >>> back to Kafka. An example is to sanitize incoming data in Spark Streaming
> >>> (from Flume or Kafka or something else) and make it available for a variety
> >>> of apps via Kafka.
> >>>
> >>> Hope this helps!
> >>>
> >>> Hari
> >>>
> >>>
> >>> On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz <konstt2...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm starting with Spark and I'm just trying to understand: if I want to
> >>>> use Spark Streaming, should I feed it from Flume or Kafka? I think
> >>>> there's not an official Sink for Flume to Spark Streaming and it seems
> >>>> that Kafka fits better since it gives you readability.
> >>>>
> >>>> Could someone give a good scenario for each alternative?
> >>>> When would it make sense to use Kafka and when Flume for Spark Streaming?
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>>> For additional commands, e-mail: user-h...@spark.apache.org
> >>>>
> >>>
> >>
> >
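
The point made in the thread about Kafka, that several applications can read the same data independently, can be illustrated with a toy model. The sketch below is plain Python and does not use the Kafka, Flume, or Spark APIs; ToyLog, sanitize, and the consumer names "raw-archiver" and "stream-processor" are all hypothetical names chosen for illustration. It only shows the idea of one append-only log with a separate read offset per consumer, which is roughly what lets a raw-to-HDFS path and a processing path share one feed.

```python
from collections import defaultdict


class ToyLog:
    """Append-only log with one read offset per consumer (toy model, not the Kafka API)."""

    def __init__(self):
        self._records = []
        self._offsets = defaultdict(int)  # consumer name -> index of next record to read

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer):
        """Return every record this consumer has not seen yet and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


def sanitize(record):
    """Hypothetical cleaning step standing in for the stream-processing job."""
    return record.strip().lower()


log = ToyLog()
for line in ["  Event-A ", "EVENT-b", " event-C"]:
    log.append(line)

# Two independent readers of the same data: one archives raw records,
# the other processes them. Neither changes what the other sees.
raw_sink = list(log.poll("raw-archiver"))
processed_sink = [sanitize(r) for r in log.poll("stream-processor")]
```

In this model, adding a third reader later costs nothing on the producer side, which is the scenario described above for writing sanitized data back to Kafka. If the data only ever has one destination, that extra indirection may not buy much.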