Thank you, but I'm only considering free options.

2014-11-20 7:53 GMT+01:00 Akhil Das <ak...@sigmoidanalytics.com>:
> You can also look at Amazon's Kinesis if you don't want to handle the
> pain of maintaining Kafka/Flume infrastructure.
>
> Thanks
> Best Regards
>
> On Thu, Nov 20, 2014 at 3:32 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>>
>> Thank you for your answer. I don't know if I typed the question
>> correctly, but your answer helps me.
>>
>> I'm going to ask the question again to check whether you understood me.
>>
>> I have this topology:
>>
>> DataSource1, ..., DataSourceN --> Kafka --> SparkStreaming --> HDFS
>>                                   Kafka --> HDFS (raw data)
>>
>> DataSource1, ..., DataSourceN --> Flume --> SparkStreaming --> HDFS
>>                                   Flume --> HDFS (raw data)
>>
>>
>> All the data is going to be processed and stored in HDFS, both as raw
>> data and as processed data. I don't know if it makes sense to use Kafka
>> in this case if the data is just going to HDFS. I guess the Flume Spark
>> sink makes more sense for feeding Spark Streaming with a real-time flow
>> of data; it doesn't make much sense to have Spark Streaming read the
>> data from HDFS.
>>
>> 2014-11-19 22:55 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>> > Thank you for your answer. I don't know if I typed the question
>> > correctly, but your answer helps me.
>> >
>> > I'm going to ask the question again to check whether you understood me.
>> >
>> > I have this topology:
>> >
>> > DataSource1, .... , DataSourceN --> Kafka --> SparkStreaming --> HDFS
>> >
>> > DataSource1, .... , DataSourceN --> Flume --> SparkStreaming --> HDFS
>> >
>> > All data are going to be pro
>> >
>> >
>> > 2014-11-19 21:50 GMT+01:00 Hari Shreedharan <hshreedha...@cloudera.com>:
>> >> Btw, if you want to write to Spark Streaming from Flume -- there is a
>> >> sink
>> >> (it is a part of Spark, not Flume). See Approach 2 here:
>> >> http://spark.apache.org/docs/latest/streaming-flume-integration.html
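>> >>
>> >> Roughly, the receiving side of Approach 2 looks like the sketch below
>> >> (the sink host/port are placeholders for whatever is configured on the
>> >> Flume agent, and the spark-streaming-flume artifact must be on the
>> >> classpath):
>> >>
>> >> import org.apache.spark.SparkConf
>> >> import org.apache.spark.storage.StorageLevel
>> >> import org.apache.spark.streaming.{Seconds, StreamingContext}
>> >> import org.apache.spark.streaming.flume.FlumeUtils
>> >>
>> >> object FlumePollingExample {
>> >>   def main(args: Array[String]): Unit = {
>> >>     val conf = new SparkConf().setAppName("FlumePollingExample")
>> >>     val ssc = new StreamingContext(conf, Seconds(10))
>> >>
>> >>     // Pull events from the Spark sink configured on the Flume agent
>> >>     val flumeStream = FlumeUtils.createPollingStream(
>> >>       ssc, "flume-sink-host", 9988, StorageLevel.MEMORY_AND_DISK_SER_2)
>> >>
>> >>     // Decode the Avro event bodies and keep a raw copy in HDFS
>> >>     flumeStream
>> >>       .map(event => new String(event.event.getBody.array(), "UTF-8"))
>> >>       .saveAsTextFiles("hdfs:///data/raw/flume-events")
>> >>
>> >>     ssc.start()
>> >>     ssc.awaitTermination()
>> >>   }
>> >> }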
>> >>
>> >>
>> >>
>> >> On Wed, Nov 19, 2014 at 12:41 PM, Hari Shreedharan
>> >> <hshreedha...@cloudera.com> wrote:
>> >>>
>> >>> As of now, you can feed Spark Streaming from both Kafka and Flume.
>> >>> Currently, though, there is no API to write data back to either of
>> >>> the two directly.
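>> >>>
>> >>> On the read side, a minimal sketch of the receiver-based Kafka input
>> >>> (the ZooKeeper quorum, consumer group and topic name are placeholders,
>> >>> and the spark-streaming-kafka artifact needs to be on the classpath):
>> >>>
>> >>> import org.apache.spark.SparkConf
>> >>> import org.apache.spark.streaming.{Seconds, StreamingContext}
>> >>> import org.apache.spark.streaming.kafka.KafkaUtils
>> >>>
>> >>> object KafkaInputExample {
>> >>>   def main(args: Array[String]): Unit = {
>> >>>     val conf = new SparkConf().setAppName("KafkaInputExample")
>> >>>     val ssc = new StreamingContext(conf, Seconds(10))
>> >>>
>> >>>     // topic name -> number of receiver threads
>> >>>     val topics = Map("events" -> 1)
>> >>>     val messages = KafkaUtils.createStream(
>> >>>       ssc, "zk1:2181,zk2:2181", "spark-consumer-group", topics)
>> >>>
>> >>>     // Values are the raw payloads; keep a raw copy in HDFS
>> >>>     messages.map(_._2).saveAsTextFiles("hdfs:///data/raw/kafka-events")
>> >>>
>> >>>     ssc.start()
>> >>>     ssc.awaitTermination()
>> >>>   }
>> >>> }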
>> >>>
>> >>> I sent a PR which should eventually add something like this:
>> >>>
>> >>> https://github.com/harishreedharan/spark/blob/Kafka-output/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala
>> >>> that would allow Spark Streaming to write back to Kafka. This will
>> >>> likely be
>> >>> reviewed and committed after 1.2.
>> >>>
>> >>> I would consider writing something similar to push data to Flume as
>> >>> well, if there is a sufficient use case for it. I have seen people
>> >>> talk about writing back to Kafka quite a bit -- hence the above patch.
>> >>>
>> >>> Which one is better is up to your use case, existing infrastructure
>> >>> and preference. Both would work as is, but writing back to Flume
>> >>> usually makes sense if you want to write to HDFS/HBase/Solr etc. --
>> >>> which you could also write directly from Spark Streaming (of course,
>> >>> there are benefits to writing via Flume, like the additional buffering
>> >>> Flume gives), but it is still possible to do so from Spark Streaming
>> >>> itself.
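>> >>>
>> >>> For writing to those stores directly from Spark Streaming, the usual
>> >>> pattern is foreachRDD + foreachPartition, so that one connection is
>> >>> created per partition rather than per record. A rough sketch, with a
>> >>> hypothetical ExternalStoreClient standing in for the real
>> >>> HBase/Solr/etc. client:
>> >>>
>> >>> import org.apache.spark.streaming.dstream.DStream
>> >>>
>> >>> // Hypothetical client -- replace with the actual HBase/Solr client
>> >>> class ExternalStoreClient {
>> >>>   def write(record: String): Unit = ()
>> >>>   def close(): Unit = ()
>> >>> }
>> >>>
>> >>> object DirectWriteExample {
>> >>>   def writeOut(stream: DStream[String]): Unit = {
>> >>>     stream.foreachRDD { rdd =>
>> >>>       rdd.foreachPartition { partition =>
>> >>>         // One client per partition, created on the executor
>> >>>         val client = new ExternalStoreClient()
>> >>>         partition.foreach(record => client.write(record))
>> >>>         client.close()
>> >>>       }
>> >>>     }
>> >>>   }
>> >>> }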
>> >>>
>> >>> But for Kafka, the usual use case is a variety of custom applications
>> >>> reading the same data -- for which it makes a whole lot of sense to
>> >>> write back to Kafka. An example is to sanitize incoming data in Spark
>> >>> Streaming (from Flume or Kafka or something else) and make it
>> >>> available for a variety of apps via Kafka.
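>> >>>
>> >>> Until something like the above patch is available, one way to do that
>> >>> today is to create a Kafka producer inside foreachPartition and
>> >>> publish the sanitized records yourself -- a sketch, where the broker
>> >>> list, topic name and sanitize logic are placeholders:
>> >>>
>> >>> import java.util.Properties
>> >>> import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
>> >>> import org.apache.spark.streaming.dstream.DStream
>> >>>
>> >>> object KafkaWriteBackExample {
>> >>>   // Placeholder cleanup logic
>> >>>   def sanitize(record: String): String = record.trim
>> >>>
>> >>>   def publish(input: DStream[String]): Unit = {
>> >>>     input.map(sanitize).foreachRDD { rdd =>
>> >>>       rdd.foreachPartition { partition =>
>> >>>         val props = new Properties()
>> >>>         props.put("bootstrap.servers", "broker1:9092,broker2:9092")
>> >>>         props.put("key.serializer",
>> >>>           "org.apache.kafka.common.serialization.StringSerializer")
>> >>>         props.put("value.serializer",
>> >>>           "org.apache.kafka.common.serialization.StringSerializer")
>> >>>         // One producer per partition, created on the executor
>> >>>         val producer = new KafkaProducer[String, String](props)
>> >>>         partition.foreach { record =>
>> >>>           producer.send(
>> >>>             new ProducerRecord[String, String]("sanitized-events", record))
>> >>>         }
>> >>>         producer.close()
>> >>>       }
>> >>>     }
>> >>>   }
>> >>> }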
>> >>>
>> >>> Hope this helps!
>> >>>
>> >>> Hari
>> >>>
>> >>>
>> >>> On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz
>> >>> <konstt2...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I'm starting with Spark and I'm just trying to understand: if I
>> >>>> want to use Spark Streaming, should I feed it with Flume or Kafka?
>> >>>> I think there's no official sink from Flume to Spark Streaming, and
>> >>>> it seems that Kafka fits better since it gives you reliability.
>> >>>>
>> >>>> Could someone give a good scenario for each alternative? When would
>> >>>> it make sense to use Kafka, and when Flume, for Spark Streaming?
>> >>>>
>> >>>>
>> >>>
>> >>
>>
>>
>
