Btw, if you want to write to Spark Streaming from Flume, there is a sink for that (it is part of Spark, not Flume). See Approach 2 here: http://spark.apache.org/docs/latest/streaming-flume-integration.html
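
For reference, a minimal sketch of the Flume side of Approach 2, going by that page. The agent name `agent`, the channel name `memoryChannel`, and the host/port values are placeholders I made up, so adjust them to your setup. The Spark sink jar also has to be on the Flume agent's classpath.

```properties
# Hypothetical agent named "agent" with one sink named "spark".
agent.sinks = spark
# The sink class ships with Spark (spark-streaming-flume-sink), not with Flume.
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
# Placeholder host and port the sink listens on; Spark connects here to pull events.
agent.sinks.spark.hostname = flume-host.example.com
agent.sinks.spark.port = 9999
# Placeholder channel; it must match a channel defined elsewhere in the agent config.
agent.sinks.spark.channel = memoryChannel
```

On the Spark side, the stream is then created with `FlumeUtils.createPollingStream(ssc, host, port)`, pointed at the same host and port as the sink.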

On Wed, Nov 19, 2014 at 12:41 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:

> As of now, you can feed Spark Streaming from both Kafka and Flume.
> Currently, though, there is no API to write data back to either of the two
> directly.
>
> I sent a PR which should eventually add something like this:
> https://github.com/harishreedharan/spark/blob/Kafka-output/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala
> that would allow Spark Streaming to write back to Kafka. This will likely
> be reviewed and committed after 1.2.
>
> I would consider writing something similar to push data to Flume as well,
> if there is a sufficient use case for it. I have seen people talk about
> writing back to Kafka quite a bit, hence the above patch.
>
> Which one is better is up to your use case, existing infrastructure, and
> preference. Both would work as is. Writing back to Flume would usually
> make sense if you want to write to HDFS/HBase/Solr etc. You could write to
> those directly from Spark Streaming itself, though there are benefits to
> going through Flume, like the additional buffering Flume gives you.
>
> But for Kafka, the usual use case is a variety of custom applications
> reading the same data, for which it makes a whole lot of sense to write
> back to Kafka. An example is to sanitize incoming data in Spark Streaming
> (from Flume or Kafka or something else) and make it available to a variety
> of apps via Kafka.
>
> Hope this helps!
>
> Hari
>
> On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm starting with Spark and I'm just trying to understand: if I want to
>> use Spark Streaming, should I feed it from Flume or Kafka? I think
>> there's not an official Sink for Flume to Spark Streaming, and it seems
>> that Kafka fits better since it gives you readability.
>>
>> Could someone give a good scenario for each alternative? When would it
>> make sense to use Kafka and when Flume for Spark Streaming?