I was able to get approach #2 (the pull-based approach using a custom sink) working here for Flume -> Spark Streaming: http://spark.apache.org/docs/latest/streaming-flume-integration.html
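For reference, approach #2 in that guide has the Flume agent run Spark's custom SparkSink. Flume pushes events into the sink, where they stay buffered until Spark Streaming pulls them. Below is a minimal sketch of a Flume agent config for that sink. It is based on the guide, not on the config used in the setup described here. The agent name ("agent"), the netcat source, the memory channel settings, and the sink hostname and port are all placeholder assumptions.

```properties
# Hypothetical agent named "agent": one source -> one memory channel -> SparkSink.
agent.sources = src
agent.channels = memoryChannel
agent.sinks = spark

# Placeholder source (assumption); replace with whatever source you actually ingest from.
agent.sources.src.type = netcat
agent.sources.src.bind = localhost
agent.sources.src.port = 44444
agent.sources.src.channels = memoryChannel

# In-memory channel; the capacity value is an arbitrary example.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000

# Spark's custom sink; Spark Streaming connects to this host/port to pull buffered events.
# The hostname and port below are placeholders.
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.hostname = flume-host.example.com
agent.sinks.spark.port = 9999
agent.sinks.spark.channel = memoryChannel
```

On the Spark side, the same guide polls this sink with FlumeUtils.createPollingStream(streamingContext, sinkHostname, sinkPort), using the hostname and port set on the sink. The SparkSink class and its dependencies (the jars the playbook copies into Flume's lib directory) must be on the agent's classpath for the sink to load.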
We ended up not moving forward with this approach. Instead, we will look at having Spark read from Kafka. If you want to give approach #2 a shot, below is the relevant part of our Ansible playbook. It should get you past a couple of issues we ran into. I don't have the Flume config lying around, but there wasn't anything too tricky in it. Note that this sink is not production ready and does not output metrics.

# SPARK - No longer used, kept for reference
- name: Get spark sink dependencies
  get_url: url="http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-flume-sink_2.10/1.3.1/spark-streaming-flume-sink_2.10-1.3.1.jar" dest={{install_directory}}/lib/spark-streaming-flume-sink_2.10-1.3.1.jar

- name: Get scala dependency for spark sink
  get_url: url="http://search.maven.org/remotecontent?filepath=org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar" dest={{install_directory}}/lib/scala-library-2.10.4.jar

- name: Get spark, needed for guava at org/apache/spark-project/util/guava
  get_url: url="http://d3kbcqa49mib13.cloudfront.net/spark-1.3.1-bin-hadoop2.6.tgz" dest=/tmp/spark-1.3.1-bin-hadoop2.6.tgz

- name: Extract spark
  command: tar -xvzf /tmp/spark-1.3.1-bin-hadoop2.6.tgz -C /tmp/ creates=/tmp/spark-1.3.1-bin-hadoop2.6

- name: Copy spark libs to flume/lib
  shell: rsync -ci /tmp/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar {{install_directory}}/lib/spark-assembly-1.3.1-hadoop2.6.0.jar
  register: rsync_result
  changed_when: "rsync_result.stdout != ''"

- name: Get latest avro, needed for spark sink to not break
  get_url: url="http://archive.apache.org/dist/avro/avro-1.7.7/java/{{item}}" dest={{install_directory}}/lib/{{item}}
  with_items:
    - avro-1.7.7.jar
    - avro-ipc-1.7.7.jar

- name: Make sure old avro jars don't exist
  file: path={{install_directory}}/lib/{{item}} state=absent
  with_items:
    - avro-1.7.3.jar
    - avro-ipc-1.7.3.jar

HTH,

-- 
Iain Wright

On Tue, Sep 1, 2015 at 9:51 AM, Sutanu Das <[email protected]> wrote:
> Hi Team,
>
> Is there any way to stream Flume events to a Spark sink?
>
> Is there any way to stream Flume events to a Storm sink?
>
> If anyone has successfully accomplished this, we would love to hear about
> the high-level configs...
>
> Thanks!
