Hari, can you help?

TD
On Tue, Jul 29, 2014 at 12:13 PM, dapooley <dapoo...@gmail.com> wrote:
> Hi,
>
> I am trying to integrate Spark with a Flume log sink and avro source. The
> sink is on one machine (the application), and the source is on another. Log
> events are being sent from the application server to the avro source server
> (a log directory sink on the avro source prints them, to verify).
>
> The aim is to have Spark also receive the same events that the avro source
> is getting. The steps, I believe, are:
>
> 1. Install/start the Spark master (on the avro source machine).
> 2. Write the Spark application and deploy it (on the avro source machine).
> 3. Add the Spark application as a worker to the master.
> 4. Configure the Spark application to use the same port as the avro source.
>
> The test setup uses two Ubuntu VMs on a Windows host.
>
> Flume configuration:
>
> ######################### application ##############################
> ## Tail application log file
> # /var/lib/apache-flume-1.5.0-bin/bin/flume-ng agent -n cps -c conf -f conf/flume-conf.properties
> # http://flume.apache.org/FlumeUserGuide.html#exec-source
> source_agent.sources = tomcat
> source_agent.sources.tomcat.type = exec
> source_agent.sources.tomcat.command = tail -F /var/lib/tomcat/logs/application.log
> source_agent.sources.tomcat.batchSize = 1
> source_agent.sources.tomcat.channels = memoryChannel
>
> # http://flume.apache.org/FlumeUserGuide.html#memory-channel
> source_agent.channels = memoryChannel
> source_agent.channels.memoryChannel.type = memory
> source_agent.channels.memoryChannel.capacity = 100
>
> ## Send to Flume Collector on Analytics Node
> # http://flume.apache.org/FlumeUserGuide.html#avro-sink
> source_agent.sinks = avro_sink
> source_agent.sinks.avro_sink.type = avro
> source_agent.sinks.avro_sink.channel = memoryChannel
> source_agent.sinks.avro_sink.hostname = 10.0.2.2
> source_agent.sinks.avro_sink.port = 41414
>
> ######################## avro source ##############################
> ## Receive Flume events for Spark streaming
>
> # http://flume.apache.org/FlumeUserGuide.html#memory-channel
> agent1.channels = memoryChannel
> agent1.channels.memoryChannel.type = memory
> agent1.channels.memoryChannel.capacity = 100
>
> ## Flume Collector on Analytics Node
> # http://flume.apache.org/FlumeUserGuide.html#avro-source
> agent1.sources = avroSource
> agent1.sources.avroSource.type = avro
> agent1.sources.avroSource.channels = memoryChannel
> agent1.sources.avroSource.bind = 0.0.0.0
> agent1.sources.avroSource.port = 41414
>
> # Sinks
> agent1.sinks = localout
>
> # http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
> agent1.sinks.localout.type = file_roll
> agent1.sinks.localout.sink.directory = /home/vagrant/flume/logs
> agent1.sinks.localout.sink.rollInterval = 0
> agent1.sinks.localout.channel = memoryChannel
>
> Thank you in advance for any assistance.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Flume-integration-do-I-understand-this-correctly-tp10879.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
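The steps described in the quoted email can be sketched with Spark Streaming's push-based Flume receiver (`FlumeUtils.createStream` from the `spark-streaming-flume` artifact). This is a minimal sketch, not the poster's actual application: the app name, batch interval, port 41415, and the body-decoding logic are all assumptions. Note that with this receiver, Spark itself binds the listening port, so an Avro sink would need to point at the Spark receiver's host/port rather than at a Flume avro source already using that port.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeEventSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeEventSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10s batches (assumed)

    // The receiver binds its own Avro listener; the application machine's
    // avro sink should target this host/port. Reusing 41414 while a Flume
    // avro source is already bound there would conflict, so a distinct
    // port (41415) is assumed here.
    val stream = FlumeUtils.createStream(
      ssc, "0.0.0.0", 41415, StorageLevel.MEMORY_AND_DISK_SER)

    // Decode each Flume event body as a UTF-8 string and print a sample
    // of each batch, to verify events are arriving.
    stream.map(e => new String(e.event.getBody.array(), "UTF-8")).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Deployed with `spark-submit`, this would run as a long-lived streaming job; the Flume avro sink pushes events to it directly, so no separate Flume avro source is required on that port.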