I attempted to put together a small Flume+Kafka tutorial that also uses Camus to run MapReduce jobs that pull from Kafka and write to HDFS. My example uses a spoolDirSource, KafkaChannel and KafkaSink, so it may be of some help to you.
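On the header question in the quoted message below, one possible approach (not taken from the tutorial) is to attach a static interceptor to the spooldir source so that every event carries a `topic` header. The Flume 1.6 Kafka sink documentation describes that header as overriding the sink's configured topic. The following is only a minimal sketch under those assumptions: the agent name `a1`, the component names, the spool directory, the broker address and the topic value are all hypothetical, and the Kafka sink property names are the Flume 1.6 ones (later releases renamed them).

```properties
# Hypothetical agent "a1": spooldir source -> memory channel -> Kafka sink.
a1.sources = spool
a1.channels = mem
a1.sinks = kafka

# Spooling directory source reading Avro container files.
# The spoolDir path is a placeholder.
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /var/spool/flume
a1.sources.spool.fileHeader = true
a1.sources.spool.deserializer = AVRO
a1.sources.spool.channels = mem

# Static interceptor: adds a "topic" header to every event from this source.
# The topic value is a placeholder.
a1.sources.spool.interceptors = settopic
a1.sources.spool.interceptors.settopic.type = static
a1.sources.spool.interceptors.settopic.key = topic
a1.sources.spool.interceptors.settopic.value = my-spool-topic

a1.channels.mem.type = memory

# Kafka sink (Flume 1.6 property names). No "topic" is set here, so the
# default (default-flume-topic) would only apply to events lacking the header.
a1.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka.brokerList = localhost:9092
a1.sinks.kafka.channel = mem
```

A static interceptor assigns the same topic to every event from that source, so routing different files to different topics would need a different interceptor or one source per topic.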
https://github.com/mbkeane/BigDataTechCon/blob/master/README.md

________________________________________
From: Simone Roselli [[email protected]]
Sent: Wednesday, January 06, 2016 10:33 AM
To: [email protected]
Subject: Spooldir needs a Kafka topic defined in the agent.conf

Hi,

I'm having trouble configuring a spooldir source with the Kafka sink.

In Flume-NG I can use the Kafka sink without specifying a topic name in the agent.conf, since the event contains the topic name in its headers.

Things look different with the spooldir source. If you don't provide a topic name in agent.conf, it will only try the default one (default-flume-topic).

Is there a way to force the spooldir source to use the topic name in the headers?

PS: I'm using spooldir with the AVRO deserializer and no other particular configuration. "fileHeader" is set to "true".

Many thanks

Simone Roselli
ITE Sysadmin
[email protected]
http://www.plista.com
