There was a mistake in my configuration: I had "hdfs" in front of "serializer". I changed

tier1.sinks.sink1.hdfs.serializer = avro_event

to

tier1.sinks.sink1.serializer = avro_event

but it is still generating a sequence file. This is what I get:

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.TextK???2-%??-/?? A??,? ?<message>xmldata</message>
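For reference, a minimal sink configuration that should produce an Avro container file rather than a sequence file. This is a sketch based on the settings discussed in this thread, not a tested configuration. The two points it illustrates: serializer (and sub-properties such as compressionCodec) sit at the sink level, not under the hdfs. prefix, and hdfs.fileType must be DataStream, since the default SequenceFile type is what writes the SEQ header shown above.

# sketch: HDFS sink writing Flume's built-in avro_event container format
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = c1
tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
tier1.sinks.sink1.hdfs.filePrefix = access_log
tier1.sinks.sink1.hdfs.fileSuffix = .avro
# DataStream (not the default SequenceFile) lets the serializer control the on-disk format
tier1.sinks.sink1.hdfs.fileType = DataStream
# serializer is a sink-level property, not under the hdfs. prefix
tier1.sinks.sink1.serializer = avro_event
tier1.sinks.sink1.serializer.compressionCodec = snappy

If the output still starts with SEQ after this change, it is worth confirming that the agent actually restarted with the new configuration and that you are not reading a file written before the change.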
On Fri, Oct 4, 2013 at 10:43 PM, Deepak Subhramanian <deepak.subhraman...@gmail.com> wrote:

> Thanks Hari.
>
> I specified the fileType. This is what I have. I will try again and let
> you know.
>
> tier1.sources = httpsrc1
> tier1.channels = c1
> tier1.sinks = sink1
>
> tier1.sources.httpsrc1.bind = 127.0.0.1
> tier1.sources.httpsrc1.type = http
> tier1.sources.httpsrc1.port = 9999
> tier1.sources.httpsrc1.channels = c1
> tier1.sources.httpsrc1.handler = spikes.flume.XMLHandler
> tier1.sources.httpsrc1.handler.nickname = HTTPTesting
>
> tier1.channels.c1.type = memory
> tier1.channels.c1.capacity = 100
>
> #tier1.sinks.sink1.type = logger
> tier1.sinks.sink1.channel = c1
>
> tier1.sinks.sink1.type = hdfs
> tier1.sinks.sink1.hdfs.path = /tmp/flumecollector
> tier1.sinks.sink1.hdfs.filePrefix = access_log
> tier1.sinks.sink1.hdfs.fileSuffix = .avro
> tier1.sinks.sink1.hdfs.fileType = DataStream
> tier1.sinks.sink1.hdfs.serializer = avro_event
>
> I also added this later:
> tier1.sinks.sink1.hdfs.serializer.appendNewline = true
> tier1.sinks.sink1.hdfs.serializer.compressionCodec = snappy
>
> On Fri, Oct 4, 2013 at 4:56 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>
>> The default file type for the HDFS sink is SequenceFile. Set
>> hdfs.fileType to DataStream. See details here:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>>
>> Thanks,
>> Hari
>>
>> On Friday, October 4, 2013 at 6:52 AM, Deepak Subhramanian wrote:
>>
>> I tried using the HDFS sink to generate an Avro file by setting the
>> serializer to avro_event, but it is not generating an Avro file; it is
>> generating a sequence file. Is it not supposed to generate an Avro file
>> with the default schema? Or do I have to generate the Avro data from
>> text in my HTTPHandler source?
>>
>> "{ \"type\":\"record\", \"name\": \"Event\", \"fields\": [" +
>> " {\"name\": \"headers\", \"type\": { \"type\": \"map\", \"values\": \"string\" } }, " +
>> " {\"name\": \"body\", \"type\": \"bytes\" } ] }");
>>
>> On Thu, Oct 3, 2013 at 3:36 PM, Deepak Subhramanian <deepak.subhraman...@gmail.com> wrote:
>>
>> Hi,
>>
>> I want to convert XML files in text to an Avro file and store it in
>> HDFS. I get the XML files as a POST request. I extended the HTTPHandler
>> to process the XML POST request. Do I have to convert the data from text
>> to Avro in the HTTPHandler, or does the Avro sink or the HDFS sink
>> convert it directly to Avro with some configuration? I want to store the
>> entire XML string in an Avro variable.
>>
>> Thanks in advance for any inputs.
>> Deepak Subhramanian
>>
>> --
>> Deepak Subhramanian
>
> --
> Deepak Subhramanian

--
Deepak Subhramanian
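The spikes.flume.XMLHandler referenced in the thread was never posted. Below is a minimal sketch of what such a handler might look like, assuming it simply wraps each POSTed XML document into the body of a single Flume event; the avro_event serializer at the sink then writes the Event schema quoted above (a headers map plus a bytes body), so the handler itself never needs to produce Avro. This is a hypothetical reconstruction, not the original class.

package spikes.flume;

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.HTTPSourceHandler;

// Hypothetical reconstruction of the XMLHandler mentioned in the thread:
// reads the raw POST body and emits it as the body of one Flume event.
public class XMLHandler implements HTTPSourceHandler {

  private String nickname; // set via tier1.sources.httpsrc1.handler.nickname

  @Override
  public void configure(Context context) {
    nickname = context.getString("nickname", "default");
  }

  @Override
  public List<Event> getEvents(HttpServletRequest request) throws Exception {
    // Read the entire request body (the XML document) as a string.
    StringBuilder body = new StringBuilder();
    try (BufferedReader reader = request.getReader()) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    }

    // The avro_event serializer stores headers as a map<string,string>
    // and the body as bytes, so no Avro work is needed here.
    Map<String, String> headers = new HashMap<>();
    headers.put("nickname", nickname);

    Event event = EventBuilder.withBody(
        body.toString().getBytes(StandardCharsets.UTF_8), headers);
    return Collections.singletonList(event);
  }
}

With a handler along these lines, each POST to port 9999 becomes one record in the resulting .avro file, with the full XML string stored in the record's body field.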