Thanks for this, it's what I'm looking for. Binary Avro should be good to use, thanks.
On Wed, Aug 22, 2012 at 7:59 PM, Mohit Anchlia <[email protected]> wrote:

> On Tue, Aug 21, 2012 at 8:16 PM, ashutosh (Open Platform Development Team)
> <[email protected]> wrote:
>
>> Hi All,
>>
>> I am using the "avro_event" serializer with the writable format and the
>> DataStream file type to store events in HDFS.
>>
>> I would like to read the files back for further analysis. I am new to
>> Avro and don't know how to develop a deserializer to read the Flume
>> events written to the HDFS files.
>>
>> If anyone could share a sample or an example, that would be a great
>> help. Please help....
>>
>
> Look at this test to see how to read the data. In general, though, you
> would want to create your own serializer specific to your schema;
> otherwise it makes more sense to just use sequence files.
>
> http://svn.apache.org/repos/asf/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/TestFlumeEventAvroEventSerializer.java
>
>> Thanks & Regards,
>>
>> Ashutosh Sharma
>>
>> From: Bhaskar V. Karambelkar [mailto:[email protected]]
>> Sent: Wednesday, August 22, 2012 12:22 AM
>> To: [email protected]
>> Subject: Re: Can HDFSSink write headers as well?
>>
>> On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー
>> <[email protected]> wrote:
>>
>> Hi David,
>>
>> Currently there is no way to write headers to HDFS using the built-in
>> Flume functionality.
>>
>> This is not entirely true; the following combination will write headers
>> to HDFS in the Avro data file format (binary):
>>
>> agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
>> agent.sinks.hdfsBinarySink.serializer = avro_event
>> agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
>>
>> The serializer used is part of the Flume distribution, viz.
>> flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
>>
>> A file written this way can be processed with the Avro MapReduce API
>> found in the Avro distribution.
>>
>> Also note that simply using DataStream doesn't mean the output is a text
>> file; the serializer and hdfs.writeFormat also decide whether the file
>> is text or binary.
>>
>> I've read the entire HDFS sink code and experimented with it a lot, so
>> if you want more details, let me know.
>>
>> If you are writing to text or binary files on HDFS (i.e. you have set
>> hdfs.fileType = DataStream or CompressedStream in your config), then you
>> can supply your own custom serializer, which will allow you to write
>> headers to HDFS. You will need to write a serializer that implements
>> org.apache.flume.serialization.EventSerializer.
>>
>> If, on the other hand, you are writing to HDFS SequenceFiles, then
>> unfortunately there is no way to customize the way that events are
>> serialized, so you cannot write event headers to HDFS. This is a known
>> issue (FLUME-1100), and I have supplied a patch to fix it.
>>
>> Chris.
>>
>> On 2012/08/21 11:36, David Capwell wrote:
>>
>> I was wondering, if I put arbitrary data in an event's headers, can the
>> HDFSSink write it to HDFS? I know it can use the headers to split the
>> data into different paths, but what about writing the header data to
>> HDFS itself?
>>
>> Thanks for your time reading this email.
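
For completeness, reading a file produced by the avro_event serializer does not require a hand-written deserializer at all: the output is a standard Avro data file with the writer schema embedded in it, so the generic Avro API can read it directly. Below is a minimal sketch along those lines, assuming the Avro 1.x Java API and the stock AvroFlumeEvent record layout (a "headers" map plus a "body" of bytes); the class name and the local-file assumption are illustrative only.

    import java.io.File;
    import java.nio.ByteBuffer;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class FlumeAvroEventFileReader {
      public static void main(String[] args) throws Exception {
        // A local copy of a file the HDFS sink rolled, e.g. pulled down with
        // 'hadoop fs -get'; reading straight off HDFS would go through
        // org.apache.avro.mapred.FsInput instead of a java.io.File.
        File file = new File(args[0]);

        // No reader schema needs to be supplied: DataFileReader picks up the
        // writer schema embedded in the Avro data file itself.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>();
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(file, datumReader);
        try {
          GenericRecord event = null;
          while (fileReader.hasNext()) {
            event = fileReader.next(event); // reuse the record across iterations
            Object headers = event.get("headers");            // map: header name -> value
            ByteBuffer body = (ByteBuffer) event.get("body"); // raw event body
            byte[] bytes = new byte[body.remaining()];
            body.get(bytes);
            System.out.println(headers + " : " + new String(bytes, "UTF-8"));
          }
        } finally {
          fileReader.close();
        }
      }
    }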
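
And to make Chris's suggestion concrete, a custom serializer that writes headers to HDFS only has to implement org.apache.flume.serialization.EventSerializer plus its nested Builder. The skeleton below is an untested sketch, not a drop-in implementation; HeaderAndBodyTextSerializer and the output format (headers, a space, then the body, one event per line) are made up for illustration.

    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.serialization.EventSerializer;

    public class HeaderAndBodyTextSerializer implements EventSerializer {

      private final OutputStream out;

      private HeaderAndBodyTextSerializer(OutputStream out) {
        this.out = out;
      }

      @Override
      public void afterCreate() throws IOException {
        // Nothing to write at the start of a new file for plain text output.
      }

      @Override
      public void afterReopen() throws IOException {
        // Nothing to re-initialize when appending to an existing file.
      }

      @Override
      public void write(Event event) throws IOException {
        // Prefix each line with the event's headers, then append the body.
        out.write(event.getHeaders().toString().getBytes("UTF-8"));
        out.write(' ');
        out.write(event.getBody());
        out.write('\n');
      }

      @Override
      public void flush() throws IOException {
        out.flush();
      }

      @Override
      public void beforeClose() throws IOException {
        // No trailer or footer to write.
      }

      @Override
      public boolean supportsReopen() {
        return true;
      }

      // Flume instantiates serializers through this nested builder.
      public static class Builder implements EventSerializer.Builder {
        @Override
        public EventSerializer build(Context context, OutputStream out) {
          return new HeaderAndBodyTextSerializer(out);
        }
      }
    }

It would then be wired in by setting the sink's serializer to the fully qualified builder class, e.g. agent.sinks.hdfsSink.serializer = com.example.HeaderAndBodyTextSerializer$Builder (the package name here is hypothetical).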
