I have asked a similar question, but regarding deserialization of such records written as bytes. Did you try to deserialize them? What does your schemaString look like?

Please refer to this thread: Avro Byte Blob Ser De <https://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3cCAGQuZejTTU9Sw2jMsDDUA9_XQeXM2jxEAQNX5O_HAnqABk=0...@mail.gmail.com%3e>

Thanks
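For reference, decoding one of those bodies back into a record only needs the schema from the flume.avro.schema.literal header plus Avro's binary decoder, since the body carries a record's raw binary encoding without any container-file framing (records appended with appendEncoded, by contrast, end up in an ordinary container file that DataFileReader reads back as usual). A minimal sketch, with illustrative class and variable names:

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class BodyDecoder {
    // schemaString: the writer schema taken from the flume.avro.schema.literal header
    // body: the ByteBuffer from the Flume event body (one binary-encoded record)
    static GenericRecord decode(String schemaString, ByteBuffer body) throws IOException {
        Schema schema = new Schema.Parser().parse(schemaString);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);

        // Copy the bytes out without disturbing the buffer's position.
        byte[] bytes = new byte[body.remaining()];
        body.duplicate().get(bytes);

        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }
}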
On Fri, Feb 7, 2014 at 7:29 PM, Daniel Rodriguez <df.rodriguez...@gmail.com> wrote:

> Thank you Doug!
>
> That was all I needed to make it work.
>
> Just for the record, this is the code:
>
> // Writing...
> Schema.Parser parser = new Schema.Parser();
> Schema schema = parser.parse(schemaString);
>
> File outFile = new File("generated.avro");
> DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
> DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
> dataFileWriter.create(schema, outFile);
> dataFileWriter.appendEncoded(body);
> dataFileWriter.close();
>
> Thanks again!
>
>
> On Feb 7, 2014, at 2:29 PM, Doug Cutting <cutt...@apache.org> wrote:
>
> You might use DataFileWriter#appendEncoded:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)
>
> If the body has just a single instance of the record then you'd call this once. If you have multiple instances then you might change the body to have the schema {"type":"array", "items": "bytes"}.
>
> Doug
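A sketch of the "multiple instances" variant Doug mentions, assuming the wrapper record's body field is declared as {"type":"array", "items": "bytes"} and reusing the dataFileWriter created as in the snippet above (wrapper stands for the record read from the Flume-written file): each array element is already one encoded record, so it can be appended as-is.

// Assumes "body" has schema {"type":"array", "items": "bytes"} in the wrapper record.
@SuppressWarnings("unchecked")
java.util.List<java.nio.ByteBuffer> bodies =
        (java.util.List<java.nio.ByteBuffer>) wrapper.get("body");
for (java.nio.ByteBuffer encoded : bodies) {
    dataFileWriter.appendEncoded(encoded);  // each element is one already-encoded record
}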
> On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <df.rodriguez...@gmail.com> wrote:
>
>> Hi all,
>>
>> Some context (not an expert Java programmer, and just starting with Avro/Flume):
>>
>> I need to transfer Avro files from different servers to HDFS and I am trying to use Flume to do it.
>> I have a Flume spooldir source (reading the Avro files) with an Avro sink, and an Avro source with an HDFS sink. Like this:
>>
>>     servers                    |  hadoop
>>     spooldir src -> avro sink --------> avro src -> hdfs
>>
>> When the Flume spooldir source deserializes the Avro files, it creates a Flume event with two fields: 1) the header contains the schema; 2) the body field has the binary Avro record data, not including the schema or the rest of the container file elements. See the Flume docs: http://flume.apache.org/FlumeUserGuide.html#avro
>>
>> So the Avro sink creates an Avro file like this:
>>
>> {"headers": {"flume.avro.schema.literal":
>> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
>> "body": {"bytes": "{BYTES}"}}
>>
>> So now I am trying to write a serializer, since Flume only includes a FlumeEvent serializer that creates Avro files like the one above, not the original Avro files on the servers.
>>
>> I am almost there: I got the schema from the header field and the bytes from the body field.
>> But now I need to write the Avro file based on those bytes, not on field values. I cannot do r.put("field", "value") since I don't have the values, just the bytes.
>>
>> This is the code:
>>
>> File file = TESTFILE;
>>
>> DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
>> DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
>> GenericRecord user = null;
>> while (dataFileReader.hasNext()) {
>>     user = dataFileReader.next(user);
>>
>>     Map headers = (Map) user.get("headers");
>>
>>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>>     String schema = headers.get(schemaHeaderKey).toString();
>>
>>     ByteBuffer body = (ByteBuffer) user.get("body");
>>
>>     // Writing...
>>     Schema.Parser parser = new Schema.Parser();
>>     Schema schemaSimpleWrapper = parser.parse(schema);
>>     GenericRecord r = new GenericData.Record(schemaSimpleWrapper);
>>
>>     // NOT SURE WHAT COMES NEXT
>> }
>>
>> Is it possible to actually create the Avro files from the value bytes?
>>
>> I appreciate any help.
>>
>> Thanks,
>> Daniel
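Putting the two halves of the thread together, a sketch of the missing step: instead of building a new GenericData.Record, the schema parsed from the header is used to create the output container file, and the body bytes are appended directly with appendEncoded. The file names are illustrative, and the sketch assumes every event in the input file carries the same schema.

import java.io.File;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class RewrapFlumeAvro {
    public static void main(String[] args) throws Exception {
        File inFile = new File("flume-wrapped.avro");  // hypothetical input written by the Flume sink
        File outFile = new File("restored.avro");      // hypothetical output in the original record schema

        DataFileReader<GenericRecord> dataFileReader =
                new DataFileReader<GenericRecord>(inFile, new GenericDatumReader<GenericRecord>());

        DataFileWriter<GenericRecord> dataFileWriter = null;
        GenericRecord event = null;
        while (dataFileReader.hasNext()) {
            event = dataFileReader.next(event);

            Map headers = (Map) event.get("headers");
            String schemaString = headers.get(new Utf8("flume.avro.schema.literal")).toString();
            ByteBuffer body = (ByteBuffer) event.get("body");

            if (dataFileWriter == null) {
                // Create the output file once, using the schema carried in the first event's header
                // (assumes every event in this file was written with the same schema).
                Schema schema = new Schema.Parser().parse(schemaString);
                dataFileWriter = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
                dataFileWriter.create(schema, outFile);
            }
            // The body already holds one record in Avro binary encoding, so append it as-is.
            dataFileWriter.appendEncoded(body);
        }
        if (dataFileWriter != null) {
            dataFileWriter.close();
        }
        dataFileReader.close();
    }
}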