Thank you, Doug! That was all I needed to make it work.
Just for the record, this is the code:

// Writing...
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(schemaString);
File outFile = new File("generated.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, outFile);
dataFileWriter.appendEncoded(body);
dataFileWriter.close();

Thanks again!

On Feb 7, 2014, at 2:29 PM, Doug Cutting <cutt...@apache.org> wrote:

> You might use DataFileWriter#appendEncoded:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)
>
> If the body has just a single instance of the record, then you'd call this once.
> If you have multiple instances, then you might change the body to have the
> schema {"type":"array", "items":"bytes"}.
>
> Doug
>
>
> On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <df.rodriguez...@gmail.com> wrote:
>
> Hi all,
>
> Some context (I'm not an expert Java programmer, and I'm just starting with
> Avro/Flume):
>
> I need to transfer Avro files from different servers to HDFS, and I am trying
> to use Flume to do it. I have a Flume spooldir source (reading the Avro files)
> connected to an Avro sink, and an Avro source connected to an HDFS sink.
> Like this:
>
>     servers                    | hadoop
>     spooldir src -> avro sink --------> avro src -> hdfs sink
>
> When the Flume spooldir source deserializes the Avro files, it creates a Flume
> event with two fields: 1) the header contains the schema; 2) the body contains
> the binary Avro record data, without the schema or the rest of the container
> file elements.
> See the Flume docs:
>
> http://flume.apache.org/FlumeUserGuide.html#avro
>
> So the Avro sink creates an Avro file like this:
>
> {"headers": {"flume.avro.schema.literal":
> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
> "body": {"bytes": "{BYTES}"}}
>
> So now I am trying to write a serializer, since Flume only includes a
> FlumeEvent serializer that creates Avro files like the one above, not the
> original Avro files from the servers.
>
> I am almost there: I got the schema from the header field and the bytes from
> the body field. But now I need to write the Avro file based on those bytes,
> not on field values. I cannot do r.put("field", "value") since I don't have
> the values, just the bytes.
>
> This is the code:
>
> File file = TESTFILE;
>
> DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
> DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
> GenericRecord user = null;
> while (dataFileReader.hasNext()) {
>     user = dataFileReader.next(user);
>
>     Map headers = (Map) user.get("headers");
>
>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>     String schema = headers.get(schemaHeaderKey).toString();
>
>     ByteBuffer body = (ByteBuffer) user.get("body");
>
>     // Writing...
>     Schema.Parser parser = new Schema.Parser();
>     Schema schemaSimpleWrapper = parser.parse(schema);
>     GenericRecord r = new GenericData.Record(schemaSimpleWrapper);
>
>     // NOT SURE WHAT COMES NEXT
> }
>
> Is it possible to actually create the Avro files from the body bytes?
>
> I appreciate any help.
>
> Thanks,
> Daniel
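For anyone finding this thread later, here is a minimal, self-contained sketch of the round trip Doug describes: a record is binary-encoded to raw bytes (standing in for the Flume event body), those bytes are written to a proper Avro container file with DataFileWriter#appendEncoded, and the file is read back to confirm the record survived. The class name AppendEncodedDemo and the single-field User schema are illustrative, not from the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AppendEncodedDemo {
    public static void main(String[] args) throws Exception {
        String schemaString =
            "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\","
          + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";
        Schema schema = new Schema.Parser().parse(schemaString);

        // Simulate the Flume event body: one record, binary-encoded,
        // with no container-file framing (no schema, no sync markers).
        GenericRecord original = new GenericData.Record(schema);
        original.put("name", "daniel");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
        new GenericDatumWriter<GenericRecord>(schema).write(original, encoder);
        encoder.flush();
        ByteBuffer body = ByteBuffer.wrap(bos.toByteArray());

        // Write a real Avro container file directly from the raw bytes;
        // appendEncoded skips decode/re-encode of the datum.
        File outFile = new File("generated.avro");
        DataFileWriter<GenericRecord> fileWriter =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        fileWriter.create(schema, outFile);
        fileWriter.appendEncoded(body);
        fileWriter.close();

        // Read the file back to confirm the record is intact.
        DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(
            outFile, new GenericDatumReader<GenericRecord>(schema));
        GenericRecord roundTripped = fileReader.next();
        fileReader.close();
        System.out.println(roundTripped.get("name"));
    }
}
```

In a real Flume serializer the schema string and the ByteBuffer would come from the event's flume.avro.schema.literal header and body, as in Daniel's code above; the encoding step here only exists to make the example runnable on its own.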