I have asked a similar question, but regarding deserialization of such records written as bytes. Did you try to deserialize them? What does your schemaString look like?

Please refer to this thread: Avro Byte Blob Ser De <https://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3cCAGQuZejTTU9Sw2jMsDDUA9_XQeXM2jxEAQNX5O_HAnqABk=0...@mail.gmail.com%3e>

Thanks
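For reference, decoding one of those bodies back into a record only needs the schema from the flume.avro.schema.literal header plus Avro's binary decoder, since the body carries a record's raw binary encoding without any container-file framing (records appended with appendEncoded, by contrast, end up in an ordinary container file that DataFileReader reads back as usual). A minimal sketch, with illustrative class and variable names:

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class BodyDecoder {
    // schemaString: the writer schema taken from the flume.avro.schema.literal header
    // body: the ByteBuffer from the Flume event body (one binary-encoded record)
    static GenericRecord decode(String schemaString, ByteBuffer body) throws IOException {
        Schema schema = new Schema.Parser().parse(schemaString);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);

        // Copy the bytes out without disturbing the buffer's position.
        byte[] bytes = new byte[body.remaining()];
        body.duplicate().get(bytes);

        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }
}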
On Fri, Feb 7, 2014 at 7:29 PM, Daniel Rodriguez <df.rodriguez...@gmail.com> wrote:

> Thank you Doug!
>
> That was all I needed to make it work.
>
> Just for the record, this is the code:
>
> // Writing...
> Schema.Parser parser = new Schema.Parser();
> Schema schema = parser.parse(schemaString);
>
> File outFile = new File("generated.avro");
> DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
> DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
> dataFileWriter.create(schema, outFile);
> dataFileWriter.appendEncoded(body);
> dataFileWriter.close();
>
> Thanks again!
>
>
> On Feb 7, 2014, at 2:29 PM, Doug Cutting <cutt...@apache.org> wrote:
>
> You might use DataFileWriter#appendEncoded:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)
>
> If the body has just a single instance of the record then you'd call this once. If you have multiple instances then you might change the body to have the schema {"type":"array", "items": "bytes"}.
>
> Doug
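A sketch of the "multiple instances" variant Doug mentions, assuming the wrapper record's body field is declared as {"type":"array", "items": "bytes"} and reusing the dataFileWriter created as in the snippet above (wrapper stands for the record read from the Flume-written file): each array element is already one encoded record, so it can be appended as-is.

// Assumes "body" has schema {"type":"array", "items": "bytes"} in the wrapper record.
@SuppressWarnings("unchecked")
java.util.List<java.nio.ByteBuffer> bodies =
        (java.util.List<java.nio.ByteBuffer>) wrapper.get("body");
for (java.nio.ByteBuffer encoded : bodies) {
    dataFileWriter.appendEncoded(encoded);  // each element is one already-encoded record
}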
> On Fri, Feb 7, 2014 at 12:06 PM, Daniel Rodriguez <df.rodriguez...@gmail.com> wrote:
>
>> Hi all,
>>
>> Some context (not an expert Java programmer, and just starting with Avro/Flume):
>>
>> I need to transfer Avro files from different servers to HDFS and I am trying to use Flume to do it.
>> I have a Flume spooldir source (reading the Avro files) with an Avro sink, and an Avro source with an HDFS sink. Like this:
>>
>>     servers                    |  hadoop
>>     spooldir src -> avro sink --------> avro src -> hdfs
>>
>> When the Flume spooldir source deserializes the Avro files, it creates a Flume event with two fields: 1) the header contains the schema; 2) the body field has the binary Avro record data, not including the schema or the rest of the container file elements. See the Flume docs: http://flume.apache.org/FlumeUserGuide.html#avro
>>
>> So the Avro sink creates an Avro file like this:
>>
>> {"headers": {"flume.avro.schema.literal":
>> "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
>> "body": {"bytes": "{BYTES}"}}
>>
>> So now I am trying to write a serializer, since Flume only includes a FlumeEvent serializer that creates Avro files like the one above, not the original Avro files on the servers.
>>
>> I am almost there: I got the schema from the header field and the bytes from the body field.
>> But now I need to write the Avro file based on those bytes, not on field values. I cannot do r.put("field", "value") since I don't have the values, just the bytes.
>>
>> This is the code:
>>
>> File file = TESTFILE;
>>
>> DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
>> DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);
>> GenericRecord user = null;
>> while (dataFileReader.hasNext()) {
>>     user = dataFileReader.next(user);
>>
>>     Map headers = (Map) user.get("headers");
>>
>>     Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
>>     String schema = headers.get(schemaHeaderKey).toString();
>>
>>     ByteBuffer body = (ByteBuffer) user.get("body");
>>
>>     // Writing...
>>     Schema.Parser parser = new Schema.Parser();
>>     Schema schemaSimpleWrapper = parser.parse(schema);
>>     GenericRecord r = new GenericData.Record(schemaSimpleWrapper);
>>
>>     // NOT SURE WHAT COMES NEXT
>> }
>>
>> Is it possible to actually create the Avro files from the value bytes?
>>
>> I appreciate any help.
>>
>> Thanks,
>> Daniel
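Putting the two halves of the thread together, a sketch of the missing step: instead of building a new GenericData.Record, the schema parsed from the header is used to create the output container file, and the body bytes are appended directly with appendEncoded. The file names are illustrative, and the sketch assumes every event in the input file carries the same schema.

import java.io.File;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class RewrapFlumeAvro {
    public static void main(String[] args) throws Exception {
        File inFile = new File("flume-wrapped.avro");  // hypothetical input written by the Flume sink
        File outFile = new File("restored.avro");      // hypothetical output in the original record schema

        DataFileReader<GenericRecord> dataFileReader =
                new DataFileReader<GenericRecord>(inFile, new GenericDatumReader<GenericRecord>());

        DataFileWriter<GenericRecord> dataFileWriter = null;
        GenericRecord event = null;
        while (dataFileReader.hasNext()) {
            event = dataFileReader.next(event);

            Map headers = (Map) event.get("headers");
            String schemaString = headers.get(new Utf8("flume.avro.schema.literal")).toString();
            ByteBuffer body = (ByteBuffer) event.get("body");

            if (dataFileWriter == null) {
                // Create the output file once, using the schema carried in the first event's header
                // (assumes every event in this file was written with the same schema).
                Schema schema = new Schema.Parser().parse(schemaString);
                dataFileWriter = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
                dataFileWriter.create(schema, outFile);
            }
            // The body already holds one record in Avro binary encoding, so append it as-is.
            dataFileWriter.appendEncoded(body);
        }
        if (dataFileWriter != null) {
            dataFileWriter.close();
        }
        dataFileReader.close();
    }
}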