I'm not familiar enough with Avro to propose an "Avro solution" for
your problem :(

If you want default values serialized into Avro for some fields, you
should fill them in explicitly in code before writing to Avro.
Another approach is to declare those fields as nullable using union types
(e.g. ["null", "int"]) and to substitute default values explicitly in code
when reading from Avro.
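
A minimal sketch of both approaches in plain Python, with no Avro dependency so it runs on its own. The helper names `fill_defaults_for_write` and `fill_defaults_after_read` are hypothetical and are not part of the avro library. The schema is the one from the quoted message below. The second approach assumes the fields have been redeclared as unions such as ["null", "int"] with "default": null, because the Avro spec requires a union field's default to match the union's first branch.

```python
import json

# Schema from the thread (data.avsc), inlined so the sketch is self-contained.
SCHEMA = json.loads("""
{
  "type": "record", "name": "data", "namespace": "my.example",
  "fields": [
    {"name": "domain", "type": "string", "default": "EMPTY"},
    {"name": "ip",     "type": "string", "default": "EMPTY"},
    {"name": "port",   "type": "int",    "default": 0},
    {"name": "score",  "type": "int",    "default": 0}
  ]
}
""")


def fill_defaults_for_write(record, schema):
    """Approach 1: complete the datum before the DatumWriter sees it.

    Every field missing from `record` is set to that field's "default"
    from the schema, so the datum matches the non-nullable schema.
    The input dict is not modified; a filled-in copy is returned.
    """
    filled = dict(record)
    for field in schema["fields"]:
        if field["name"] not in filled and "default" in field:
            filled[field["name"]] = field["default"]
    return filled


def fill_defaults_after_read(record, defaults):
    """Approach 2 (assumes ["null", ...] union fields): a missing field is
    written as null and read back as None, so replace each None with an
    application-level default taken from `defaults` (hypothetical dict)."""
    return dict((name, defaults.get(name) if value is None else value)
                for name, value in record.items())


if __name__ == "__main__":
    # With the avro library as in the thread, this would become
    # writer.append(fill_defaults_for_write(datum, SCHEMA)).
    print(fill_defaults_for_write({"ip": "1.2.3.4", "port": 80}, SCHEMA))
    # A record as it might come back from a nullable-union schema:
    print(fill_defaults_after_read(
        {"domain": None, "ip": "1.2.3.4", "port": 80, "score": None},
        {"domain": "EMPTY", "ip": "EMPTY", "port": 0, "score": 0}))
```

The first approach keeps the schema strict and puts the defaults in one place (the schema file). The second approach keeps the stored data honest about which values were actually missing, at the cost of handling None on every read.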

I believe the "default" key you used in your Avro schema is meant for schema
evolution (http://avro.apache.org/docs/current/spec.html#Schema+Resolution):

   - if the reader's record schema has a field that contains a default
   value, and writer's schema does not have a field with the same name, then
   the reader should use the default value from its field.
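
To make this rule concrete, here is a small plain-Python model of the record part of schema resolution. `resolve_record` is a hypothetical helper, not the avro library's API. It matches fields by name only and ignores aliases and type promotion. The real library applies these rules itself when the reader is given its own schema in addition to the writer's schema stored in the file.

```python
def resolve_record(datum, writer_fields, reader_fields):
    """Project a record decoded with the writer's schema onto the reader's
    schema, following Avro's record-resolution rules:
      - field in both schemas: keep the written value;
      - field only in the writer's schema: ignore it;
      - field only in the reader's schema: use the reader's default,
        or signal an error if it has none.
    """
    writer_names = set(f["name"] for f in writer_fields)
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            resolved[name] = datum[name]          # written value wins
        elif "default" in field:
            resolved[name] = field["default"]     # reader-side default
        else:
            raise ValueError("reader field %r has no default and is "
                             "absent from the writer's schema" % name)
    return resolved


# Old writer schema without "score"; newer reader schema adds it with a default.
WRITER_V1 = [{"name": "domain", "type": "string"},
             {"name": "ip", "type": "string"},
             {"name": "port", "type": "int"}]
READER_V2 = WRITER_V1 + [{"name": "score", "type": "int", "default": 0}]

# Writer and reader both use a nullable "score" (the ["int", "null"] case).
NULLABLE = WRITER_V1 + [{"name": "score", "type": ["int", "null"], "default": 0}]

if __name__ == "__main__":
    old = {"domain": "d", "ip": "1.2.3.4", "port": 80}
    print(resolve_record(old, WRITER_V1, READER_V2))   # score filled with 0
    written_null = dict(old, score=None)
    print(resolve_record(written_null, NULLABLE, NULLABLE))  # score stays None
```

The last case shows why ["int", "null"] with "default": 0 comes back as None. The field exists in the writer's schema, so the null that was actually written is returned as-is, and the default is never consulted.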


On Fri, Jul 8, 2016 at 9:52, Sarvagya Pant <sarvagya.p...@gmail.com> wrote:

> Hi Stanislav,
>
> Thanks for the reply. What I want to achieve is this: the data arriving at
> the Avro writer may not contain all the fields specified in the example
> above. I would like to save the default value if possible, or retrieve the
> default value when using DataFileReader. Is this possible? Must the data
> always contain all the keys specified in the schema? I tried using
> ["int", "null"] with "default": 0. That let me save records in which a
> field was missing, but reading them back with DataFileReader gave me None
> instead of the default value 0. Any help will be much appreciated. Thanks.
>
> On Thu, Jul 7, 2016 at 10:39 PM, Stanislav Savulchik <s.savulc...@gmail.com> wrote:
>
>> Hi,
>>
>> I believe default values only work for readers, not writers.
>>
>> The spec (http://avro.apache.org/docs/current/spec.html) says:
>> > default: A default value for this field, used when reading instances
>> > that lack this field (optional).
>>
>> On 7 Jul 2016, at 21:16, Sarvagya Pant <sarvagya.p...@gmail.com> wrote:
>>
>> I am trying to use Avro to replace some code that writes data as CSV.
>> CSV cannot store the type of a field, so all data are treated as strings
>> when they are consumed. I copied the Avro example code from its website
>> and would like to set a default value when a field is missing.
>>
>> My Avro schema file (data.avsc) looks like this:
>>
>> {
>>     "type" : "record",
>>     "name" : "data",
>>     "namespace" : "my.example",
>>     "fields" : [
>>         {"name" : "domain", "type" : "string", "default" : "EMPTY"},
>>         {"name" : "ip", "type" : "string", "default" : "EMPTY"},
>>         {"name" : "port", "type" : "int", "default" : 0},
>>         {"name" : "score", "type" : "int", "default" : 0}
>>     ]
>> }
>>
>> I have written a simple Python script that I expected to work:
>>
>> import avro.schema
>> from avro.datafile import DataFileReader, DataFileWriter
>> from avro.io import DatumReader, DatumWriter
>>
>> schema = avro.schema.parse(open("data.avsc", "rb").read())
>>
>> # Avro container files are binary, so open in "wb" mode (matters on Windows)
>> writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
>> writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>> writer.append({"ip": "1.2.3.4", "port" : 80})
>> writer.append({"domain": "another domain", "score" : 100})
>> writer.close()
>>
>> reader = DataFileReader(open("users.avro", "rb"), DatumReader())
>> for data in reader:
>>     print data
>> reader.close()
>>
>> However, when I run this program, I get an error saying that the datum is
>> not an example of the schema:
>>
>> Traceback (most recent call last):
>>   File "D:\arko.py", line 8, in <module>
>>     writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
>>   File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
>>   File "build\bdist.win32\egg\avro\io.py", line 769, in write
>>
>> avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score':
>> 20, 'port': 8080} is not an example of the schema {
>>   "namespace": "my.example",
>>   "type": "record",
>>   "name": "userInfo",
>>   "fields": [
>>     {
>>       "default": "EMPTY",
>>       "type": "string",
>>       "name": "domain"
>>     },
>>     {
>>       "default": "EMPTY",
>>       "type": "string",
>>       "name": "ip"
>>     },
>>     {
>>       "default": 0,
>>       "type": "int",
>>       "name": "port"
>>     },
>>     {
>>       "default": 0,
>>       "type": "int",
>>       "name": "score"
>>     }
>>   ]
>> }
>> [Finished in 0.1s with exit code 1]
>>
>> I am using Avro 1.8.0 and Python 2.7. What am I doing wrong here? Thanks.
>>
>> --
>>
>> *Sarvagya Pant*
>> *Kathmandu, Nepal*
>>
>>
>>
>
>
> --
>
> *Sarvagya Pant*
> *Kathmandu, Nepal*
>
