[ 
https://issues.apache.org/jira/browse/AVRO-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008942#comment-17008942
 ] 

jason mathews edited comment on AVRO-2070 at 1/6/20 4:12 PM:
-------------------------------------------------------------

This issue should be categorized as a Bug, not an Improvement.

I'm running into this issue and currently have to maintain a custom 
GenericDatumWriter subclass to allow mixed numeric instance types; the proposed 
fix would eliminate the need for that.

Using a mix of numeric types in Java (Short, Integer, Long, Float) when the 
schema type is Double results in a ClassCastException.

Java Example: 
{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;

// record schema with a single double field "d"
Schema doubleType = Schema.create(Schema.Type.DOUBLE);
Schema.Field field = new Schema.Field("d", doubleType);
List<Schema.Field> fields = Collections.singletonList(field);
Schema schema = Schema.createRecord("test", "doc", "", false, fields);

// serialize
GenericDatumWriter<GenericData.Record> datumWriter = new GenericDatumWriter<>(schema);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (DataFileWriter<GenericData.Record> dataWriter = new DataFileWriter<>(datumWriter)) {
  dataWriter.create(schema, bos);
  GenericData.Record r = new GenericData.Record(schema);
  r.put("d", 123.456); // autoboxed to Double
  dataWriter.append(r);

  r = new GenericData.Record(schema);
  r.put("d", 123); // try as Integer
  dataWriter.append(r); // throws exception
}
{code}
Output: 
{noformat}
Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
{noformat}
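To illustrate why the second append fails, here is a minimal sketch that uses only the JDK, not Avro itself. The class and method names (BoxedCastDemo, viaCast, viaNumber) are hypothetical. viaCast assumes the writer performs the direct (Double) cast quoted in the issue description for the DOUBLE case, and viaNumber mirrors the Number-based conversion quoted there for the INT case.

```java
public class BoxedCastDemo {
    // Mirrors the direct cast from the issue description: (Double) datum.
    // A boxed Integer is not a Double, so this cast fails at runtime.
    static double viaCast(Object datum) {
        return (Double) datum;
    }

    // Mirrors the INT branch from the issue description, which goes through
    // Number and therefore accepts any boxed numeric type.
    static double viaNumber(Object datum) {
        return ((Number) datum).doubleValue();
    }

    public static void main(String[] args) {
        Object datum = 123; // autoboxed to java.lang.Integer

        System.out.println(viaNumber(datum)); // prints 123.0

        try {
            viaCast(datum);
        } catch (ClassCastException e) {
            // Same exception type as in the stack trace above.
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Until the writer is changed, the same conversion can presumably be applied at the call site, by converting the value with doubleValue() before putting it into the record.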
The Python fastavro implementation, by contrast, has no such restriction: a 
field with the double schema type can hold a mix of floating-point and integer 
values.

Python Example: 
{code:python}
from fastavro import json_writer, json_reader, parse_schema

schema = {
    "namespace": "",
    "type": "record",
    "name": "record",
    "fields": [
        { "name": "d", "type": "double" }
    ]
}
parsed_schema = parse_schema(schema)

# one float and one int for the same double field
records = [
    { u'd': 1.2345 },
    { u'd': 12345 }
]

with open('test.avro', 'w') as out:
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 1.2345 } )
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 12345 } )
    json_writer(out, parsed_schema, records)

with open('test.avro', 'r') as fo:
    avro_reader = json_reader(fo, schema)
    for record in avro_reader:
        print(record)
"""
output:
{'d': 1.2345}
{'d': 12345}
"""
{code}
 


> Tolerate any Number when writing primitive values in Java in 
> GenericDatumWriter
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-2070
>                 URL: https://issues.apache.org/jira/browse/AVRO-2070
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Daniil Gitelson
>            Priority: Major
>
> Tolerating any Number (instead of requiring a concrete Long, Double, or Float) 
> makes it possible to use mutable Number implementations for performance 
> reasons (especially when iterating over primitive collections).
> Currently, this works only for int:
> {code:java}
>       // Here it works
>       case INT:     out.writeInt(((Number)datum).intValue()); break;
>       // This should be replaced with ((Number)datum).longValue() etc
>       case LONG:    out.writeLong((Long)datum);       break;
>       case FLOAT:   out.writeFloat((Float)datum);     break;
>       case DOUBLE:  out.writeDouble((Double)datum);   break;
> {code}
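
As a rough sketch of the change proposed in the description, the conversion could go through Number for every numeric type. This example uses only the JDK: NumericType is a hypothetical stand-in for the numeric members of Avro's Schema.Type, and coerce is an illustrative helper, not part of GenericDatumWriter.

```java
public class NumberCoercionSketch {
    // Hypothetical stand-in for the numeric members of Avro's Schema.Type.
    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // Returns the value that would be handed to the encoder for the given type.
    // Any Number is accepted; non-numeric data still fails with ClassCastException.
    static Number coerce(NumericType type, Object datum) {
        Number n = (Number) datum;
        switch (type) {
            case INT:    return n.intValue();
            case LONG:   return n.longValue();
            case FLOAT:  return n.floatValue();
            case DOUBLE: return n.doubleValue();
            default:     throw new IllegalArgumentException("not numeric: " + type);
        }
    }

    public static void main(String[] args) {
        // An Integer written against a double schema no longer fails.
        System.out.println(coerce(NumericType.DOUBLE, 123));      // prints 123.0
        // A Short written against a long schema is widened.
        System.out.println(coerce(NumericType.LONG, (short) 7));  // prints 7
    }
}
```

One trade-off to keep in mind: the Number accessors also narrow silently, so a Double written against an int schema would be truncated rather than rejected. Whether that is acceptable is a design decision for the fix.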



