I have looked at the Avro 1.6.0 code and am not sure how Avro distinguishes
between field boundaries when reading null values.
The BinaryEncoder class (which is where I land when debugging my code) has an
empty method for writeNull: how does the parser then distinguish between
adjacent
I don't have a specific use case that is problematic, but was trying to
understand how it all works internally. Following your comment about indexes I
looked in GenericDatumWriter and sure enough the union is tagged so we know
which part of the union was written:
case UNION:
int index
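
As a rough illustration of the point above (not the actual Avro source), here is a
minimal self-contained sketch of how a union-typed field could be laid out in
binary: a zig-zag varint branch index first, then the value, with null itself
taking zero bytes. The class and method names (UnionNullSketch, writeNullableLong,
encode) are hypothetical, and the schema is assumed to be ["null", "long"] for
every field.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; it does not use or reproduce the Avro library classes.
class UnionNullSketch {

    // Zig-zag plus variable-length encoding, as the Avro spec describes for long.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
    }

    // A null value contributes no bytes at all, hence the empty method body.
    static void writeNull(ByteArrayOutputStream out) {
    }

    // Assumed schema ["null", "long"]: branch 0 is null, branch 1 is long.
    static void writeNullableLong(ByteArrayOutputStream out, Long value) {
        if (value == null) {
            writeLong(out, 0);   // branch index alone marks the field boundary
            writeNull(out);
        } else {
            writeLong(out, 1);
            writeLong(out, value);
        }
    }

    // Encodes a sequence of adjacent nullable-long fields.
    static byte[] encode(Long... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Long f : fields) {
            writeNullableLong(out, f);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (byte b : encode(null, null, 3L)) {
            sb.append(String.format("%02x ", b));
        }
        System.out.println(sb.toString().trim()); // prints "00 00 02 06"
    }
}
```

In this sketch two adjacent nulls still produce two bytes (one branch index each),
which is what lets a reader tell the fields apart even though writeNull writes
nothing.
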
we are working on a very sparse table with say 500 columns where we do
batch uploads that typically only contain a subset of the columns (say
100), and we run multiple map-reduce queries on subsets of the columns
(typically less than 50 columns go into a single map-reduce job).
my question is the
On 01/23/2012 02:18 PM, Koert Kuipers wrote:
> is this considered abuse of avro's versioning capabilities?
Not at all. Using a subset of the fields in Avro is called projection
and can provide significant performance improvements.
Doug
https://issues.apache.org/jira/browse/AVRO-981
I took Joe Crobak's advice and removed snappy as a dependency in the Python
client for Avro. With the patch in AVRO-981 applied, Avro installs, builds
and functions on Mac OS X.
--
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com