> I wrote a little C tool using Avro-C to convert JSON to Avro and thought
> may be someone here may find it useful.
>
> https://github.com/grisha/json2avro
>
> The purpose is to be useful in converting messy legacy JSON in which
> some elements might be missing or of wrong type. Even though there is no
> schema resolution per se here, json2avro will attempt to use the default
> specified in the schema if the corresponding JSON element is missing and
> will attempt to try the types specified in a union until one succeeds.
>
> json2avro lets you pick from null, snappy, deflate and lzma codecs,
> specify a custom block size and optionally skips over JSON lines that it
> is unable to parse. I'm also thinking of adding a target max file size
> so that it would automatically split output into multiple sizes.

Very cool! Kind of the reverse of avrocat or avropipe. We could clean it up
and add it as another C command-line tool if you like.

> It uses Jansson as the JSON parser which is conveniently bundled with
> Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
> nulls, not sure if this is a Jansson-specific limitation or something
> inherent to JSON.)

Can you elaborate on this? Jansson should support null JSON values (it's the
keyword null, not the string value "null"). And the Avro C bindings should use
that for Avro null values.

cheers
–doug
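
To make the null-versus-"null" distinction concrete, here is a minimal sketch using Python's standard json module rather than Jansson itself. It only illustrates how JSON treats the two forms; it says nothing about how json2avro or the Avro C bindings behave. In Jansson, the corresponding check on a parsed value would be json_is_null().

```python
import json

# A JSON document with a null keyword and, separately, the string "null".
doc = json.loads('{"a": null, "b": "null"}')

# The bare keyword null decodes to Python's None ...
a_is_null = doc["a"] is None
# ... while the quoted form is an ordinary four-character string.
b_is_null = doc["b"] is None

# Serializing back keeps the two forms distinct.
round_trip = json.dumps(doc, sort_keys=True)

print(a_is_null, b_is_null, round_trip)
```

If the legacy input spells missing values as the quoted string "null", a parser will correctly report a string, which could look as though nulls are not being handled.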