I wrote a little C tool using Avro-C to convert JSON to Avro and thought someone here might find it useful.

https://github.com/grisha/json2avro

The purpose is to help convert messy legacy JSON in which some elements might be missing or of the wrong type. Even though there is no schema resolution per se here, json2avro will use the default specified in the schema if the corresponding JSON element is missing, and will try the types specified in a union until one succeeds.
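
To illustrate the fallback logic described above, here is a simplified sketch. It is not the actual json2avro code: it uses neither Avro-C nor Jansson, every type and function name in it (json_kind, avro_kind, field_schema, resolve_field) is made up for illustration, and trying the union branches in schema order is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for JSON values and Avro types. */
typedef enum { J_MISSING, J_NULL, J_INT, J_STRING } json_kind;
typedef enum { T_NULL, T_LONG, T_STRING } avro_kind;

/* Hypothetical description of one record field whose type is a union. */
typedef struct {
    const avro_kind *branches; /* union branches, in schema order */
    size_t n_branches;
    bool has_default;          /* does the schema give a default? */
} field_schema;

enum { RESOLVE_FAILED = -1, RESOLVE_DEFAULT = -2 };

/* Does a JSON value of kind j fit the Avro type t? */
static bool fits(json_kind j, avro_kind t)
{
    switch (t) {
    case T_NULL:   return j == J_NULL;
    case T_LONG:   return j == J_INT;
    case T_STRING: return j == J_STRING;
    }
    return false;
}

/*
 * Returns the index of the first union branch that accepts the JSON
 * value, RESOLVE_DEFAULT if the element is missing and the schema has
 * a default, or RESOLVE_FAILED otherwise.
 */
int resolve_field(const field_schema *f, json_kind j)
{
    assert(f != NULL);

    /* Missing element: fall back to the schema default, if any. */
    if (j == J_MISSING)
        return f->has_default ? RESOLVE_DEFAULT : RESOLVE_FAILED;

    /* Present element: try each branch until one succeeds. */
    for (size_t i = 0; i < f->n_branches; i++) {
        if (fits(j, f->branches[i]))
            return (int)i;
    }
    return RESOLVE_FAILED;
}
```

With a union of null, long and string, a JSON integer would select branch 1 in this sketch, while a missing element would select the default when the schema provides one.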

json2avro lets you pick from the null, snappy, deflate and lzma codecs, specify a custom block size, and optionally skip over JSON lines that it is unable to parse. I'm also thinking of adding a target max file size so that it would automatically split output into multiple files.

It uses Jansson as the JSON parser, which is conveniently bundled with Avro-C. (One thing I'm not clear on: Jansson cannot handle nulls, and I'm not sure whether this is a Jansson-specific limitation or something inherent to JSON.)

This is rather simple code (no tests, and not even a "make install" yet) and lacks support for some features, namely enums and aliases, but it's good enough to be useful. It does seem pretty fast, slightly faster than the avro-tools fromjson option (though my tests were hardly scientific).

Enjoy, and any feedback is very much appreciated!

Grisha
