On Mon, Sep 17, 2012 at 9:40 AM, Markus Strickler <mar...@braindump.ms> wrote:
> I'm currently trying to convert already existing JSON (not generated by avro) 
> to avro and am wondering if there is some generic way to do this (maybe an 
> avro schema that matches arbitrary JSON)?

Yes, there is support for reading and writing arbitrary Json data as Avro:
  http://avro.apache.org/docs/current/api/java/org/apache/avro/data/Json.html

Json.Writer will take Json data that's been parsed into Jackson's
JsonNode representation and write it as Avro data using the schema
Json.SCHEMA, and Json.Reader will read Avro data written with this
schema into a JsonNode.  Note that data written with Json.Writer does
not have to be read with Json.Reader: it is ordinary Avro data, so you
could instead read it with GenericDatumReader, or from MapReduce or
Hive.

However, using a more-specific schema than Json.SCHEMA will result in a
smaller and faster Avro encoding for your data.  It's also likely to
result in a schema that much better describes your data for use in
Pig, Hive, etc.

If all of your records are of the same schema, and that schema doesn't
have unions (i.e., a given field always has values of the same type,
all objects have the same set of fields, fully populated) then you may
be able to use Avro's JsonDecoder.  Note however that Avro's
JsonEncoder/JsonDecoder are not generally appropriate for arbitrary
Json, but rather are intended to represent Avro data as Json.  (Unions
are the biggest difference.  Avro's Json encoding uses a Json object
to tag each union value with the intended type.  For example, an Avro
union of a string and an int which has an int value of 1 would be
encoded in Json as {"int":1}.)
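
To make the union difference concrete, here is a minimal sketch in plain
Java.  It does not call Avro's JsonEncoder; the class and method names
(UnionTagSketch, encodeUnionValue, encodePlain) are made up for
illustration, and it only handles null, int and string branches without
any string escaping.

```java
// Illustration only: mimics how Avro's Json encoding tags union values.
// This is NOT Avro's JsonEncoder; all names here are hypothetical.
final class UnionTagSketch {

    // Encodes a value the way Avro's Json encoding writes a union branch.
    static String encodeUnionValue(Object value) {
        if (value == null) {
            return "null"; // a null branch is written as a bare Json null
        }
        if (value instanceof Integer) {
            return "{\"int\":" + value + "}"; // tagged with the branch type name
        }
        if (value instanceof String) {
            // No escaping of quotes or backslashes; enough for a sketch.
            return "{\"string\":\"" + value + "\"}";
        }
        throw new IllegalArgumentException("unsupported branch: " + value.getClass());
    }

    // Encodes the same value the way ordinary, non-Avro Json would.
    static String encodePlain(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Integer) {
            return value.toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        throw new IllegalArgumentException("unsupported value: " + value.getClass());
    }
}
```

Here encodeUnionValue(1) gives {"int":1} while encodePlain(1) gives 1,
which is why Json produced by other systems generally won't parse with
JsonDecoder once the schema contains a union.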

For a given schema it is simple to write a short Java program that
converts from Json to Avro.  A general tool for such conversions
doesn't yet exist but would make a great addition to Avro (if anyone's
looking for a way to contribute).  The core of this might be a method
that walks a JsonNode and a Schema in parallel, returning an object in
Avro's generic representation.
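
A rough sketch of what such a parallel walk might look like, in plain
Java so that it runs without Avro or Jackson on the classpath.  Everything
here is a stand-in: MiniSchema and Kind are hypothetical substitutes for
Avro's Schema, and ordinary Map/List/String/Integer values substitute for
JsonNode on the way in and for Avro's generic representation on the way
out.  Resolving a union by taking the first branch that matches is an
assumption of this sketch, not something Avro prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical stand-ins for Avro's Schema and Jackson's JsonNode.
final class ParallelWalkSketch {
    enum Kind { RECORD, ARRAY, STRING, INT, NULL, UNION }

    static final class MiniSchema {
        final Kind kind;
        final Map<String, MiniSchema> fields = new LinkedHashMap<>(); // RECORD only
        final List<MiniSchema> parts = new ArrayList<>(); // ARRAY: item schema; UNION: branches

        MiniSchema(Kind kind) { this.kind = kind; }

        static MiniSchema of(Kind kind, MiniSchema... parts) {
            MiniSchema s = new MiniSchema(kind);
            for (MiniSchema p : parts) {
                s.parts.add(p);
            }
            return s;
        }

        MiniSchema field(String name, MiniSchema schema) {
            fields.put(name, schema);
            return this;
        }
    }

    // Walks the parsed Json value and the schema together, failing on a mismatch.
    static Object convert(Object json, MiniSchema schema) {
        switch (schema.kind) {
            case NULL:
                if (json == null) return null;
                break;
            case STRING:
                if (json instanceof String) return json;
                break;
            case INT:
                if (json instanceof Integer) return json;
                break;
            case ARRAY:
                if (json instanceof List) {
                    List<Object> out = new ArrayList<>();
                    for (Object item : (List<?>) json) {
                        out.add(convert(item, schema.parts.get(0)));
                    }
                    return out;
                }
                break;
            case RECORD:
                if (json instanceof Map) {
                    Map<?, ?> in = (Map<?, ?>) json;
                    Map<String, Object> out = new LinkedHashMap<>();
                    // The schema drives the walk; an absent field is seen as null.
                    for (Map.Entry<String, MiniSchema> f : schema.fields.entrySet()) {
                        out.put(f.getKey(), convert(in.get(f.getKey()), f.getValue()));
                    }
                    return out;
                }
                break;
            case UNION:
                // Assumption: the first branch that accepts the value wins.
                for (MiniSchema branch : schema.parts) {
                    try {
                        return convert(json, branch);
                    } catch (IllegalArgumentException e) {
                        // This branch did not match; try the next one.
                    }
                }
                break;
        }
        throw new IllegalArgumentException("Json value " + json + " does not match " + schema.kind);
    }
}
```

Because the schema drives the walk, untagged Json such as {"age": 1} can
be matched against a union of null and int without the {"int":1} wrapper,
and a missing field falls into the null branch.  A real version would
presumably produce GenericData.Record and GenericData.Array instances
rather than maps and lists.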

Doug
