Hello all,

My employer, m6d.com, has given the thumbs up to open source our latest Hive tool, hive-protobuf. We created it because we often work with protobuf formats and wanted to be able to log and query these types directly, without writing one-off User Defined Functions or Input Formats.
https://github.com/edwardcapriolo/hive-protobuf

Hive-protobuf is much like the new Avro support and the existing Thrift support. Here is how it works: if you have a sequence file with a serialized protobuf in the key and a serialized protobuf in the value, you can create a table that describes the data to Hive. The table only needs to be configured with the protobuf-generated class names for the key and the value, and it turns the nested classes into nested structs.

We eventually will migrate the project into core Hive, but we want to let it incubate on GitHub for a time. (For example, there is no support for union types at the moment, and there may be other kinks to work out or tuning to do.)

Please check out the project and send pull requests if you have patches.

Thank you,
Edward
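
To illustrate the "nested classes into nested structs" mapping described above, here is a rough Python sketch. It is not the hive-protobuf code (the project itself runs inside Hive, on the JVM). The Person and Address classes are hypothetical stand-ins for protobuf-generated classes, and the primitive type mapping and the list-to-array rule are assumptions made only for this example.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_args, get_origin


# Hypothetical stand-ins for protobuf-generated message classes.
@dataclass
class Address:
    city: str
    zip_code: int


@dataclass
class Person:
    name: str
    age: int
    address: Address      # nested message
    emails: List[str]     # repeated field


# Assumed mapping from Python field types to Hive type names.
_PRIMITIVES = {
    str: "string",
    int: "bigint",
    float: "double",
    bool: "boolean",
    bytes: "binary",
}


def hive_type(tp) -> str:
    """Return a Hive type string for a field type, recursing into nested classes."""
    if tp in _PRIMITIVES:
        return _PRIMITIVES[tp]
    if get_origin(tp) is list:
        (elem,) = get_args(tp)
        return f"array<{hive_type(elem)}>"
    if is_dataclass(tp):
        # A nested class becomes a nested struct, one member per field.
        cols = ",".join(f"{f.name}:{hive_type(f.type)}" for f in fields(tp))
        return f"struct<{cols}>"
    raise TypeError(f"no Hive type assumed for {tp!r}")


if __name__ == "__main__":
    print(hive_type(Person))
```

Calling hive_type(Person) walks the class once and produces a single struct type with the Address fields nested inside it, which is the shape of column type a table over such data would expose.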
