All,


I am having trouble serializing to JSON from Pig scripts (i.e. in a storage
function). Here is what I've tried, and where each option fell short:

1. Pig 0.10+ PigStorage: maps are assumed to be String-to-String, so
   heavily nested structures are not handled.

2. Hortonworks toJson UDF: maps are not supported.

3. Twitter ElephantBird LzoJsonStorage: arrays/bags are not handled.

I'm wondering whether anyone is storing the output of Pig scripts as JSON,
and whether that output includes maps.

If so, how are you writing out the JSON, and what issues have you run into?

If not, what structured format are you using, and why? Avro? Thrift?

Historically, our Pig jobs have produced mostly tabular results, so this
has not been an issue. The input data is JSON, and we've used ElephantBird
(from Twitter) to load it as a map.

Given the above, our only option seems to be Pig's JsonLoader with an
explicitly specified schema, but that would pin us to a single schema, and
our data is not consistent (the schemas evolve). Until now we have handled
that variation inside the script, which is no longer possible once a single
schema is defined for the loaded data. So I'm honestly reconsidering our use
of JSON (which is a historical conversation in itself).

Cheers,

Simon



-- 
Simon Reavely
simon.reav...@gmail.com
