
Honestly, I don't think JsonStorage in Pig will be improved in the future.
More companies are adopting the Parquet and ORC file formats, so I expect more
activity on that front. You're probably better off switching to one of those
newer file formats in the long term. Just a thought.


On Tue, Feb 18, 2014 at 9:02 PM, Simon Reavely <simon.reav...@gmail.com> wrote:

> All,
> I am having trouble serializing to Json from Pig scripts (Storage). Here is
> what I've tried and failed with:
> 1. Pig 0.10+ PigStorage. Maps are assumed to be String to String,
> so heavily nested structures are not handled.
> 2. Hortonworks toJson UDF. Maps are not supported.
> 3. Twitter ElephantBird LzoJsonStorage. Arrays/Bags are not
> handled.
> I wondered if anyone is using something to store output from pig scripts as
> Json and whether they use maps.
> If so, how are you writing out Json and what issues have you seen?
> If not, what structured format are you using and why? Avro? Thrift?
> Historically, all our Pig jobs have produced more tabular results, so
> it's not been an issue. The input data is in Json and we've used
> ElephantBird (from Twitter) to load it as a map.
> Given the above experience, our only option is to use Pig's JsonLoader to
> load the Json using a specified schema but this will pin us into a single
> schema and the data is not consistent (schemas evolve). Previously we could
> deal with this inside the script but not if we define a single schema for
> the loaded data. So I'm honestly reconsidering our use of Json (which is a
> historical conversation in itself).
> Cheers,
> Simon
> --
> Simon Reavely
> simon.reav...@gmail.com
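
To make the schema-pinning concern in the quoted message concrete, here is a
minimal sketch in plain Python (not Pig). It only illustrates the general
trade-off between projecting records onto one fixed schema and keeping each
record as a free-form map; it makes no claim about how Pig's JsonLoader or
ElephantBird treat unlisted fields. The sample records, field names, and both
helper functions are hypothetical.

```python
import json

# Hypothetical sample records: the second one carries a field the first
# lacks, standing in for a schema that evolved over time.
RAW_LINES = [
    '{"user": "a", "tags": ["x", "y"]}',
    '{"user": "b", "tags": ["z"], "geo": {"country": "US"}}',
]

# A single fixed schema, analogous to one declared up front at load time.
FIXED_SCHEMA = ("user", "tags")


def load_with_schema(line, schema=FIXED_SCHEMA):
    """Keep only the fields named in the schema; any other field is dropped."""
    record = json.loads(line)
    return {field: record.get(field) for field in schema}


def load_as_map(line):
    """Keep the whole record as a free-form map, whatever fields it has."""
    return json.loads(line)


if __name__ == "__main__":
    for line in RAW_LINES:
        print("schema:", load_with_schema(line))
        print("map:   ", load_as_map(line))
```

With the fixed schema, the nested "geo" field of the second record never
reaches the script; with the map, it is still available and can be handled
per record inside the script.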
