Or is it because I'm using Pig 0.6, where the gz format is not supported? I'll be running this on AWS EMR, which only supports Pig 0.6. Do I have to use a later version of Pig?
On Wed, May 18, 2011 at 11:12 AM, Dexin Wang <wangde...@gmail.com> wrote:
> Hi,
>
> Is anyone using Twitter's elephantbird library? I was using its JsonLoader and
> got this error:
>
> WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
> Unexpected character () at position 0.
>     at org.json.simple.parser.Yylex.yylex(Unknown Source)
>     at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>
> But if I manually gunzip the file to a plain-text JSON file, JsonLoader
> works fine.
>
> Again, this fails:
>
> raw_json = LOAD 'cc.json.gz' USING com.twitter.elephantbird.pig.load.JsonLoader();
>
> and this works:
>
> $ gunzip cc.json.gz
> raw_json = LOAD 'cc.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
>
> Any suggestions for this? Or is there any other JSON loader library out
> there? I can write my own, but would rather use one if it already exists.
>
> Thanks,
>
> Dexin
>
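The "position 0" failure in the quoted message is consistent with the loader passing still-compressed bytes straight to the JSON parser: every gzip stream starts with the magic bytes 0x1f 0x8b, which no JSON parser accepts as a first character. That is an assumption, not something confirmed for elephantbird or Pig 0.6. The minimal Python sketch below (Python rather than Pig, purely to illustrate the failure mode; `looks_gzipped`, `decode_json_line`, and the sample record are hypothetical, not part of elephantbird) reproduces the error on raw gzip bytes and shows the gunzip-first workaround succeeding:

```python
import gzip
import json

# First two bytes of every gzip member (RFC 1952 ID1/ID2).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def decode_json_line(data: bytes):
    """Hypothetical helper: decode one JSON record, gunzipping first if needed."""
    if looks_gzipped(data):
        data = gzip.decompress(data)
    return json.loads(data.decode("utf-8"))


# Hypothetical record standing in for one line of cc.json.
record = {"id": 1, "name": "example"}
compressed = gzip.compress(json.dumps(record).encode("utf-8"))

# Parsing the compressed bytes directly fails at position 0, mirroring the
# "Unexpected character () at position 0" warning from the loader.
try:
    json.loads(compressed.decode("latin-1"))
except json.JSONDecodeError as err:
    print("raw gzip bytes:", err.msg, "at position", err.pos)

# Decompressing first (the manual gunzip step) lets the record parse.
print(decode_json_line(compressed))
```

Checking the magic bytes rather than the `.gz` file extension is a design choice in this sketch: it keys off the content itself, so a misnamed file is still handled correctly.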