If you are trying to read gzip files on EMR, you CAN'T use local mode.  Once
you switch to normal (mapreduce) mode, everything will start to work.  On EMR,
Pig 0.6 (their stock version) will not read gzip or bzip files in local mode.
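
If you have to stay in local mode for testing, the manual gunzip step from
the original message quoted below can be scripted. This is only a rough
sketch, assuming a POSIX-style shell with head, od and gunzip available; the
function names is_gzip and prepare_for_local are made up for illustration and
are not part of Pig or elephantbird.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Pig or elephantbird.

# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# prepare_for_local FILE: print the path a local-mode LOAD should point at.
# Gzipped input is decompressed to a new file; the original is left in place.
prepare_for_local() {
    if is_gzip "$1"; then
        target="${1%.gz}"
        # If the name did not end in .gz, avoid overwriting the input.
        [ "$target" = "$1" ] && target="$1.plain"
        gunzip -c "$1" > "$target"
        echo "$target"
    else
        echo "$1"
    fi
}
```

Calling prepare_for_local cc.json.gz would print cc.json, which is the path
to use in the LOAD statement when running with pig -x local.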

-e

On Thu, May 19, 2011 at 00:32, Dexin Wang <wangde...@gmail.com> wrote:

> Turns out it's only a problem if I run it in local mode; running it on the
> cluster doesn't have this problem. I'm using EB 1.2.5.
>
> I wonder how you fixed the problem, since it seems it's not an EB problem.
> Or are you gunzipping it in the EB load function?
>
> On Wed, May 18, 2011 at 8:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> wrote:
>
> > Which version of EB are you using? I recently fixed this for someone;
> > I believe the fix has been in every version since 1.2.3.
> >
> > D
> >
> > On Wed, May 18, 2011 at 11:26 AM, Dexin Wang <wangde...@gmail.com> wrote:
> > > Or is it because I'm using Pig 0.6, where gz format is not supported?
> > > I'll run this on AWS EMR, where only Pig 0.6 is supported. Do I have to
> > > use a later version of Pig?
> > >
> > > On Wed, May 18, 2011 at 11:12 AM, Dexin Wang <wangde...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Anyone using Twitter's elephantbird library? I was using its JsonLoader
> > >> and got this error:
> > >>
> > >> WARN  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
> > >> Unexpected character () at position 0.
> > >> at org.json.simple.parser.Yylex.yylex(Unknown Source)
> > >> at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
> > >> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> > >> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> > >>
> > >> But if I manually gunzip the file to a clear text json file, JsonLoader
> > >> works fine.
> > >>
> > >> Again this fails:
> > >>
> > >> raw_json = LOAD 'cc.json.gz' USING
> > >> com.twitter.elephantbird.pig.load.JsonLoader();
> > >>
> > >> this works:
> > >>
> > >> $ gunzip cc.json.gz
> > >> raw_json = LOAD 'cc.json' USING
> > >> com.twitter.elephantbird.pig.load.JsonLoader();
> > >>
> > >> Any suggestions for this? Or is there any other json loader library out
> > >> there? I can write my own but would rather use one if already exists.
> > >>
> > >> Thanks,
> > >>
> > >> Dexin
> > >>
> > >
> >
>

Eric Lubow e: eric.lu...@gmail.com w: eric.lubow.org
