> > >> Again this fails:
> > >>
> > >> raw_json = LOAD 'cc.json.gz' USING
> > >> com.twitter.elephantbird.pig.load.JsonLoader();
> > >>
> > >> this works:
> > >>
> > >> $ gunzip cc.json.gz
> > >> raw_json = LOAD 'cc.json' USING
> > >> com.twitter.elephantbird.pig.load.JsonLoader();
> > >>
> > >> Any suggestions for this? Or is there any other json loader library
> out
> > >> there? I can write my own but would rather use one if already exists.
> > >>
> > >> Thanks,
> > >>
> > >> Dexin
> > >>
> > >
> >
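[Editor's note: a minimal Python sketch of the gunzip workaround described above, under the assumption that the loader only handles plain (uncompressed) text. The file name cc.json.gz is created here as a stand-in so the sketch is self-contained; it is not the poster's actual data.]

```python
import gzip, json

# Create a sample gzipped JSON file standing in for cc.json.gz.
with gzip.open("cc.json.gz", "wt") as f:
    f.write('{"id": 1}\n')

# The gunzip-equivalent read: decompress first, then parse,
# mirroring "$ gunzip cc.json.gz" before handing the file to JsonLoader.
with gzip.open("cc.json.gz", "rt") as f:
    record = json.loads(f.read())

print(record)  # {'id': 1}
```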
>
Eric Lubow e: eric.lu...@gmail.com w: eric.lubow.org
> tuples in pig. Is there a better/more
> efficient way to do this?
> I would like to avoid having loading logic in both the udf and the pig
> script, and generate all "final" tuples in the udf, and then just use a
> split in pig.
> Thanks,
> Marko
>
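[Editor's note: Pig's SPLIT routes each tuple to every branch whose condition it matches. A toy Python analogue of the "generate final tuples in the UDF, then SPLIT in Pig" flow; the data and predicates are made up for illustration.]

```python
# Stand-ins for the "final" tuples a UDF would emit.
records = [("a", 1), ("b", 5), ("c", 3)]

# SPLIT-like routing: each record lands in every branch whose
# condition it satisfies (branches may overlap in general).
small = [r for r in records if r[1] < 4]
large = [r for r in records if r[1] >= 4]

print(small)  # [('a', 1), ('c', 3)]
print(large)  # [('b', 5)]
```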
> expression I want to get 'x' number of
> urls matching the regex pattern. I have written a UDF to filter out
> urls based on regular expression. Is there a way in Pig script to
> limit the number of results to 'x' ? ( 'x' is some configurable value)
>
> T
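[Editor's note: in Pig this is typically a FILTER through the regex UDF followed by LIMIT x. A Python sketch of the same idea with toy URLs and a made-up pattern:]

```python
import re

# Toy data; the pattern and x are configurable stand-ins.
urls = ["http://a.com/x", "http://b.org/y", "http://c.com/z", "http://d.com/w"]
pattern = re.compile(r"\.com/")
x = 2

# FILTER by the regex, then LIMIT to x results.
matched = [u for u in urls if pattern.search(u)][:x]
print(matched)  # ['http://a.com/x', 'http://c.com/z']
```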
>>> (1) We are mature enough and produce good-quality releases.
>>> (2) Our interfaces no longer change in major ways.
>>> (3) We have a growing user community, and we want newcomers to
>>> know that our releases are stable.
>>> (4) If the next release is 0.10 and we decide that we should switch
>>> on the following release, going from 0.10 to 1.0 will generate a lot
>>> of confusion.
>>>
>>> I wanted to start this conversation and see what others think before
>>> deciding if it is worthwhile to call a vote.
>>>
>>> Olga
>>>
>>>
>
4)
> (98390,572)
> (98391,567)
>
> Looks great. I'm going to blame it on your version? I'm using pig-0.8
> and hadoop 0.20.2.
>
> --jacob
> @thedatachef
>
>
> On Tue, 2011-02-22 at 08:21 -0500, Eric Lubow wrote:
> > I apologize for the double mailing:
> >
I apologize for the double mailing:
grunt> Y = LOAD 'hdfs:///mnt/test.log.gz' AS (line:chararray);
grunt> foo = LIMIT Y 5;
grunt> dump foo
<0\Mtest.log?]?o?H??}?)
It didn't work out of HDFS.
-e
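[Editor's note: the garbage dumped above is the raw gzip byte stream being read as text. A self-contained Python sketch reproducing the symptom with a throwaway file; the name test.log.gz is a stand-in.]

```python
import gzip, os, tempfile

# Write a small gzipped log file.
path = os.path.join(tempfile.mkdtemp(), "test.log.gz")
with gzip.open(path, "wt") as f:
    f.write("line1\n")

# Reading the file as raw bytes yields the compressed stream:
# it starts with the gzip magic number and looks like garbage as text.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:2])  # b'\x1f\x8b' -- gzip magic

# Decompressing first recovers the original text.
with gzip.open(path, "rt") as f:
    text = f.read()
print(text, end="")
```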
On Tue, Feb 22, 2011 at 08:18, Eric Lubow wrote:
> I'm n
tFormat();
>> } else {
>>     return new PigTextInputFormat();
>> }
>> }
>>
>> And in my custom loader was :
>>
>> public InputFormat getInputFormat() {
>>     return new TextInputFormat();
>> }
>>
>>
>> I just co
not compressed.
Since the logs are compressed, my hands are tied. Any suggestions to get
me moving in the right direction? Thanks.
-e
quot;:"(.*[^"])","logged_at":"(.*[^"])"}'))
AS
(exchange_id:chararray,exchange_user_id:chararray,bid_id:chararray,bid_amount:float,win_amount:float,ad_ids:chararray,wv:int,logged_at:chararray);
WIDGET_VERSION_ONLY = FOREACH LOGS_BASE GENERATE wv;
WIDGET_VERSION_COUNT = FOREACH (GROUP WIDGET_VERSION_ONLY BY $0)
    GENERATE $0, COUNT($1) AS num;
WIDGET_VERSION_SORTED_COUNT = LIMIT (ORDER WIDGET_VERSION_COUNT BY num DESC) 5;
Any help that would push me in the right direction would be greatly
appreciated.
-e
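[Editor's note: the script above groups by widget version, counts, orders descending, and keeps the top 5. A toy Python analogue of that pipeline; the wv values are made up.]

```python
from collections import Counter

# Stand-ins for the wv field pulled out of each log record.
wv_values = [2, 3, 3, 1, 3, 2, 4, 5, 6, 3]

# GROUP BY wv + COUNT, then ORDER BY count DESC, then LIMIT 5.
counts = Counter(wv_values)
top5 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)
```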