Hey all,
I have several relations that all have the same keys. Something like:
A = id,countA
B = id,countB
C = id,countC
...
I would like to do a multi-way full outer join such that any tuple that is
null or non-existent is still joined. The approach right now is to do it
stepwise, as the Pig
Hey Jon,
I think a common approach is to use Pig (and MR/Hadoop in general) as purely
the heavy lifter, doing all the merge-downs, aggregations and such of the
data. At Nokia we tend to output a lot of data from Pig/MR as TSV or CSV
(using PigStorage) and then use Sqoop to push that into a MySQL D
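The Pig side of that hand-off is just a STORE through PigStorage; a minimal sketch (relation name and output path are hypothetical):

```pig
-- Write the final relation as TSV for Sqoop to export; tab is also
-- PigStorage's default delimiter.
STORE results INTO '/user/hadoop/out/daily_counts' USING PigStorage('\t');
```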
Hey Dan
This usually means that you have mismatched Hadoop jar versions somewhere. I
encountered a similar problem with Oozie trying to talk to HDFS. Maybe try
posting to the Hadoop user list as well. In general, you should just need
the same hadoop-core.jar as on your cluster when you run Pig. Fr
On 13 March 2011 10:38, Dan Brickley wrote:
> On 13 March 2011 10:35, Josh Devins wrote:
> > Hey guys,
> >
> > I'm still seeing references (http://wiki.apache.org/pig/PiggyBank) to
> > PiggyBank being in the contrib module in SVN. What is the official
> PiggyBank
Hey guys,
I'm still seeing references (http://wiki.apache.org/pig/PiggyBank) to
PiggyBank being in the contrib module in SVN. What is the official PiggyBank
repo at the moment: GitHub/wilbur/Piggybank or Apache SVN contrib/piggybank?
Are they out of sync at the moment/which is the current authorit
> > > That's the path I started down today, I don't suppose the UDF you wrote
> > > is in the public domain at all - would you consider contributing it to
> > > piggybank.jar at all? How does it fare with large datasets as
Hey Jon,
I wrote one not long ago that just relies on Apache Commons Math underlying
the UDF. It's pretty straightforward as the Percentile implementation will
sort your numbers before doing percentile calculations. The UDF then just
needs to take a bag/tuple, pull out all the fields as double[] a
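Invoking such a UDF might look like the following sketch (the jar name, UDF name, and signature are assumptions for illustration, not the actual contribution):

```pig
REGISTER myudfs.jar;

grouped = GROUP data ALL;
-- Pass the whole bag of values plus the desired percentile; the UDF
-- would unpack the bag into a double[] for Commons Math's Percentile.
p95 = FOREACH grouped GENERATE myudfs.Percentile(data.latency, 95.0);
```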
n Pig this way. You can of
> course run your data through a UDF that would take a tuple whose first
> argument is a list of key names, and invoke it like so:
>
> jsonStore = FOREACH thing GENERATE
> toMap('id foo bar', *) AS json:map[];
>
> -D
>
> On Thu, Nov 25
Hi all,
I have a simple schema that I want to store as JSON. So I've written a
simple JsonStorage class but it requires that the tuple's first field is a
map. The problem is in converting a regular tuple into a map:
DESCRIBE thing;
> thing: {id: chararray,field1: chararray,field2: chararray}
W
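For what it's worth, newer Pig releases ship a TOMAP builtin that takes alternating keys and values, which would cover this schema without a custom UDF; a sketch against the DESCRIBE output above:

```pig
-- TOMAP('k1', v1, 'k2', v2, ...) builds a map from the tuple's fields.
jsonReady = FOREACH thing GENERATE
    TOMAP('id', id, 'field1', field1, 'field2', field2) AS json:map[];
```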
http://pig.apache.org/docs/r0.7.0/api/
>>
>> PigServer is the starting point; internally it builds the logical and
>> physical plans for jobs, and the execution engine then runs them. Refer to
>> files under o.a.p.backend.hadoop.executionengine.
>> More details under http
Hi all,
I'm happily using Pig to ORDER BY and LIMIT some large relations quite
effectively. However I'm curious about how these are/would be implemented in
"raw" MapReduce. Can anyone shed some light/point to some details, examples
or pseudo-code somewhere?
Cheers,
Josh
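For context, the statements in question, with a best-effort note on what Pig compiles them to (a summary, not authoritative):

```pig
-- ORDER BY adds a sampling job that computes range-partition
-- boundaries, so each reducer receives a non-overlapping key range and
-- the concatenated reducer outputs are globally sorted.
sorted = ORDER A BY countA DESC;
-- LIMIT is applied early on the map side where possible, then enforced
-- once more on the reduce side.
top100 = LIMIT sorted 100;
```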
I just ran a Pig job and for the first time noticed the output at the end of
the job (and of course a matching counter):
Encountered Warning UDF_WARNING_1 108939522 time(s)
What exactly does this count refer to and is there any way to find out what
the actual warning is about? I've checked the jo
a Pig 8 solution?
>
> On Oct 20, 2010, at 10:22 AM, Josh Devins wrote:
>
> > You might also want to have a look at Pig trunk/0.8.0 since there has
> > already been some work done on this topic.
> >
> > https://issues.apache.org/jira/browse/PIG-1404
> >
>
You might also want to have a look at Pig trunk/0.8.0 since there has already
been some work done on this topic.
https://issues.apache.org/jira/browse/PIG-1404
Cheers,
Josh
On 20 October 2010 16:58, Dave Wellman wrote:
> All,
>
> I have a solution for writing unit test in Java to test pig scri
Ah, sorry, just saw that this should read:
PigStatusReporter.getInstance() and there is no special counters
keyword/variable. However, is it common in Pig to be able to access static
methods directly from within a Pig script?
Thanks,
Josh
On 18 October 2010 11:56, Josh Devins wrote
rds anywhere.
> I am not sure what you mean by 3) -- you can just increment
> counters. PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>
> (watch out for a null reporter when you are still in the client-side).
>
> -D
>
>
> On Sat, Oct 16, 2010 at 2:2
I've seen a few threads about counters, PigStats, Elephant-Bird's stats
utility class, etc.
http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00900.html
http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
Has any progress been made on this or to provide a comprehensive
stats/c
Crap, of course :)
Many thanks, that did the job.
Cheers,
Josh
On 8 October 2010 19:12, Mehmet Tepedelenlioglu wrote:
> I = foreach A generate group, flatten(items);
>
> I believe that should do it.
>
> On Oct 8, 2010, at 9:13 AM, Josh Devins wrote:
>
> > I have a sim
I have a simple schema that contains an inner bag. What I need to
essentially do is that for each tuple in the inner bag, I need to create a
new tuple in a new outer bag. This is easier shown than explained! Consider
the following schema and data:
DESCRIBE A;
A: {id: chararray, items: {item: chara
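A sketch of the FLATTEN approach against this schema (the inner-bag field name is assumed from the truncated DESCRIBE above):

```pig
-- FLATTEN un-nests the bag: one output tuple per inner item.
B = FOREACH A GENERATE id, FLATTEN(items);
-- Resulting schema is roughly: B: {id: chararray, items::item: chararray}
```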