Re: Taking advantage of structure when doing UDFs and whatnot?

2011-01-04 Thread Alan Gates
On Jan 4, 2011, at 2:07 PM, Jonathan Coveney wrote: Thanks for the help Alan, I really appreciate it. Can you currently extend interfaces in Python UDFs? I am not super familiar with how Jython and Python interact in that capacity. No, we just introduced the Python UDFs in 0.8. We haven

Re: Taking advantage of structure when doing UDFs and whatnot?

2011-01-04 Thread Jonathan Coveney
Thanks for the help Alan, I really appreciate it. Can you currently extend interfaces in Python UDFs? I am not super familiar with how Jython and Python interact in that capacity. The internal sort in the foreach and using 'collected' (assuming I can get it to work :) should be big wins. 201

Re: Taking advantage of structure when doing UDFs and whatnot?

2011-01-04 Thread Alan Gates
Answers inline. On Jan 4, 2011, at 11:10 AM, Jonathan Coveney wrote: I wasn't quite sure what to title this, but hopefully it'll make sense. I have a couple of questions relating to a query that ultimately seeks to do this: You have 1 10 1 12 1 15 1 16 2 1 2 2 2 3 2 6 You want your output to

Re: Master thesis about Hive/Pig/MapReduce

2011-01-04 Thread Alan Gates
Hi Michal, A couple of areas where you could study performance without duplicating Robert Stewart's work come to mind. One is in the area of how data skew affects performance. This is a very real world concern since in my experience almost all input data is power law distributed. Consi

Master thesis about Hive/Pig/MapReduce

2011-01-04 Thread Michał Anglart
Hi Everybody, I'm a soon-to-graduate student of computer science at the University of Wrocław in Poland. Currently I'm starting to write my master's thesis and I'm looking for some inspiration/ideas. First of all I want to write about MapReduce - as far as I know nobody took such topics as their t

UDFContext in 0.8 LoadFunc?

2011-01-04 Thread Eric Tschetter
I have a custom LoadFunc (I'm actually just extending PigStorage) that has some added logic to spider a given path and pick out the paths that I want. I am currently doing the spidering in setLocation because that seemed like the place to do it. It appears as if this is getting called on both the

Re: Taking advantage of structure when doing UDFs and whatnot?

2011-01-04 Thread Kris Coward
On Tue, Jan 04, 2011 at 02:10:52PM -0500, Jonathan Coveney wrote: > I wasn't quite sure what to title this, but hopefully it'll make sense. I have > a couple of questions relating to a query that ultimately seeks to do this > > You have > > 1 10 > 1 12 > 1 15 > 1 16 > 2 1 > 2 2 > 2 3 > 2 6 > > You

Taking advantage of structure when doing UDFs and whatnot?

2011-01-04 Thread Jonathan Coveney
I wasn't quite sure what to title this, but hopefully it'll make sense. I have a couple of questions relating to a query that ultimately seeks to do this: You have 1 10 1 12 1 15 1 16 2 1 2 2 2 3 2 6 You want your output to be the difference between the successive numbers in the second column, i.e. 1
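
For readers skimming the archive, the computation described in this message can be sketched in plain Python. This is only an illustration of the logic, not Pig code and not the approach proposed in the thread: the function name `successive_differences` is hypothetical, the rows are assumed to be already sorted by the first column and then the second, and the (key, difference) output layout is an assumption, since the expected output in the preview above is cut off.

```python
from itertools import groupby
from operator import itemgetter


def successive_differences(rows):
    """Yield (key, difference) for each consecutive pair of values sharing a key.

    Assumption: rows are (key, value) pairs already sorted by key, then value.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        # Collect the second-column values for this key, in input order.
        values = [value for _, value in group]
        # Pair each value with its successor and emit the difference.
        for previous, current in zip(values, values[1:]):
            yield key, current - previous


# Sample input copied from the message above.
rows = [(1, 10), (1, 12), (1, 15), (1, 16), (2, 1), (2, 2), (2, 3), (2, 6)]
result = list(successive_differences(rows))
print(result)
```

Because each key is processed independently once the rows are grouped and ordered, the same per-group step is what a UDF applied to a sorted bag inside a foreach would have to perform.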

Re: Welcome Pig's newest committer, Julien Le Dem!

2011-01-04 Thread Andreas Paepcke
Great to see the community growing. That's an important part of any technical infrastructure. Andreas On Mon, Jan 3, 2011 at 7:59 PM, Ashutosh Chauhan wrote: > Congratulations, Julien! We look forward to cool new features for Pig > from you :) > > Ashutosh > > On Mon, Jan 3, 2011 at 17:21, Dmit

Re: Welcome Pig's newest committer, Julien Le Dem!

2011-01-04 Thread Ashutosh Chauhan
Congratulations, Julien! We look forward to cool new features for Pig from you :) Ashutosh On Mon, Jan 3, 2011 at 17:21, Dmitriy Ryaboy wrote: > Fellow Pig users, > Join me in extending warm congratulations to the newest committer to the Pig > project, Julien Le Dem. > Julien has done outstandi

Hadoop India Summit 2011 - Call for Papers now Open

2011-01-04 Thread Basant Verma
Hi Hadoop enthusiasts, Apache Hadoop has become the de facto platform for developing large-scale data-intensive applications. It has been used actively in academia and industry for research and data mining. Hadoop Summit provides an opportunity for understanding the latest trends and roadmap