IDE choice?

2012-09-25 Thread Alex McLintock
Forgive me for asking a FAQ but what is the current IDE of choice for Pig? I used to use a text editor and command line. I understand that PigPen (Eclipse plugin) is no longer supported and does not work with current Hadoop. (But I haven't seen the documentation or wiki updated.) I have heard of P

Pig on CDH? or roll your own

2011-03-28 Thread Alex McLintock
I installed Hadoop and Pig myself using tarballs on my Ubuntu boxes, but I see that most people use Cloudera's Distribution for Hadoop (aka CDH). Is there any reason not to go straight to CDH? Do I need to carefully remove my old installations before installing the CDH Debian packages? Cheers Ale

Re: Storing and reporting off Pig data

2011-03-23 Thread Alex McLintock
On 23 March 2011 18:12, Jonathan Holloway wrote: > I've got a general question surrounding the output of various Pig scripts > and generally where people are > storing that data and in what kind of format? > ... > At present the results from my Pig scripts end up in HDFS in Pig bag/tuple > format

Re: Weird stack trace NullableBytesWritable vs NullableText

2011-02-07 Thread Alex McLintock
I am using maps a lot so I guess this is related to PIG-919, which is closed but not really fixed. https://issues.apache.org/jira/browse/PIG-919 This suggested that I force the relevant types to (chararray), and that seems to have worked as a workaround. Alex On 7 February 2011 19:39, Alex

Weird stack trace NullableBytesWritable vs NullableText

2011-02-07 Thread Alex McLintock
Can anyone give me any hints on why a JOIN may be failing with this weird error? I can DESCRIBE the two tables it is joining: justurls: {tweetid: bytearray,userid: bytearray,url: bytearray} userdb: {ouruserid: bytearray,friendid: bytearray} and the join itself is urlspertimeline = JOIN use

Some basic ideas

2011-02-07 Thread Alex McLintock
A) Am I right in thinking that no UDF can turn (1, (2,3,4) ) into (1, 2 ) (1, 3 ) (1, 4 ) because you always get out the same number of tuples as you put in? B) Would FLATTEN ($1) do that - if the (2,3,4) was a bag, and not a tuple? I'm quite confused as to when bags get created and why they
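For readers following the FLATTEN question above, here is a rough Python sketch of the behaviour being asked about. It is not Pig code: the helper names `flatten_bag` and `flatten_tuple` are made up for illustration, a bag is modelled as a list of tuples, and a Pig tuple as a Python tuple. As far as I understand Pig's semantics, flattening a bag yields one output row per tuple in the bag, while flattening a tuple only promotes its fields into the enclosing row.

```python
def flatten_bag(record, index):
    """Model FLATTEN on a bag field: emit one row per tuple in the bag.

    `record` is a Python tuple; the field at `index` is a list of tuples
    standing in for a Pig bag. (Hypothetical helper, not a Pig API.)
    """
    before = record[:index]
    bag = record[index]
    after = record[index + 1:]
    # Each inner tuple is spliced into its own copy of the outer row.
    return [before + inner + after for inner in bag]


def flatten_tuple(record, index):
    """Model FLATTEN on a tuple field: splice its fields into the same row.

    The number of rows does not change; only the nesting is removed.
    (Hypothetical helper, not a Pig API.)
    """
    before = record[:index]
    inner = record[index]
    after = record[index + 1:]
    return before + inner + after


if __name__ == "__main__":
    # (1, {(2),(3),(4)}) with a bag in position 1 becomes three rows.
    print(flatten_bag((1, [(2,), (3,), (4,)]), 1))
    # (1, (2,3,4)) with a tuple in position 1 stays a single, wider row.
    print(flatten_tuple((1, (2, 3, 4)), 1))
```

Under this model the bag case gives (1, 2), (1, 3), (1, 4), and the tuple case gives the single row (1, 2, 3, 4). That also suggests one answer to (A): a UDF returns one value per input, but that value can be a bag, which a later FLATTEN then expands into several rows.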

Repetitive pig scripts...

2011-02-06 Thread Alex McLintock
I'm trying to understand the best way of setting up repeated processing of continuously generated data - like logs. I can manually copy files from normal FS to HDFS and kick off pig scripts but ideally I want something automatic - preferably every hour, or possibly more often. I also want to proce

Error in new logical plan. Try -Dpig.usenewlogicalplan=false

2011-02-05 Thread Alex McLintock
I am developing a new UDF for loading JSON data. It differs from those currently available because it tries to construct the supplied nested maps and arrays as Pig data structures rather than a single flat map. Anyway, I am sometimes getting this error which I don't understand. 2011-02-05 13:04:4

pig Loader UDF with nested tuples

2011-02-01 Thread Alex McLintock
Can anyone point me to a Loader UDF which creates nested tuples - ie tuples with bags/other tuples within them? I believe you couldn't do this before about Pig 0.7.0 but I can't see any examples of where it is done. Thanks Alex

Re: UDF discussion? Here or on the dev list? / Json Loading

2011-01-30 Thread Alex McLintock
On 29 January 2011 13:43, Jacob Perkins wrote: > > Write a map only wukong script that parses the json as you want it. See > the example here: > > > http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html > > Hi Jacob, Thanks very much for helping me out. I haven't he

UDF discussion? Here or on the dev list? / Json Loading

2011-01-29 Thread Alex McLintock
I wonder if discussion of the Piggybank and other User Defined Functions is best done here (since it is *using* Pig) or on the Development list (because it is enhancing Pig). I'm trying to load some JSON into Pig using the PigJsonLoader.java UDF which Kim Vogt posted about back in September. (It isn'