Forgive me for asking a FAQ but what is the current IDE of choice for Pig?
I used to use a text editor and command line.
I understand that PigPen (the Eclipse plugin) is no longer supported and
does not work with current Hadoop. (But I haven't seen the
documentation or wiki updated.)
I have heard of P
I installed Hadoop and Pig myself from tarballs on my Ubuntu boxes, but I
see that most people use Cloudera's Distribution for Hadoop (aka CDH).
Is there any reason not to go straight to CDH? Do I need to carefully remove
my old installations before installing the CDH debian packages?
Cheers
Ale
On 23 March 2011 18:12, Jonathan Holloway wrote:
> I've got a general question surrounding the output of various Pig scripts
> and generally where people are
> storing that data and in what kind of format?
> ...
> At present the results from my Pig scripts end up in HDFS in Pig bag/tuple
> format
I am using maps a lot, so I guess this is related to PIG-919, which is closed
but not really fixed:
https://issues.apache.org/jira/browse/PIG-919
The ticket suggested forcing the relevant types with an explicit (chararray)
cast, and that seems to have worked as a workaround.
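For the record, the workaround looked roughly like this (the file, field and key names here are invented for illustration):

```pig
-- Values pulled out of a map default to bytearray, which is what tripped
-- me up (see PIG-919); casting to chararray works around it.
raw = LOAD 'tweets.tsv' AS (id:chararray, props:map[]);
fixed = FOREACH raw GENERATE id, (chararray)props#'lang' AS lang;
by_lang = FILTER fixed BY lang == 'en';
```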
Alex
On 7 February 2011 19:39, Alex
Can anyone give me any hints on why a JOIN may be failing with this weird
error?
I can DESCRIBE the two relations it is joining:
justurls: {tweetid: bytearray,userid: bytearray,url: bytearray}
userdb: {ouruserid: bytearray,friendid: bytearray}
and the join itself is
urlspertimeline = JOIN use
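For reference, the join has roughly this shape (the choice of join keys below is my guess from the field names; casting the bytearray keys explicitly first is one thing worth trying):

```pig
-- Both schemas are all bytearray; casting the join keys to chararray
-- up front rules out type-comparison surprises (key names assumed).
justurls2 = FOREACH justurls GENERATE tweetid, (chararray)userid AS userid, url;
userdb2   = FOREACH userdb   GENERATE (chararray)ouruserid AS ouruserid, friendid;
urlspertimeline = JOIN justurls2 BY userid, userdb2 BY ouruserid;
```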
A)
Am I right in thinking that no UDF can turn
(1, (2,3,4) )
into
(1, 2 )
(1, 3 )
(1, 4 )
because you always get out the same number of tuples as you put in?
B)
Would FLATTEN($1) do that - if the (2,3,4) were a bag, and not a tuple?
I'm quite confused as to when bags get created and why they
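To illustrate the distinction in B): FLATTEN on a bag multiplies rows, while FLATTEN on a tuple only widens a single row (file and field names here are invented):

```pig
-- input line:  1 <tab> {(2),(3),(4)}
data = LOAD 'nums.tsv' AS (id:int, vals:bag{t:tuple(v:int)});
flat = FOREACH data GENERATE id, FLATTEN(vals);
-- flat now contains: (1,2) (1,3) (1,4)
-- If $1 were the tuple (2,3,4) instead, FLATTEN would produce the single
-- widened row (1,2,3,4) rather than three rows.
```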
I'm trying to understand the best way of setting up repeated processing of
continuously generated data - like logs.
I can manually copy files from the normal FS to HDFS and kick off Pig scripts,
but ideally I want something automatic - preferably every hour, or possibly
more often. I also want to proce
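In the absence of a dedicated tool, one low-tech approach is a cron-driven script; this is only a sketch, and every path and script name in it is an assumption:

```shell
#!/bin/sh
# Hypothetical hourly ingest sketch - run from cron, e.g.:
#   0 * * * * /usr/local/bin/ingest_logs.sh
HOUR=$(date -u +%Y/%m/%d/%H)   # hour-partitioned HDFS directory
SRC=/var/log/myapp             # local log directory (assumed)
DEST=/data/logs/$HOUR

echo "ingesting $SRC -> $DEST"
# Guarded so the sketch degrades gracefully where Hadoop isn't on PATH.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$DEST"
    hadoop fs -put "$SRC"/*.log "$DEST"/
    pig -param INPUT="$DEST" process_logs.pig   # hypothetical script name
fi
```

Partitioning the HDFS path by hour keeps each cron run idempotent over its own directory and makes it easy to point a Pig script at a single hour's worth of logs.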
I am developing a new UDF for loading JSON data. It differs from those
currently available because it tries to construct the supplied nested maps
and arrays as Pig data structures rather than as a single flat map.
Anyway, I am sometimes getting this error, which I don't understand:
2011-02-05 13:04:4
Can anyone point me to a Loader UDF which creates nested tuples - i.e. tuples
with bags or other tuples within them?
I believe you couldn't do this before roughly Pig 0.7.0, but I can't see any
examples of where it is done.
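For what it's worth, the load-time schema syntax does allow nesting, provided the loader actually emits matching Tuple/DataBag objects; the loader and field names below are invented:

```pig
raw = LOAD 'events.json' USING MyJsonLoader() AS (
        id:chararray,
        meta:(lang:chararray, source:chararray),   -- nested tuple
        tags:{t:(tag:chararray)});                 -- bag of tuples
DESCRIBE raw;
```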
Thanks
Alex
On 29 January 2011 13:43, Jacob Perkins wrote:
>
> Write a map only wukong script that parses the json as you want it. See
> the example here:
>
>
> http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html
>
>
Hi Jacob,
Thanks very much for helping me out. I haven't he
I wonder if discussion of the Piggybank and other User Defined Functions is
best done here (since it is *using* Pig) or on the Development list (because
it is enhancing Pig).
I'm trying to load some JSON into Pig using the PigJsonLoader.java UDF which
Kim Vogt posted about back in September. (It isn'