Re: Help with PIG 0.7 and JOINs

2010-08-20 Thread Raman Yakkala
Hi All, It could be an intermittent issue with the cluster, and the same PIG scripts ran successfully at a later time. Thanks, Raman On Fri, Aug 20, 2010 at 5:22 PM, Raman Yakkala wrote: > Thanks Thejas. Here are the details of the jobs. Not sure about the > location of the log files ...

Re: Help with PIG 0.7 and JOINs

2010-08-20 Thread Raman Yakkala
Thanks Thejas. Here are the details of the jobs. Not sure about the location of the log files ... According to the job detail page, it says the job completed successfully... Hadoop job_201007221306_7119 on srwaishdc1jn0001 *User:* ryakka

Re: Help with PIG 0.7 and JOINs

2010-08-20 Thread Thejas M Nair
You would find the log for MR jobs that failed either at - http://srwaishdc1jn0001:50030/jobdetails.jsp?jobid=job_201007221306_7120 Or http://srwaishdc1jn0001:50030/jobdetails.jsp?jobid=job_201007221306_7119 You can forward the error message if you need further help with this. -Thejas On 8/20/

Help with PIG 0.7 and JOINs

2010-08-20 Thread Raman Yakkala
Hi Guys, I am trying to join two data sets and the jobs are failing. There are some warnings reported and I am not good at understanding them. I am seeking your help in adjusting any parameters to the job so that the job might succeed. Here is the grunt shell output: grunt> STORE fjid INTO

Re: ORDER Issue (repost to avoid spam filters)

2010-08-20 Thread Thejas M Nair
I was wondering if the bytes column has all null values (probably because the input has formatting issues). Can you check if the following query gives any output - start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int, packets:in

RE: ORDER Issue (repost to avoid spam filters)

2010-08-20 Thread Matthew Smith
UPDATE: I attempted my code in the amazon cloud (aws.amazon.com) and the script worked as intended over the data set. This leads me to believe that the issue is with pig-0.7.0 or my configuration. I would however like to not pay for something that is free :D. Any other ideas would be most welcome

Re: Loading CSV Files & LOAD large files behavior in local mode

2010-08-20 Thread Thejas M Nair
To clarify what Jeff said, intermediate data before the join in your case will be stored to disk only if the operations before the join require a separate map-reduce job. If the operations between the load and the join are non-blocking, such as a filter or foreach, then the data will be streamed thro

Re: Loading CSV Files & LOAD large files behavior in local mode

2010-08-20 Thread Jeff Zhang
Actually, the intermediate data won't be stored in memory. It will be stored in a tmp directory on HDFS, and Pig will help you clean up the intermediate data when the job is finished. Yes, BinStorage is a binary format for storing intermediate data and knows how to deserialize it to tuples. On Fri,

Re: Loading CSV Files & LOAD large files behavior in local mode

2010-08-20 Thread Defenestrator
Right, in cases where you have to load multiple large relations and then do some processing on each relation (filtering, aggregation) before joining them together, one wouldn't want to have all of the relations and intermediate state in memory before the join. So is BinStorage just storing the T

Re: Loading CSV Files & LOAD large files behavior in local mode

2010-08-20 Thread Jeff Zhang
What do you mean by "multiple relations with many tuples"? Do you mean joining multiple data sets? And Pig uses BinStorage for storing intermediate data. On Fri, Aug 20, 2010 at 2:42 PM, Defenestrator wrote: > Thanks, Jeff. > > A quick follow-up question relating to the loading/storing of data - what