Re: Pigmix: data export

2011-08-23 Thread Alan Gates
PigMix creates data in a text format, using control characters as dividers. copyToLocal will copy it to your local file system, in the control character divided format. To convert it to tab delimited data, with braces and brackets around the complex types you can do: A = load 'page_views' usi

Re: Feature suggestion. Temp files should be self explanatory to help with EXPLAIN debugging.

2011-08-23 Thread Dmitriy Ryaboy
Oh you don't have to do that. If you have access to the job xml file, look for "pig.aliases" -- they are listed right there for your debugging pleasure. D On Tue, Aug 23, 2011 at 4:38 PM, Kevin Burton wrote: > Will do that now… I was thinking of implementing it by myself… however, the > Pig cod
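For anyone who wants to script that lookup, here is a minimal sketch in Python. It assumes the job xml file uses the usual Hadoop configuration layout of `<property>` elements with `<name>` and `<value>` children. The helper name `read_job_property` and the sample property values are made up for illustration; only the "pig.aliases" key comes from the message above.

```python
import xml.etree.ElementTree as ET


def read_job_property(job_xml_text, name):
    """Return the value of the named property in a job xml string, or None.

    Hypothetical helper: assumes <property><name>..</name><value>..</value></property>.
    """
    root = ET.fromstring(job_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Made-up sample; a real job xml file holds many more properties, and the
# exact format of the pig.aliases value is an assumption here.
SAMPLE_JOB_XML = """
<configuration>
  <property><name>mapred.job.name</name><value>example</value></property>
  <property><name>pig.aliases</name><value>a,b,joined</value></property>
</configuration>
"""

aliases = read_job_property(SAMPLE_JOB_XML, "pig.aliases")
print(aliases)
```

To use it on a real job, read the job xml file into a string and pass it in place of `SAMPLE_JOB_XML`.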

Re: Feature suggestion. Temp files should be self explanatory to help with EXPLAIN debugging.

2011-08-23 Thread Kevin Burton
Will do that now… I was thinking of implementing it by myself… however, the Pig code base is pretty big right now so I was going to throw it in a debugger and figure out where the relation name was. Anyway…. I'll open a Jira. :) On Tue, Aug 23, 2011 at 1:35 PM, Dmitriy Ryaboy wrote: > That's re

Re: Feature suggestion. Temp files should be self explanatory to help with EXPLAIN debugging.

2011-08-23 Thread Dmitriy Ryaboy
That's reasonable and would be fairly straightforward to implement. Keeping the temp files around should probably be off by default and driven by a command-line switch or conf property. Open a jira? D On Tue, Aug 23, 2011 at 1:18 PM, Kevin Burton wrote: > Here's an explain I'm trying to grok.

Feature suggestion. Temp files should be self explanatory to help with EXPLAIN debugging.

2011-08-23 Thread Kevin Burton
Here's an explain I'm trying to grok. The last Load is frustrating because the file isn't descriptive at all. I have to scroll up and find out which file it was from which mapred job. If the file had a descriptive name (like the name of the stored relation) or how it was computed then it would be

Re: question about pig commands implementation procedure and unit test result

2011-08-23 Thread Daniel Dai
Yes, we use HashMap in 0.8.1. In 0.9, we are using ArrayList, so you might see fewer issues like this. Daniel 2011/8/23 lulynn_2008 : >  Hello, > I have some opinion about pig commands implementation procedure: > For example: > pig commands(from TestNewPlanLogToPhyTranslationVisitor.java): >    

Re: Question about request optimization

2011-08-23 Thread Dmitriy Ryaboy
We should add merge join support to HBaseStorage, it should be able to do that for joins on the table key. Are your locids skewed? Have you tried using 'skewed' join for the last job? Actually, if locations are small, you can even use replicated. Any particular reason to store and load starts and

Re: Pig UDF compilation error

2011-08-23 Thread Thejas Nair
That being said, technically even with the bug, this will compile. That won't even compile in Java, unless input is of type Boolean. But if it is of type boolean input.size() won't compile! (But yeah, I have spent hours debugging a bug introduced by such a typo in C++ code) -Thejas On

Question about request optimization

2011-08-23 Thread Vincent Barat
Hi, Of the requests I run using Pig 0.8.1, the heaviest one is the following: /* load session data from HBase */ start_sessions = load ... (start of sessions) end_sessions = load ... (end of sessions) location = load ... (session location) info = load ... (session in

question about pig commands implementation procedure and unit test result

2011-08-23 Thread lulynn_2008
Hello, I have some opinion about pig commands implementation procedure: For example: pig commands(from TestNewPlanLogToPhyTranslationVisitor.java): a = load 'd1.txt' as (id, c); b = load 'd2.txt'as (id, c); c = load 'd3.txt' as (id, c); d = join a by id, b by c;

Re: What is implemented behind the PIG Joins

2011-08-23 Thread byambajav byambajargal
Pig 0.8.1. On Mon, Aug 22, 2011 at 10:58 PM, Thejas Nair wrote: > Hi Byambajargal, > What version of pig does your distribution use ? > -Thejas > > > On 8/22/11 3:42 AM, byambaa wrote: > >> Hello >> I have a cluster with 11 nodes each of them have 16 GB RAM, 6 core CPU, >> 1 TB HDD and i am using