RE: Dumb question guys

2011-09-14 Thread Marek Miglinski
Thanks for your reply. I can't use JOIN, and I will explain why. Here is my data... UP: 9,user1,sam1 5,user1,sam2 3,user1,sam3 9,user2,flin TX: 7,user1,wow 9,user2,pop I need to join TX with UP by user and by the closest epoch (the first field). If I do a JOIN (JOIN BY user), I will get:
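A minimal plain-Python sketch of the "closest epoch" join described above, assuming the rows are (epoch, user, value) tuples. Both relations are grouped by user (in Pig this grouping step could be a COGROUP of the two relations by user, with the matching logic in a UDF; the thread does not show that). Then, for each TX row, the UP row with the smallest epoch distance is chosen. The function name `nearest_epoch_join` is hypothetical. The tie-breaking rule is also an assumption: when two UP epochs are equally close, the earlier one wins, as happens with 9 and 5 for user1's epoch 7. The message does not say how such ties should resolve.

```python
from collections import defaultdict


def nearest_epoch_join(up, tx):
    """Attach to each TX row the UP row of the same user with the closest epoch.

    Rows are (epoch, user, value) tuples. Ties on distance go to the earlier
    UP epoch (an assumption). TX rows whose user has no UP rows are dropped,
    as in an inner join.
    """
    by_user = defaultdict(list)
    for epoch, user, value in up:
        by_user[user].append((epoch, value))

    joined = []
    for tx_epoch, user, tx_value in tx:
        candidates = by_user.get(user)  # .get() does not create empty entries
        if not candidates:
            continue
        # Sort key: (distance from the TX epoch, UP epoch) so ties pick the earlier one.
        up_epoch, up_value = min(candidates, key=lambda c: (abs(c[0] - tx_epoch), c[0]))
        joined.append((tx_epoch, user, tx_value, up_epoch, up_value))
    return joined


up = [(9, "user1", "sam1"), (5, "user1", "sam2"), (3, "user1", "sam3"), (9, "user2", "flin")]
tx = [(7, "user1", "wow"), (9, "user2", "pop")]
for row in nearest_epoch_join(up, tx):
    print(row)
```

If "closest" should instead mean "closest UP epoch not after the TX epoch", only the `min` key needs to change (for example by filtering candidates to `c[0] <= tx_epoch` first).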

Starting Hadoop File System process getting Stuck

2011-09-14 Thread kiranprasad
Hi, I am trying to start Pig in Hadoop mode, but it is getting stuck. Please help. Below is where the process gets stuck: [kiranprasad.g@pig4 pig-0.8.1]$ bin/pig 2011-09-14 21:48:25,589 [main] INFO org.apache.pig.Main - Logging error messages to:

RE: reading xml file within a UDF

2011-09-14 Thread william.dowling
I do this: define analyze_unif `analyze_unif_recs.py` input (stdin) output (stdout USING PigStreaming(',')) ship ('$scriptDir/analyze_unif_recs.py'); UnifLines = load '$unif_xml' using org.apache.pig.piggybank.storage.XMLLoader('REC') as (doc:chararray); UnifXmlByDocId

Re: reading xml file within a UDF

2011-09-14 Thread Baraa Mohamad
Thank you for your reply. So can I do the same with java scripts? To be clearer: I have a folder with multiple XML files that I want to read and parse in order to extract the values of some attributes (att1, att2), e.g. <elem att1=452 att2=7587>elem1</elem> Thanks On Wed, Sep 14, 2011 at 4:53 PM,
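For the parsing side of this question, here is a plain-Python sketch using the standard library's `xml.etree.ElementTree`. It walks a folder and pulls att1/att2 off every `<elem>` element. The tag and attribute names come from the example in the message. The function names and the `*.xml` file pattern are hypothetical. A conforming XML parser requires quoted attribute values (`att1="452"`), so unquoted values like those in the example would be rejected with a parse error.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_attributes(xml_text, tag="elem", attrs=("att1", "att2")):
    """Return a tuple of attribute values for every <tag> element in xml_text.

    The root element itself is included if it matches. Missing attributes
    come back as None.
    """
    root = ET.fromstring(xml_text)
    return [tuple(element.get(name) for name in attrs) for element in root.iter(tag)]


def extract_from_folder(folder, pattern="*.xml", **kwargs):
    """Run extract_attributes over every file in folder matching pattern."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        rows.extend(extract_attributes(path.read_text(encoding="utf-8"), **kwargs))
    return rows


if __name__ == "__main__":
    sample = '<elem att1="452" att2="7587">elem1</elem>'
    print(extract_attributes(sample))  # [('452', '7587')]
```

Because `root.iter(tag)` searches the whole tree, the same function handles files where `<elem>` is nested several levels deep.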

Re: Filter grouped data with two percentile

2011-09-14 Thread Xiaomeng Wan
Pierre, UNION is not allowed within FOREACH. Fortunately, you do not need it. I just realized that the code I gave you does not generate what you want; it actually generates the complement of what you want. Try something like this: a = group records by id; b = foreach a { On Wed, Sep 14, 2011 at

Re: Filter grouped data with two percentile

2011-09-14 Thread Pierre-Luc Brunet
Was there more that was supposed to be added to this? -- Pierre On 2011-09-14, at 12:26 PM, Xiaomeng Wan wrote: Pierre, UNION is not allowed within FOREACH. Fortunately, you do not need it. I just realized that the code I gave you does not generate what you want; it actually generates the

Re: Filter grouped data with two percentile

2011-09-14 Thread Xiaomeng Wan
Wrong button or something, not sure. Anyway, try this: a = group records by id; b = foreach a { x = COUNT(records); y = order records by thevalue; z = limit y x*0.95; z1 = order records by thevalue desc; z2 = limit z1 x*0.9; generate group, z2; } I've never tried this before; if no luck, you need to find

Re: Filter grouped data with two percentile

2011-09-14 Thread Pierre-Luc Brunet
Under pig 0.9.1-SNAPSHOT, I get: 2011-09-14 12:54:10,075 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: line 7, column 12 Syntax error, unexpected symbol at or near 'y' Under pig 0.8.1-cdh3u1, I get: 2011-09-14 12:55:35,765 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR

Re: Filter grouped data with two percentile

2011-09-14 Thread Xiaomeng Wan
It is just not supported, similar to the case I mentioned before. I would suggest writing your own UDF to do that. Shawn On Wed, Sep 14, 2011 at 10:59 AM, Pierre-Luc Brunet pierre...@zestuff.com wrote: Under pig 0.9.1-SNAPSHOT, I get: 2011-09-14 12:54:10,075 [main] ERROR
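Following the suggestion to write a UDF, the per-group trimming could use logic like the plain-Python sketch below. The names `trim_by_percentile` and `trim_groups`, and the 5th/95th percentile defaults, are hypothetical. The sketch sorts each group by the value and keeps the records whose sorted position lies between the two percentiles. It uses a simple rank cut with integer arithmetic and no interpolation, because the thread does not pin down which percentile definition is needed.

```python
from collections import defaultdict


def trim_by_percentile(records, key, lower_pct=5, upper_pct=95):
    """Return records whose rank (after sorting by key) lies between two percentiles.

    Keeps sorted positions [floor(n*lower/100), ceil(n*upper/100)). Integer
    arithmetic avoids floating-point rounding surprises; no interpolation.
    """
    ordered = sorted(records, key=key)
    n = len(ordered)
    start = n * lower_pct // 100        # floor(n * lower_pct / 100)
    stop = -(-n * upper_pct // 100)     # ceil(n * upper_pct / 100)
    return ordered[start:stop]


def trim_groups(rows, group_key, value_key, lower_pct=5, upper_pct=95):
    """Group rows (like GROUP records BY id) and trim each group independently."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    return {
        group: trim_by_percentile(members, value_key, lower_pct, upper_pct)
        for group, members in groups.items()
    }


# Hypothetical (id, thevalue) records.
records = [("a", v) for v in range(1, 21)] + [("b", v) for v in range(10)]
trimmed = trim_groups(records, group_key=lambda r: r[0], value_key=lambda r: r[1])
print([v for _, v in trimmed["a"]])  # 2 .. 19: the lowest and highest of 20 are dropped
```

With this rank cut, a group too small for 5% to cover a whole record (group "b" above, with 10 records) is kept intact. A different rounding rule would change that behaviour.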

Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Eli Finkelshteyn
Hi, I'd like to generate values based on mutually exclusive conditions (something like the CASE statement in SQL). For example, say I have data that looks like: (a, 1) (a, 2) (b, 2) (c, 1) (d, 3) (d, 4) And I want to convert each of the numbers to its written form to get: (a, one) (a, two) (b,

Re: Starting Hadoop File System process getting Stuck

2011-09-14 Thread Alan Gates
If you run Hadoop directly (doing something like bin/hadoop fs -ls), can you connect? Alan. On Sep 14, 2011, at 4:32 AM, kiranprasad wrote: Hi I am trying to start the PIG in hadoop mode, but it is getting stuck. Pls help. Below is where the process is getting stuck.

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Clay B.
I have also done mappings in the past using joins and mapping files: generate a file of mappings, load it as a relation, then join. A rather heavyweight solution, though. -Clay On Wed, 14 Sep 2011, Eli Finkelshteyn wrote: Hi, I'd like to generate based on exclusive conditions

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Ryan Hoegg
What about putting the mappings into their own relation? I tried this with 0.9.0: example.txt: a,1 a,2 b,2 c,1 d,3 d,4 mapping.txt: 1,one 2,two 3,three 4,four MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS (number:int,name:chararray); EXAMPLE_SOURCE = LOAD 'example.txt' USING
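The mapping-relation approach amounts to an inner join on the number field. The plain-Python sketch below reproduces that on the example.txt and mapping.txt data from the message, parsing comma-separated lines roughly as `PigStorage(',')` with a schema would. The function names `load_csv` and `map_join` are hypothetical. A number with no entry in the mapping drops out, as it would with an inner JOIN. Pig does not guarantee the output order of a JOIN, whereas this sketch keeps the input order.

```python
from collections import defaultdict


def load_csv(text, types):
    """Parse comma-separated lines into tuples, casting each field with types[i]."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split(",")
        rows.append(tuple(cast(field) for cast, field in zip(types, fields)))
    return rows


def map_join(example, mapping):
    """Inner hash join of (letter, number) rows with (number, name) rows on number."""
    names = defaultdict(list)  # number -> every name mapped to it
    for number, name in mapping:
        names[number].append(name)
    return [
        (letter, name)
        for letter, number in example
        for name in names.get(number, [])  # no match -> row is dropped
    ]


EXAMPLE_TXT = "a,1\na,2\nb,2\nc,1\nd,3\nd,4"
MAPPING_TXT = "1,one\n2,two\n3,three\n4,four"

example = load_csv(EXAMPLE_TXT, (str, int))
mapping = load_csv(MAPPING_TXT, (int, str))
print(map_join(example, mapping))
```

As the follow-up messages point out, this only covers exact-value lookups. Range conditions need something else.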

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Eli Finkelshteyn
Sorry, bad example, I guess. I want something I can write case statements with. In this case I could map instead, but if I wanted less straightforward cases (e.g. one case where number == 1, another where number is between 2 and 4, another where number is greater than 5, etc.), it would be

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Ryan Hoegg
What about trying something with SPLIT and UNION: SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5, BETTER IF (number >= 2 AND number <= 4), BEST IF (number >= 5); I did a few FOREACHes and a UNION, and got this: (a,6,best) (b,5,best) (d,8,best) (a,6,good) (d,8,good) (a,2,better) (b,2,better) (c,3,better)
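It is worth noticing why (a,6) appears twice in that output. SPLIT sends a row to every branch whose condition holds, and to none if no condition holds. The GOOD and BEST conditions therefore overlap for numbers above 5. The plain-Python sketch below emulates SPLIT, then a labelling FOREACH per branch, then UNION. It uses the conditions BEST: number >= 5, GOOD: number > 5, BETTER: 2 <= number <= 4. The comparison operators are inferred from the output shown. The input rows are hypothetical, chosen only to match that output, and `split_union` is an illustrative name. For exclusive, SQL-CASE-style behaviour, the conditions would have to be written so they cannot overlap.

```python
def split_union(rows, branches):
    """Emulate SPLIT ... INTO, a per-branch FOREACH adding a label, then UNION.

    branches: list of (label, predicate on the number field). As with SPLIT,
    a row goes to every branch whose predicate is true, and to none if all fail.
    """
    result = []
    for label, predicate in branches:
        result.extend(row + (label,) for row in rows if predicate(row[1]))
    return result


BRANCHES = [
    ("best", lambda n: n >= 5),
    ("good", lambda n: n > 5),
    ("better", lambda n: 2 <= n <= 4),
]

# Hypothetical input, chosen to match the output quoted in the thread.
rows = [("a", 6), ("b", 5), ("d", 8), ("a", 2), ("b", 2), ("c", 3)]
for row in split_union(rows, BRANCHES):
    print(row)
```

Pig's UNION makes no ordering promise, so the order printed here is a property of the sketch, not of Pig.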

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Eli Finkelshteyn
Ah, neat! That would do the trick. It seems like a lot of extra steps, but I'll take it if that's how it's done in Pig. Thanks! On 9/14/11 5:51 PM, Ryan Hoegg wrote: What about trying something with SPLIT and UNION: SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5, BETTER IF (number >= 2 AND number <= 4),

Re: Pig Conditionals (Do I have to use UDFs)?

2011-09-14 Thread Dmitriy Ryaboy
There's a fair bit of overhead there. UDFs are OK and normal in Pig; everything is done with them. Don't be afraid of UDFs :). There's some pain in the compile cycle (edit code in Java, test, compile, jar, register...). That's where inline Python UDFs come in handy! D On Wed, Sep 14, 2011 at
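To illustrate the UDF route for the range-based cases mentioned in the thread (number == 1, between 2 and 4, greater than 5), here is a plain-Python function with first-match-wins semantics like a SQL CASE. The function name and the returned labels are hypothetical. When registered through Pig's Jython support (`REGISTER 'file.py' USING jython AS myfuncs;`), such a function would normally also carry an `@outputSchema` decorator declaring its return type. That decorator is left out so the code runs as ordinary Python. The code sticks to syntax that older Python 2 / Jython also accepts.

```python
def classify(number):
    """Exclusive, first-match-wins classification, like SQL CASE without ELSE.

    Labels are illustrative. Values matching no case (e.g. 5) return None,
    which Pig would treat as null.
    """
    if number is None:
        return None
    if number == 1:
        return "one"
    if 2 <= number <= 4:
        return "two_to_four"
    if number > 5:
        return "over_five"
    return None  # no ELSE branch


rows = [("a", 1), ("a", 2), ("b", 2), ("c", 1), ("d", 3), ("d", 4)]
print([(letter, classify(n)) for letter, n in rows])
```

Because the `if` chain returns at the first match, the cases stay mutually exclusive even if their ranges are later edited to overlap. This differs from the SPLIT behaviour, where a row can land in several branches.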

Re: Starting Hadoop File System process getting Stuck

2011-09-14 Thread kiranprasad
Hi, I've tried hadoop fs -ls, but I am still facing the same problem. [kiranprasad.g@pig4 hadoop-0.20.2]$ bin/start-all.sh starting namenode, logging to /home/kiranprasad.g/hadoop-0.20.2/bin/../logs/hadoop-kiranprasad.g-namenode-pig4.out 10.0.0.62: starting datanode, logging to