Thanks for your reply,
I can't use JOIN and I will explain why. So here I have data...
UP:
9,user1,sam1
5,user1,sam2
3,user1,sam3
9,user2,flin
TX:
7,user1,wow
9,user2,pop
I need to join tx with up by user and closest epoch (first field). If I do JOIN
I will get (JOIN BY user):
Hi
I am trying to start Pig in Hadoop mode, but it is getting stuck. Please help.
Below is where the process is getting stuck.
[kiranprasad.g@pig4 pig-0.8.1]$ bin/pig
2011-09-14 21:48:25,589 [main] INFO org.apache.pig.Main - Logging error
messages to:
I do this:
define analyze_unif `analyze_unif_recs.py`
input (stdin)
output (stdout USING PigStreaming(','))
ship ('$scriptDir/analyze_unif_recs.py');
UnifLines = load '$unif_xml'
using org.apache.pig.piggybank.storage.XMLLoader('REC')
as (doc:chararray);
UnifXmlByDocId
Thank you for your reply.
Can I do the same with java scripts?
To be more clear, I have a folder with multiple XML files that I want to
read and parse in order to extract the values of some attributes (att1, att2),
e.g.
<elem att1=452 att2=7587>elem1</elem>
thanks
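
A minimal sketch of the attribute extraction described above, written in Python like the streaming script earlier in this thread. It assumes well-formed XML with quoted attribute values and a single wrapping element; the names `extract_attributes`, `root`, and the sample string are hypothetical, not taken from the poster's files.

```python
import xml.etree.ElementTree as ET


def extract_attributes(xml_text, tag="elem", names=("att1", "att2")):
    """Return one tuple of attribute values per element named `tag`."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter(tag):
        # node.get() returns None when an attribute is missing.
        rows.append(tuple(node.get(name) for name in names))
    return rows


# Hypothetical sample: the thread's element, with quoted attribute values
# and a wrapping <root> element added so the text parses as XML.
sample = '<root><elem att1="452" att2="7587">elem1</elem></root>'
print(extract_attributes(sample))
```

The values come back as strings; converting them to numbers is left to the caller.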
On Wed, Sep 14, 2011 at 4:53 PM,
Pierre,
Union is not allowed within foreach. Fortunately, you do not need it. I
just realized the code I gave you does not generate what you want;
it actually generates the complement of what you want. Try something
like this:
a = group records by id;
b = foreach a {
On Wed, Sep 14, 2011 at
Was there more that was supposed to be added to this?
--
Pierre
On 2011-09-14, at 12:26 PM, Xiaomeng Wan wrote:
Pierre,
Union is not allowed within foreach. Fortunately, you do not need it. I
just realized the code I gave you does not generate what you want;
it actually generates the
wrong button or what? not sure, anyway, try this:
a = group records by id;
b = foreach a { x = COUNT(records); y = order records by thevalue; z =
limit y x*0.95; z1 = order records by thevalue desc; z2 = limit z1
x*0.9; generate group, z2; }
I have never tried this before; if no luck, you need to find
Under pig 0.9.1-SNAPSHOT, I get:
2011-09-14 12:54:10,075 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1200: line 7, column 12 Syntax error, unexpected symbol at or near 'y'
Under pig 0.8.1-cdh3u1, I get:
2011-09-14 12:55:35,765 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
It is just not supported, similar to the case I mentioned before. I
would suggest you write your own UDF to do that.
Shawn
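
A rough sketch of what such a UDF might compute, in plain Python. It assumes the nested foreach above was meant to keep the lowest 95% of values and then the highest 90% (by original count) of those, which drops about 5% from each end; that reading is an assumption, as are the name `trim_extremes` and the integer rounding used here.

```python
def trim_extremes(values, low_pct=5, high_pct=5):
    """Drop roughly low_pct% of the smallest and high_pct% of the largest values."""
    ordered = sorted(values)
    n = len(ordered)
    # Mirrors "limit y x*0.95": keep the lowest 95% (integer floor, an assumption).
    keep_low = n * (100 - high_pct) // 100
    kept = ordered[:keep_low]
    # Mirrors "limit z1 x*0.9": from those, keep the largest 90% of the original count.
    keep_final = n * (100 - high_pct - low_pct) // 100
    return kept[len(kept) - keep_final:]


print(trim_extremes(list(range(1, 21))))
```

For twenty values this keeps eighteen, discarding the single smallest and single largest.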
On Wed, Sep 14, 2011 at 10:59 AM, Pierre-Luc Brunet
pierre...@zestuff.com wrote:
Under pig 0.9.1-SNAPSHOT, I get:
2011-09-14 12:54:10,075 [main] ERROR
Hi,
I'd like to generate based on exclusive conditions (something like the
CASE statement in SQL). An example:
Say I have data that looks like:
(a, 1)
(a, 2)
(b, 2)
(c, 1)
(d, 3)
(d, 4)
And I want to just convert each of the numbers to their written forms to
get:
(a, one)
(a, two)
(b,
If you run hadoop directly (doing something like: bin/hadoop fs -ls) can you
connect?
Alan.
On Sep 14, 2011, at 4:32 AM, kiranprasad wrote:
Hi
I am trying to start Pig in Hadoop mode, but it is getting stuck. Please
help.
Below is where the process is getting stuck.
I have done mappings in the past using joins and mapping files too.
E.g. generate a file of mappings and load it as a relation, then join. A
rather heavy weight solution though.
-Clay
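
The mapping-file idea can be sketched outside Pig as a lookup join. This is only an illustration in Python, using the sample rows from this thread; `load_pairs` is a hypothetical helper, and rows with no mapping are dropped, as an inner join would do.

```python
def load_pairs(text):
    """Parse comma-separated two-field lines into a list of tuples."""
    rows = []
    for line in text.strip().splitlines():
        left, right = line.split(",")
        rows.append((left, right))
    return rows


example = load_pairs("a,1\na,2\nb,2\nc,1\nd,3\nd,4")
mapping = dict(load_pairs("1,one\n2,two\n3,three\n4,four"))

# Inner join on the number field: unmapped numbers are skipped.
joined = [(key, mapping[number]) for key, number in example if number in mapping]
print(joined)
```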
On Wed, 14 Sep 2011, Eli Finkelshteyn wrote:
Hi,
I'd like to generate based on exclusive conditions
What about putting the mappings into their own relation? I tried this with
0.9.0:
example.txt:
a,1
a,2
b,2
c,1
d,3
d,4
mapping.txt:
1,one
2,two
3,three
4,four
MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS
(number:int,name:chararray);
EXAMPLE_SOURCE = LOAD 'example.txt' USING
Sorry, bad example, I guess. I want something I can do case statements
with. In this case I could map instead, but if I wanted to use less
straight-forward cases (i.e. one case where number == 1, another where
number between 2 and 4, another where number greater than 5, etc...), it
would be
What about trying something with SPLIT and UNION:
SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5, BETTER IF (number >= 2 AND
number <= 4), BEST IF (number >= 5);
I did a few FOREACH and a UNION, and got this:
(a,6,best)
(b,5,best)
(d,8,best)
(a,6,good)
(d,8,good)
(a,2,better)
(b,2,better)
(c,3,better)
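
Note that the conditions are not exclusive: a row that satisfies several of them lands in several outputs, which is why (a,6) and (d,8) appear twice. A small Python sketch of that behaviour, assuming the conditions are number > 5, 2 <= number <= 4, and number >= 5 (inferred from the listing) and assuming the input rows are the six distinct ones visible in it:

```python
# Assumed conditions, inferred from the result listing above.
CONDITIONS = [
    ("good", lambda n: n > 5),
    ("better", lambda n: 2 <= n <= 4),
    ("best", lambda n: n >= 5),
]


def split_rows(rows):
    """Send each row to every output whose condition it satisfies."""
    outputs = {label: [] for label, _ in CONDITIONS}
    for key, number in rows:
        for label, predicate in CONDITIONS:
            if predicate(number):
                outputs[label].append((key, number, label))
    return outputs


# Assumed input rows, reconstructed from the listing.
rows = [("a", 6), ("b", 5), ("d", 8), ("a", 2), ("b", 2), ("c", 3)]
result = split_rows(rows)
for label in ("best", "good", "better"):
    print(result[label])
```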
Ah, neat! That would do the trick. Seems like a lot of extra steps, but
I'll take it if that's how it's done in PIG. Thanks!
On 9/14/11 5:51 PM, Ryan Hoegg wrote:
What about trying something with SPLIT and UNION:
SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5, BETTER IF (number >= 2 AND
number <= 4),
There's a fair bit of overhead there.
UDFs are OK and normal in Pig. Everything is done with them. Don't be afraid
of UDFs :).
There's some pain with the compile cycle (edit code in Java, test, compile,
jar, register...). That's where inline Python UDFs come in handy!
D
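
As an illustration, the CASE-style logic asked about in this thread (number equal to 1, between 2 and 4, greater than 5) could live in a small Python function like the one below. The function name and the returned labels are hypothetical, and wiring it into Pig (registration and output schema) is omitted here.

```python
def classify(number):
    """Return a label for the first matching case, like a SQL CASE."""
    if number == 1:
        return "one"
    if 2 <= number <= 4:
        return "two_to_four"
    if number > 5:
        return "over_five"
    # Values not covered by the cases in the thread (e.g. 0 or 5).
    return "other"


for n in (1, 3, 5, 8):
    print(n, classify(n))
```

Because the branches are tested in order and return immediately, each input gets exactly one label, unlike non-exclusive SPLIT conditions.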
On Wed, Sep 14, 2011 at
Hi
I've tried hadoop fs -ls, but I am still facing the same problem.
[kiranprasad.g@pig4 hadoop-0.20.2]$ bin/start-all.sh
starting namenode, logging to
/home/kiranprasad.g/hadoop-0.20.2/bin/../logs/hadoop-kiranprasad.g-namenode-pig4.out
10.0.0.62: starting datanode, logging to