Re: Zebra in Pig > 0.9.2

2013-01-31 Thread Jameson Li
Can anyone tell us the answer? Someone from Yahoo suggested that I use Zebra, although we do not use it. I am really interested in it: its actual usage, and why there is no documentation for it in Pig > 0.9.2. 2013/1/29 Devon Crouse > We're looking into using Zebra for merge joins, and noticed that the l

Re: Reg:

2013-01-31 Thread Jameson Li
2012/12/25 Kshiva Kps > Sorry to ask, but if possible could you please advise on the points below? > > In general, in real time, how do we write Pig scripts? > What do you mean by "real time"? Runtime debugging? > 1. Pig scripts alone in Eclipse > Maybe this can help you: http://wiki.apache.org/pig/PigPen > 2.

Re: DBStorage

2013-01-31 Thread Jameson Li
What is the output when you replace DBStorage with normal text output? Is it the content you expected?

Re: Behaviour on a failed cast.

2013-01-31 Thread Jameson Li
I think if the code is Java code in a UDF, you had better catch the exception and handle it rather than ignore it. If your Pig code hits a failed cast, you can debug it line by line in interactive mode. Sometimes after one step (such as streaming) the schema seems not cer

Re: DBStorage

2013-01-31 Thread Jameson Li
I am sorry: I forgot to change the default mail address and used my company address to send the previous mail. Please ignore it. Focused on MySQL, MSSQL, Oracle, Hadoop 2013/2/1 Jameson Li > What is the output when you replace DBStorage with normal text output? > Is it normally the

Re: why pig stream "grep" not work, but "awk index" is well

2011-07-06 Thread Jameson Li
Yes. I am sure. 2011/7/6 Dmitriy Ryaboy > Works for me. > > Make sure you have grep on the path of all your nodes? > > D > > On Mon, Jul 4, 2011 at 7:59 PM, Jameson Li wrote: > > I have a question: > > > > sometimes when I run the Pig code:

Re: How to make pig works with hadoop-0.20-append

2011-07-05 Thread Jameson Li
See http://thedatachef.blogspot.com/2011/01/apache-pig-08-with-cloudera-cdh3.html with one small difference: change "for jar in $HADOOP_HOME/hadoop-core-*.jar $HADOOP_HOME/lib/* ; do" to "for jar in $HADOOP_HOME/hadoop*core*.jar $HADOOP_HOME/lib/* ; do" 2011/7/4 Dmitriy Ryaboy > Use the jar built with "a

why pig stream "grep" not work, but "awk index" is well

2011-07-04 Thread Jameson Li
I have a question. Sometimes when I run the Pig code: c = stream b through `grep "spider"`; it returns the error message: Received Error while processing the map plan: 'grep "spider" ' failed with exit status: 1 But when I use the Pig code: c = stream b through `awk '{a=index($0,"spider");i
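
One plausible explanation, which this thread does not confirm: grep exits with status 1 whenever it selects no lines, so a map task whose input happens to contain no "spider" line would report exactly this exit status. The sketch below is Python rather than Pig, and it assumes grep and sh are on the PATH; the helper name run_filter and the sample strings are hypothetical.

```python
import subprocess


def run_filter(argv, text):
    """Run a command with `text` on stdin; return (exit status, stdout)."""
    proc = subprocess.run(argv, input=text, capture_output=True, text=True)
    return proc.returncode, proc.stdout


# Hypothetical sample inputs standing in for the data seen by two map tasks.
with_match = "a spider crawled\nplain line\n"
without_match = "plain line\nanother line\n"

grep = ["grep", "spider"]
# `|| true` turns any non-zero grep status into 0.
tolerant = ["sh", "-c", "grep spider || true"]

status_match, out_match = run_filter(grep, with_match)
status_empty, out_empty = run_filter(grep, without_match)
status_tolerant, out_tolerant = run_filter(tolerant, without_match)

print(status_match, status_empty, status_tolerant)
```

Note that `|| true` also hides genuine grep failures (exit status 2), so it is a trade-off rather than a clean fix.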

Re: Error after build

2011-07-03 Thread Jameson Li
How about the Pig jar lib path? Sometimes, after building my UDF, I register the new UDF jar but forget that the old UDF jar remains in $PIG_HOME/lib/; then, when the Pig code uses the UDF class, it uses the classes compiled in the old jar rather than the new one. Maybe your trouble is
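
The shadowing described above can be pictured as a first-match-wins lookup over an ordered list of jars. This is only a Python illustration of that idea, assuming (as the message reports) that the jars in $PIG_HOME/lib are consulted before the registered one; the function resolve, the jar paths, and the class name are all hypothetical.

```python
def resolve(class_name, search_path):
    """Return the first jar in search_path that contains class_name, else None."""
    for jar_name, classes in search_path:
        if class_name in classes:
            return jar_name
    return None


# Hypothetical jars: the stale copy sits ahead of the freshly built one.
old_jar = ("/opt/pig/lib/myUDF.jar", {"com.example.MyUdf"})
new_jar = ("/home/user/project/lib/myUDF.jar", {"com.example.MyUdf"})

# With the stale jar still in place, it wins the lookup.
before_cleanup = resolve("com.example.MyUdf", [old_jar, new_jar])

# After deleting the stale jar, the new build is found.
after_cleanup = resolve("com.example.MyUdf", [new_jar])

print(before_cleanup)
print(after_cleanup)
```

Under this assumption, removing or replacing the old jar in the lib directory is what makes the new classes visible.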

Re: MultiStorage for many key values

2011-06-17 Thread Jameson Li
I have the same question as Thomas Kappler. It would be kind if someone could say more about the 'custom partitioner' mentioned by Daniel Dai. I think the docs at 'piglatin_ref2.html#partitionby' are too brief. 2011/6/17 Daniel Dai > Try custom partitioner: http://pig.apache.org/

Re: How to get/operate the InputFileName in pig 0.8.1

2011-06-17 Thread Jameson Li
iStorageSwithKey. Am I right? Thanks very much. 2011/6/17 Jameson Li > I am sorry, I made a mistake. > My newest jar file is at /home/user/project/lib/myUDF.jar, but > there is an old jar file in the Pig lib dir $PIG_HOME/lib (/opt/pig/lib). > Unfortunately after registe

Re: How to get/operate the InputFileName in pig 0.8.1

2011-06-16 Thread Jameson Li
I am sorry, I made a mistake. My newest jar file is at /home/user/project/lib/myUDF.jar, but there is an old jar file in the Pig lib dir $PIG_HOME/lib (/opt/pig/lib). Unfortunately, after registering the jar file /home/user/project/lib/myUDF.jar, when the Pig code is executed, it will first

Re: How to get/operate the InputFileName in pig 0.8.1

2011-06-16 Thread Jameson Li
t jar file after re-compile? Thanks very much. 2011/6/15 Daniel Dai > Check http://wiki.apache.org/pig/PigStorageWithInputPath, also you will > need to disable split combination: -Dpig.noSplitCombination=true > > Daniel > > > On 06/13/2011 04:07 AM, Jameson Li wrote: >

How to get/operate the InputFileName in pig 0.8.1

2011-06-13 Thread Jameson Li
Hi, I have some files in hdfs://path/load/ like this: file_29_1 file_47_1 file_16_1 ... These files are generated by other M/R jobs. Each file contains only one column, and the number in the file name between 'file_' and '_1' is an id. I want to add the id into its input form
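
The naming rule described above (the id is the number between 'file_' and '_1') can be sketched as a small parsing helper. This is Python for illustration only, not Pig code and not the loader discussed in the replies; the names FILE_ID and file_id are hypothetical.

```python
import re

# Matches names such as file_29_1 and captures the number in the middle.
FILE_ID = re.compile(r"^file_(\d+)_1$")


def file_id(path):
    """Return the integer id embedded in the last path component, or None."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = FILE_ID.match(name)
    return int(match.group(1)) if match else None


ids = [file_id(p) for p in (
    "hdfs://path/load/file_29_1",
    "hdfs://path/load/file_47_1",
    "hdfs://path/load/file_16_1",
)]
print(ids)
```

A name that does not follow the pattern yields None, so the caller can decide whether to skip or reject such files.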

Re: how to operate a map type

2011-06-02 Thread Jameson Li
'value'? > > Alan. > > > On May 24, 2011, at 3:05 AM, Jameson Li wrote: > > OK, OK. I understand: just write UDFs. >> I have to write UDFs, and see you.. >> And I still think there should be grammar support for map operations with both >> static key and dy

Re: how to operate a map type

2011-05-24 Thread Jameson Li
you > may need to put into UDF. > > Grammar support for map is based on static keys, e.g. m#'key1'. Your use case > is mostly dealing with dynamic keys, for which you currently have to rely on yourself. > > Daniel > > -----Original Message- From: Jameson Li > Sent: Monday, Ma

Re: how to operate a map type

2011-05-23 Thread Jameson Li
of map values > > The script is like: > b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m; > c = foreach b generate GetKey(m) as key, m; > d = group c by key; > e = foreach d generate group, SUM(GetValues(c.m)); > > > Daniel > > > On 05/23/2011

how to operate a map type

2011-05-23 Thread Jameson Li
14138427,9#0.1107544,282#0.18699136]) I just want to group by the map key and sum the map values, something like: c = group b by $0#key; d = foreach c generate group,SUM(b.$0#value); How could I write this code? Thanks, Jameson Li.
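
To make the intended result concrete, here is a minimal Python sketch of "group by map key and sum the map values". It only illustrates the computation being asked for; it is not Pig syntax, and the function sum_by_key and the sample rows are hypothetical.

```python
from collections import defaultdict


def sum_by_key(maps):
    """Sum the values of all maps, grouped by map key."""
    totals = defaultdict(float)
    for m in maps:
        for key, value in m.items():
            totals[key] += value
    return dict(totals)


# Hypothetical rows, each holding one map of key -> weight.
rows = [
    {"9": 0.5, "282": 0.25},
    {"9": 0.25},
]
totals = sum_by_key(rows)
print(totals)
```

The keys are only known at run time here, which is the dynamic-key situation the question describes.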

Re: store less files

2011-04-01 Thread Jameson Li
Jameson, > > > > Would you mind adding something like this: > > > > c = order b by $0 parallel n; > > store c into '20110331-ab'; > > > > You can order on anything. It will add a reduce and give you fewer files. > > > > Regards, > > Sh

Re: store less files

2011-04-01 Thread Jameson Li
> -- > Jameson Lopp > Software Engineer > Bronto Software, Inc. > > > On 04/01/2011 03:57 AM, Jameson Li wrote: > >> Hi, >> >> When I run the Pig code below: >> a = load '/logs/2011-03-31'; >> b = filter a by $1=='a'

store less files

2011-04-01 Thread Jameson Li
have a question about how I could store fewer files when I use Pig to store files in HDFS. Thanks, Jameson Li.