Does anyone have an idea about the problem? I still cannot solve it.

On Wed, Nov 2, 2016 at 11:33 PM, mingda li <limingda1...@gmail.com> wrote:
Yeah, the log file's content is as follows:

Pig Stack Trace
---------------
ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1572)
        at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9403)
        at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11082)
        at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10841)
        at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10190)
        at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7519)
        at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17621)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16013)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15880)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        ... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:677)
        at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:793)
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1569)
        ... 28 more
=========================

On Wed, Nov 2, 2016 at 11:27 PM, Debabrata Pani <android.p...@gmail.com> wrote:
Just to be doubly sure, can you share the error inside the log file mentioned in the output?

On Nov 3, 2016 10:12, "mingda li" <limingda1...@gmail.com> wrote:
My query is as follows:

pig -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/datafu-pig-incubating-1.3.1.jar

to open pig. Then, input:

REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);

define MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

dat = FOREACH data GENERATE MurmurH32(val);

On Wed, Nov 2, 2016 at 9:35 PM, mingda li <limingda1...@gmail.com> wrote:
Hmm, thanks Debabrata, but actually I do register each time (forgot to tell you) before I run the commands. I use REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar, but it does not help.

Any other reason?

Thanks

On Wed, Nov 2, 2016 at 8:03 PM, Debabrata Pani <android.p...@gmail.com> wrote:
It says that Pig could not find the class Hasher. Start grunt with -Dpig.additional.jars (before other pig arguments), or do a "register" of the individual jars before typing in your scripts.

Regards,
Debabrata

On Nov 3, 2016 07:09, "mingda li" <limingda1...@gmail.com> wrote:
Thanks.
I have tried to install DataFu and finished the quickstart successfully: http://datafu.incubator.apache.org/docs/quick-start.html

But when I use the murmur hash, it fails, and I do not know why.

grunt> data = LOAD 'hdfs://***.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
grunt> data_out = FOREACH data GENERATE val;
grunt> dat = FOREACH data GENERATE MurmurH32(val);
2016-11-02 18:25:18,424 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Details at logfile: /home/hadoop-user/pig-branch-0.15/bin/pig_1478136031217.log

The log file is in the attachment.

Bests,
Mingda

On Wed, Nov 2, 2016 at 2:04 PM, Daniel Dai <da...@hortonworks.com> wrote:
I see DataFu has a patch for the UDF: https://issues.apache.org/jira/browse/DATAFU-47

On 11/2/16, 11:45 AM, "mingda li" <limingda1...@gmail.com> wrote:
Dear all,

Hi, I now want to import a UDF into a Pig command. Has anyone done so? I want to use Google's Guava murmur3_32 in Pig. Could anyone give some useful materials or suggestions?

Bests,
Mingda

On Wed, Nov 2, 2016 at 2:11 AM, mingda li <limingda1...@gmail.com> wrote:
Yeah, I see. Thanks for your reply.
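Putting the pieces of the thread together, a minimal grunt session for this UDF would look like the sketch below. It assumes the registered jar really contains datafu.pig.hash.Hasher (per DATAFU-47, the class only exists in releases that include that patch) and reuses the paths quoted above; the chararray cast is an assumption about Hasher's expected input type:

```pig
-- Sketch: REGISTER must run before the DEFINE/FOREACH lines that reference
-- the class; ERROR 1070 at parse time means the class was not on the
-- classpath when that line was parsed, so this path must point at a jar
-- that actually contains datafu.pig.hash.Hasher.
REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar;

-- Bind an alias to the UDF, passing the hash algorithm to its constructor.
DEFINE MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt'
       USING PigStorage() AS (val:int);

-- Cast to chararray (assumption: Hasher hashes string/bytearray input).
hashed = FOREACH data GENERATE MurmurH32((chararray)val);

DUMP hashed;
```

If the REGISTER path is wrong (note the launch command earlier in the thread says pig-branch-0.lib while the REGISTER statement says pig-branch-0.15/lib), Pig fails with exactly this resolution error, so checking that the file exists at that path is a reasonable first step.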
Bests,
Mingda

On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:
Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see two MapReduce jobs corresponding to the first and second join.

Thanks,
Daniel

On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
Dear Dai,

Thanks for your reply. What I want to do is to compare the two different orders of join. The query is as follows:

Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
Dump or Store Bad_OrderRes;

Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number), catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
Dump or Store Good_OrderRes;

Since Pig executes the query lazily, I think only by Dump or Store of the result can I know the time of the MapReduce job, is that right? If so, then I need to count the time to Dump or Store the result as the time for the different orders' join.
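For reference, the two orderings can be compared end to end with STORE, which triggers the MapReduce jobs whose runtimes then appear in the job statistics Pig prints when the script finishes. A sketch (output paths are placeholders; if Pig reports an ambiguous field after a join, the field may need its relation prefix, e.g. catalog_sales::cs_item_sk):

```pig
-- Sketch: STORE (or DUMP) triggers execution; each JOIN becomes its own
-- MapReduce job, so the two chains can be timed from Pig's job statistics.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
               catalog_returns BY (cr_item_sk, cr_order_number);
STORE Bad_OrderRes INTO '/tmp/bad_order_res';    -- placeholder output path

Good_OrderIn  = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
                catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
STORE Good_OrderRes INTO '/tmp/good_order_res';  -- placeholder output path
```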
Bests,
Mingda

On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
Hi, Mingda,

Pig does not do join reordering and will execute the query the way it is written. Note you can join multiple relations in one join statement.

Do you want the execution time for each join in your statement? Assuming you are using a regular join and running with MapReduce, every join statement will be a separate MapReduce job, and the join runtime is the runtime of its MapReduce job.

Thanks,
Daniel

On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
Dear all,

I am doing optimization for multiple joins. I am not sure whether Pig can decide the join order in the optimization layer, or whether it just executes the query the way it is written. Does anyone know about this?

Also, I want to do a multi-way join on different keys. Can the following query work?
Res = JOIN (JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);

BTW, each time I run the query, it finishes in one second. Is there a way to see the execution time? I have set pig.udf.profile=true. Where can I find the time?

Bests,
Mingda
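As far as I know, Pig Latin does not accept a JOIN nested inside another JOIN's input list, and a single multi-relation JOIN statement joins all relations on one key set, so different keys require chaining through an intermediate relation. A sketch (the output path is a placeholder; EXPLAIN inspects the plan without running it):

```pig
-- Sketch: chain the two joins through an intermediate relation instead of
-- nesting one JOIN inside another's input list.
Step1 = JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk;
Res   = JOIN Step1 BY (cs_item_sk, cs_order_number),
        catalog_returns BY (cr_item_sk, cr_order_number);

EXPLAIN Res;                -- shows the logical/physical/MapReduce plans
STORE Res INTO '/tmp/res';  -- placeholder path; forces actual execution
```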