Does anyone have an idea about the problem? I still cannot solve it.

On Wed, Nov 2, 2016 at 11:33 PM, mingda li <limingda1...@gmail.com> wrote:
Yeah, the log file's content is as follows:

Pig Stack Trace
---------------
ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by:
<line 3, column 27> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1572)
        at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9403)
        at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11082)
        at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10841)
        at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10190)
        at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7519)
        at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17621)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16013)
        at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15880)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        ... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:677)
        at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:793)
        at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1569)
        ... 28 more
=========================

On Wed, Nov 2, 2016 at 11:27 PM, Debabrata Pani <android.p...@gmail.com> wrote:
Just to be doubly sure, can you share the error inside the log file mentioned in the output?

On Nov 3, 2016 10:12, "mingda li" <limingda1...@gmail.com> wrote:
My query is as follows:

pig -Dpig.additional.jars=/home/hadoop-user/pig-branch-0.lib/datafu-pig-incubating-1.3.1.jar

to open pig. Then, input:

REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);

define MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

dat = FOREACH data GENERATE MurmurH32(val);

On Wed, Nov 2, 2016 at 9:35 PM, mingda li <limingda1...@gmail.com> wrote:
Hmm, thanks Debabrata, but actually I do register each time (forgot to tell you) before I run the commands. I use REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar, but it does not help.

Any other reason?

Thanks

On Wed, Nov 2, 2016 at 8:03 PM, Debabrata Pani <android.p...@gmail.com> wrote:
It says that Pig could not find the class Hasher. Start grunt with -Dpig.additional.jars (before other pig arguments), or do a "register" of the individual jars before typing in your scripts.

Regards,
Debabrata

On Nov 3, 2016 07:09, "mingda li" <limingda1...@gmail.com> wrote:
Thanks.
I have tried to install DataFu and finished the quickstart successfully: http://datafu.incubator.apache.org/docs/quick-start.html

But when I use the murmur hash, it fails, and I do not know why.

grunt> data = LOAD 'hdfs://***.UCLA.EDU:9000/clash/datasets/1.txt' using PigStorage() as (val:int);
grunt> data_out = FOREACH data GENERATE val;
grunt> dat = FOREACH data GENERATE MurmurH32(val);
2016-11-02 18:25:18,424 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve datafu.pig.hash.Hasher using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Details at logfile: /home/hadoop-user/pig-branch-0.15/bin/pig_1478136031217.log

The log file is in the attachment.

Bests,
Mingda

On Wed, Nov 2, 2016 at 2:04 PM, Daniel Dai <da...@hortonworks.com> wrote:
I see DataFu has a patch for the UDF: https://issues.apache.org/jira/browse/DATAFU-47

On 11/2/16, 11:45 AM, "mingda li" <limingda1...@gmail.com> wrote:
Dear all,

Hi, I now want to import a UDF into a Pig command. Has anyone done so? I want to use Google's Guava murmur3_32 in Pig. Could anyone give some useful materials or suggestions?

Bests,
Mingda

On Wed, Nov 2, 2016 at 2:11 AM, mingda li <limingda1...@gmail.com> wrote:
Yeah, I see. Thanks for your reply.
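Putting the pieces of the thread together, a minimal grunt session for this UDF would look like the sketch below. It assumes the registered jar really contains datafu.pig.hash.Hasher (per DATAFU-47, the class only exists in releases that include that patch) and reuses the paths quoted above; the chararray cast is an assumption about Hasher's expected input type:

```pig
-- Sketch: REGISTER must run before the DEFINE/FOREACH lines that reference
-- the class; ERROR 1070 at parse time means the class was not on the
-- classpath when that line was parsed, so this path must point at a jar
-- that actually contains datafu.pig.hash.Hasher.
REGISTER /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar;

-- Bind an alias to the UDF, passing the hash algorithm to its constructor.
DEFINE MurmurH32 datafu.pig.hash.Hasher('murmur3-32');

data = LOAD 'hdfs://SCAI01.CS.UCLA.EDU:9000/clash/datasets/1.txt'
       USING PigStorage() AS (val:int);

-- Cast to chararray (assumption: Hasher hashes string/bytearray input).
hashed = FOREACH data GENERATE MurmurH32((chararray)val);

DUMP hashed;
```

If the REGISTER path is wrong (note the launch command earlier in the thread says pig-branch-0.lib while the REGISTER statement says pig-branch-0.15/lib), Pig fails with exactly this resolution error, so checking that the file exists at that path is a reasonable first step.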
Bests,
Mingda

On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:
Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see two MapReduce jobs corresponding to the first and second join.

Thanks,
Daniel

On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
Dear Dai,

Thanks for your reply. What I want to do is to compare the two different orders of join. The query is as follows:

Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
Dump or Store Bad_OrderRes;

Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number), catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
Dump or Store Good_OrderRes;

Since Pig executes the query lazily, I think only by Dump or Store of the result can I know the time of the MapReduce job, is that right? If so, then I need to count the time to Dump or Store the result as the time for the different orders' join.
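For reference, the two orderings can be compared end to end with STORE, which triggers the MapReduce jobs whose runtimes then appear in the job statistics Pig prints when the script finishes. A sketch (output paths are placeholders; if Pig reports an ambiguous field after a join, the field may need its relation prefix, e.g. catalog_sales::cs_item_sk):

```pig
-- Sketch: STORE (or DUMP) triggers execution; each JOIN becomes its own
-- MapReduce job, so the two chains can be timed from Pig's job statistics.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
               catalog_returns BY (cr_item_sk, cr_order_number);
STORE Bad_OrderRes INTO '/tmp/bad_order_res';    -- placeholder output path

Good_OrderIn  = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
                catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
STORE Good_OrderRes INTO '/tmp/good_order_res';  -- placeholder output path
```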
Bests,
Mingda

On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
Hi, Mingda,

Pig does not do join reordering and will execute the query the way it is written. Note you can join multiple relations in one join statement.

Do you want the execution time for each join in your statement? Assuming you are using a regular join and running with MapReduce, every join statement will be a separate MapReduce job, and the join runtime is the runtime of its MapReduce job.

Thanks,
Daniel

On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
Dear all,

I am doing optimization for multiple joins. I am not sure whether Pig can decide the join order in the optimization layer, or whether it just executes the query the way it is written. Does anyone know about this?

Also, I want to do a multi-way join on different keys. Can the following query work?
Res = JOIN (JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);

BTW, each time I run the query, it finishes in one second. Is there a way to see the execution time? I have set pig.udf.profile=true. Where can I find the time?

Bests,
Mingda
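As far as I know, Pig Latin does not accept a JOIN nested inside another JOIN's input list, and a single multi-relation JOIN statement joins all relations on one key set, so different keys require chaining through an intermediate relation. A sketch (the output path is a placeholder; EXPLAIN inspects the plan without running it):

```pig
-- Sketch: chain the two joins through an intermediate relation instead of
-- nesting one JOIN inside another's input list.
Step1 = JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk;
Res   = JOIN Step1 BY (cs_item_sk, cs_order_number),
        catalog_returns BY (cr_item_sk, cr_order_number);

EXPLAIN Res;                -- shows the logical/physical/MapReduce plans
STORE Res INTO '/tmp/res';  -- placeholder path; forces actual execution
```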