Re: Re: join fail

Aman Sinha Wed, 11 May 2016 09:21:02 -0700

Hi,
good to know that the join succeeds with the hash distribute plan.  For the
broadcast join, I agree that the memory footprint is higher
than what it should be.   I thought we had a JIRA for this but I searched
and could not find one, so I have created it:
https://issues.apache.org/jira/browse/DRILL-4667.
Pls take a look at the explanation.
In the meantime, I would suggest 2 things:
(a) what is the setting for planner.width.max_per_node on your machine ?
Can you reduce it by half just for this query and try .
(b) for the pageurl if you know the max length, you might want to limit the
VARCHAR length, although I suppose you already created the Hive table
with String type (which gets converted to varchar(64K))


-Aman

On Tue, May 10, 2016 at 9:44 PM, lizhenm...@163.com <lizhenm...@163.com>
wrote:

> thank Andries and Aman to help  me.
>
> 1. when i  increase planner.memory.max_query_memory_per_node to 4G,8G, and
> it is not work.
> 2. As Aman said, after i run the analyze on the table, the rowcount of the
> scan is equal the true table, but the join fail yet.
> 3.The join is successful when use distribute plan.
> 3. i just want to know why the broadcast  join falied in the condition
> that the  size of the table that in  join right side is small than the size
> of cluster's total memory
>
>
>
> lizhenm...@163.com
>
> From: Aman Sinha
> Date: 2016-05-10 23:35
> To: user
> Subject: Re: join fail
> It's difficult to debug this type of issue over email thread.  However, 2
> observations:
> 1.  The following Scan which is the table that is broadcast shows a
> rowcount of 1.3M rows whereas your original email says the rowcount is 32M
> rows.  Are you sure Can you confirm what is the correct row count ?  Has
> ANALYZE been run on the Hive table ?
>
> > Scan(groupscan=[HiveScan [table=Table(dbName:default,
> > tableName:rankings_huge), columns=[`pageurl`, `pagerank`],
> numPartitions=0,
> > partitions= null,
> inputDirectories=[hdfs://nameservice1/HiBench/huge/Join/Input/rankings]]])
> > : rowType = RecordType(VARCHAR(65535) pageurl, INTEGER pagerank):
> *rowcount
> > = 1364388.0*
>
>
> 2. Why are you even trying to do a broadcast join for this query ?  Did the
> hash distribute join plan succeed or not ?
>
> -Aman
>
> On Tue, May 10, 2016 at 8:19 AM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
>
> > See if increasing planner.memory.max_query_memory_per_node helps. It is
> > set to 2G by default.
> >
> > Also see
> >
> > https://community.mapr.com/thread/10329 <
> > https://community.mapr.com/thread/10329>
> > https://community.mapr.com/docs/DOC-1497 <
> > https://community.mapr.com/docs/DOC-1497>
> >
> >
> > --Andries
> >
> >
> > > On May 10, 2016, at 12:06 AM, leezy <lizhenm...@163.com> wrote:
> > >
> > >
> > >
> > > I am sorry the picture cannot be read. here is the text:
> > >
> > >
> > >
> > >
> > > | Minor Fragment ID | Host Name | Start | End | Runtime | Max Records |
> > Max Batches | Last Update | Last Progress | Peak Memory | State |
> > > | 05-00-xx | bigdata2 | 1.529s | 24m46s | 24m45s | 71,555,857 | 21,985
> |
> > 12:16:02 | 12:16:02 | 7.68GB | CANCELLATION_REQUESTED |
> > > | 05-01-xx | bigdata4 | 1.566s | 24m47s | 24m45s | 71,567,860 | 21,990
> |
> > 12:16:02 | 12:16:02 | 7.69GB | CANCELLATION_REQUESTED |
> > > | 05-02-xx | bigdata3 | 1.526s | 22m15s | 22m13s | 71,551,338 | 21,981
> |
> > 12:13:30 | 12:13:25 | 7.68GB | RUNNING |
> > > | 05-03-xx | bigdata2 | 1.530s | 25m16s | 25m15s | 71,565,965 | 21,982
> |
> > 12:16:32 | 12:16:32 | 7.70GB | CANCELLATION_REQUESTED |
> > > | 05-04-xx | bigdata4 | 1.567s | 25m17s | 25m15s | 71,556,216 | 21,988
> |
> > 12:16:32 | 12:16:32 | 7.70GB | CANCELLATION_REQUESTED |
> > > | 05-05-xx | bigdata3 | 1.527s | 22m15s | 22m13s | 71,550,821 | 21,983
> |
> > 12:13:30 | 12:13:25 | 7.68GB | RUNNING |
> > > | 05-06-xx | bigdata2 | 1.531s | 25m46s | 25m45s | 71,558,873 | 21,983
> |
> > 12:17:02 | 12:17:02 | 7.70GB | CANCELLATION_REQUESTED |
> > > | 05-07-xx | bigdata4 | 1.568s | 25m47s | 25m45s | 71,558,657 | 21,982
> |
> > 12:17:02 | 12:17:02 | 7.67GB | CANCELLATION_REQUESTED |
> > > | 05-08-xx | bigdata3 | 1.528s | 22m15s | 22m13s | 71,558,109 | 21,989
> |
> > 12:13:30 | 12:13:25 | 7.69GB | RUNNING |
> > > | 05-09-xx | bigdata2 | 1.532s | 26m17s | 26m15s | 71,558,226 | 21,983
> |
> > 12:17:32 | 12:17:32 | 7.68GB | CANCELLATION_REQUESTED |
> > > | 05-10-xx | bigdata4 | 1.568s | 26m17s | 26m15s | 71,558,359 | 21,980
> |
> > 12:17:32 | 12:17:32 | 7.68GB | CANCELLATION_REQUESTED |
> > > | 05-11-xx | bigdata3 | 1.529s | 22m15s | 22m13s | 71,537,409 | 21,975
> |
> > 12:13:30 | 12:13:30 | 7.69GB | RUNNING |
> > > | 05-12-xx | bigdata2 | 1.533s | 26m47s | 26m45s | 71,037,058 | 21,800
> |
> > 12:18:02 | 12:18:02 | 7.63GB | CANCELLATION_REQUESTED |
> > > | 05-13-xx | bigdata4 | 1.569s | 26m47s | 26m45s | 71,040,788 | 21,805
> |
> > 12:18:02 | 12:18:02 | 7.64GB | CANCELLATION_REQUESTED |
> > > | 05-14-xx | bigdata3 | 1.529s | 22m15s | 22m13s | 71,001,401 | 21,791
> |
> > 12:13:30 | 12:13:25 | 7.65GB | RUNNING |
> > > | 05-15-xx | bigdata2 | 1.533s | 27m17s | 27m15s | 71,007,623 | 21,793
> |
> > 12:18:32 | 12:18:32 | 7.62GB | CANCELLATION_REQUESTED |
> > > | 05-16-xx | bigdata4 | 1.570s | 27m17s | 27m15s | 71,029,560 | 21,793
> |
> > 12:18:32 | 12:18:32 | 7.63GB | CANCELLATION_REQUESTED |
> > > | 05-17-xx | bigdata3 | 1.530s | 22m15s | 22m13s | 71,057,938 | 21,806
> |
> > 12:13:30 | 12:13:30 | 7.64GB | RUNNING |
> > >
> > >
> > >
> > > --
> > >
> > >
> > > thanks for your regards.
> > >
> > >
> > > At 2016-05-10 15:01:14, "leezy" <lizhenm...@163.com> wrote:
> > >
> > > here is the failure profiles. And i see the memory is not exceed 55G in
> > each node.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > > thanks for your regards.
> > >
> > >
> > >
> > >
> > > At 2016-05-10 14:47:10, "leezy" <lizhenm...@163.com> wrote:
> > >> Leon,thank you for your reply, yes, i have set the driver memory to
> 55G
> > and the java heap memory to 8G. And this user case is run successfully in
> > the impala that installed on the same cluster. But in drill , the cmd
> > always show the follow errors:
> > >> Error: RESOURCE ERROR: One or more nodes ran out of memory while
> > executing the query.
> > >>
> > >> Failure allocating buffer.
> > >> Fragment 3:6
> > >>
> > >> [Error Id: d623b6b2-279d-4d24-af4d-0b62554b440c on bigdata2:31010]
> > (state=,code=0)
> > >>
> > >>
> > >> i think i miss some configuration.
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >>
> > >> thanks for your regards.
> > >>
> > >>
> > >>
> > >>
> > >> At 2016-05-10 13:51:12, "Leon Clayton" <lclay...@maprtech.com> wrote:
> > >>> did you increase the memory setting for Drill from the default?
> > >>>
> > >>> https://drill.apache.org/docs/configuring-drill-memory/ <
> > https://drill.apache.org/docs/configuring-drill-memory/>
> > >>>
> > >>>
> > >>>> On 10 May 2016, at 02:25, lizhenm...@163.com wrote:
> > >>>>
> > >>>>
> > >>>> hi:
> > >>>> i run join operation in the drill, i use broadcast and put the small
> > table in the right. The small table has 32000000 rows. I have set the
> > planner.broadcast_threshold to 100000000. The cluster has three nodes and
> > every node has 64G memory. when join is running, the memory is increasing
> > untill the driilbit process exit. But the same query is run successful in
> > the impala and they are in the same cluster.
> > >>>> here is the query plan.
> > >>>>
> > >>>> 00-00    Screen : rowType = RecordType(VARCHAR(65535) sourceIP,
> > DOUBLE totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost =
> > {5.294005617599999E8 rows, 6.356723058846001E10 cpu, 0.0 io,
> > 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5015
> > >>>> 00-01      Project(sourceIP=[$0], totalRevenue=[$1],
> > avgPageRank=[$2]) : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE
> > totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost =
> > {5.2940056165999985E8 rows, 6.356723058836001E10 cpu, 0.0 io,
> > 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5014
> > >>>> 00-02        SelectionVectorRemover : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> avgPageRank):
> > rowcount = 1.0, cumulative cost = {5.2940056165999985E8 rows,
> > 6.356723058836001E10 cpu, 0.0 io, 1.4803953770495996E11 network,
> > 9.1066982688E8 memory}, id = 5013
> > >>>> 00-03          Limit(fetch=[1]) : rowType =
> RecordType(VARCHAR(65535)
> > sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 1.0,
> cumulative
> > cost = {5.2940056065999985E8 rows, 6.356723058736001E10 cpu, 0.0 io,
> > 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5012
> > >>>> 00-04            SingleMergeExchange(sort0=[1 DESC]) : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> avgPageRank):
> > rowcount = 457983.77999999997, cumulative cost = {5.2940055965999985E8
> > rows, 6.356723058336001E10 cpu, 0.0 io, 1.4803953770495996E11 network,
> > 9.1066982688E8 memory}, id = 5011
> > >>>> 01-01              SelectionVectorRemover : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> avgPageRank):
> > rowcount = 457983.77999999997, cumulative cost = {5.289425758799999E8
> rows,
> > 6.356173477800001E10 cpu, 0.0 io, 1.4241183301631998E11 network,
> > 9.1066982688E8 memory}, id = 5010
> > >>>> 01-02                TopN(limit=[1]) : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> avgPageRank):
> > rowcount = 457983.77999999997, cumulative cost = {5.284845920999999E8
> rows,
> > 6.356127679422001E10 cpu, 0.0 io, 1.4241183301631998E11 network,
> > 9.1066982688E8 memory}, id = 5009
> > >>>> 01-03                  Project(sourceIP=[$0], totalRevenue=[$1],
> > avgPageRank=[$2]) : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE
> > totalRevenue, ANY avgPageRank): rowcount = 457983.77999999997, cumulative
> > cost = {5.2802660831999993E8 rows, 6.356127679422001E10 cpu, 0.0 io,
> > 1.4241183301631998E11 network, 9.1066982688E8 memory}, id = 5008
> > >>>> 01-04                    HashToRandomExchange(dist0=[[$1]]) :
> rowType
> > = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> avgPageRank,
> > ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.77999999997, cumulative
> > cost = {5.2802660831999993E8 rows, 6.356127679422001E10 cpu, 0.0 io,
> > 1.4241183301631998E11 network, 9.1066982688E8 memory}, id = 5007
> > >>>> 02-01                      UnorderedMuxExchange : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank,
> > ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.77999999997, cumulative
> > cost = {5.2756862453999996E8 rows, 6.3553949053740005E10 cpu, 0.0 io,
> > 1.3490822676479999E11 network, 9.1066982688E8 memory}, id = 5006
> > >>>> 03-01                        Project(sourceIP=[$0],
> > totalRevenue=[$1], avgPageRank=[$2],
> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType =
> > RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank,
> > ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.77999999997, cumulative
> > cost = {5.2711064076E8 rows, 6.355349106996001E10 cpu, 0.0 io,
> > 1.3490822676479999E11 network, 9.1066982688E8 memory}, id = 5005
> > >>>> 03-02                          Project(sourceIP=[$0],
> > totalRevenue=[CASE(=($4, 0), null, $3)],
> > avgPageRank=[CAST(/(CastHigh(CASE(=($2, 0), null, $1)), $2)):ANY NOT
> NULL])
> > : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY
> > avgPageRank): rowcount = 457983.77999999997, cumulative cost =
> > {5.2665265698E8 rows, 6.3551659134840004E10 cpu, 0.0 io,
> > 1.3490822676479999E11 network, 9.1066982688E8 memory}, id = 5004
> > >>>> 03-03                            HashAgg(group=[{0}],
> > agg#0=[$SUM0($1)], agg#1=[$SUM0($2)], agg#2=[$SUM0($3)],
> agg#3=[$SUM0($4)])
> > : rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER $f1, BIGINT $f2,
> > DOUBLE $f3, BIGINT $f4): rowcount = 457983.77999999997, cumulative cost =
> > {5.2619467320000005E8 rows, 6.3547995264600006E10 cpu, 0.0 io,
> > 1.3490822676479999E11 network, 9.1066982688E8 memory}, id = 5003
> > >>>> 03-04                              Project(sourceIP=[$0], $f1=[$1],
> > $f2=[$2], $f3=[$3], $f4=[$4]) : rowType = RecordType(VARCHAR(65535)
> > sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4): rowcount =
> > 4579837.8, cumulative cost = {5.2161483540000004E8 rows, 6.32915243478E10
> > cpu, 0.0 io, 1.3490822676479999E11 network, 8.300646816E8 memory}, id =
> 5002
> > >>>> 03-05
> > HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(VARCHAR(65535)
> > sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3, BIGINT $f4, ANY
> > E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 4579837.8, cumulative cost =
> > {5.2161483540000004E8 rows, 6.32915243478E10 cpu, 0.0 io,
> > 1.3490822676479999E11 network, 8.300646816E8 memory}, id = 5001
> > >>>> 04-01                                  UnorderedMuxExchange :
> rowType
> > = RecordType(VARCHAR(65535) sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE
> $f3,
> > BIGINT $f4, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 4579837.8,
> > cumulative cost = {5.170349976E8 rows, 6.3218246943E10 cpu, 0.0 io,
> > 2.2354132992E10 network, 8.300646816E8 memory}, id = 5000
> > >>>> 05-01                                    Project(sourceIP=[$0],
> > $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4],
> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType =
> > RecordType(VARCHAR(65535) sourceIP, INTEGER $f1, BIGINT $f2, DOUBLE $f3,
> > BIGINT $f4, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 4579837.8,
> > cumulative cost = {5.124551598E8 rows, 6.32136671052E10 cpu, 0.0 io,
> > 2.2354132992E10 network, 8.300646816E8 memory}, id = 4999
> > >>>> 05-02                                      HashAgg(group=[{0}],
> > agg#0=[$SUM0($1)], agg#1=[COUNT($1)], agg#2=[$SUM0($2)],
> agg#3=[COUNT($2)])
> > : rowType = RecordType(VARCHAR(65535) sourceIP, INTEGER $f1, BIGINT $f2,
> > DOUBLE $f3, BIGINT $f4): rowcount = 4579837.8, cumulative cost =
> > {5.07875322E8 rows, 6.3195347754E10 cpu, 0.0 io, 2.2354132992E10 network,
> > 8.300646816E8 memory}, id = 4998
> > >>>> 05-03                                        Project(sourceIP=[$0],
> > pagerank=[$5], adrevenue=[$3]) : rowType = RecordType(VARCHAR(65535)
> > sourceIP, INTEGER pagerank, DOUBLE adrevenue): rowcount = 4.5798378E7,
> > cumulative cost = {4.62076944E8 rows, 6.0630638586E10 cpu, 0.0 io,
> > 2.2354132992E10 network, 2.40132288E7 memory}, id = 4997
> > >>>> 05-04
> > HashJoin(condition=[=($4, $1)], joinType=[inner]) : rowType =
> > RecordType(VARCHAR(65535) sourceip, VARCHAR(65535) desturl,
> VARCHAR(65535)
> > visitdate, DOUBLE adrevenue, VARCHAR(65535) pageurl, INTEGER pagerank):
> > rowcount = 4.5798378E7, cumulative cost = {4.62076944E8 rows,
> > 6.0630638586E10 cpu, 0.0 io, 2.2354132992E10 network, 2.40132288E7
> memory},
> > id = 4996
> > >>>> 05-06
> > SelectionVectorRemover : rowType = RecordType(VARCHAR(65535) sourceip,
> > VARCHAR(65535) desturl, VARCHAR(65535) visitdate, DOUBLE adrevenue):
> > rowcount = 4.5798378E7, cumulative cost = {4.12185402E8 rows,
> > 5.9950076802E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4993
> > >>>> 05-07
> > Filter(condition=[AND(>=(DATEDIFF($2, '1980-01-01'), 0), <=(DATEDIFF($2,
> > '1980-04-01'), 0))]) : rowType = RecordType(VARCHAR(65535) sourceip,
> > VARCHAR(65535) desturl, VARCHAR(65535) visitdate, DOUBLE adrevenue):
> > rowcount = 4.5798378E7, cumulative cost = {3.66387024E8 rows,
> > 5.9904278424E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4992
> > >>>> 05-08
> > Scan(groupscan=[HiveScan [table=Table(dbName:default,
> > tableName:uservisits_copy_huge), columns=[`sourceip`, `desturl`,
> > `visitdate`, `adrevenue`], numPartitions=0, partitions= null,
> >
> inputDirectories=[hdfs://nameservice1/HiBench/huge/Join/Input/uservisits]]])
> > : rowType = RecordType(VARCHAR(65535) sourceip, VARCHAR(65535) desturl,
> > VARCHAR(65535) visitdate, DOUBLE adrevenue): rowcount = 1.83193512E8,
> > cumulative cost = {1.83193512E8 rows, 5.862192384E10 cpu, 0.0 io, 0.0
> > network, 0.0 memory}, id = 4991
> > >>>> 05-05                                            BroadcastExchange :
> > rowType = RecordType(VARCHAR(65535) pageurl, INTEGER pagerank): rowcount
> =
> > 1364388.0, cumulative cost = {2728776.0 rows, 1.20066144E8 cpu, 0.0 io,
> > 2.2354132992E10 network, 0.0 memory}, id = 4995
> > >>>> 06-01
> > Scan(groupscan=[HiveScan [table=Table(dbName:default,
> > tableName:rankings_huge), columns=[`pageurl`, `pagerank`],
> numPartitions=0,
> > partitions= null,
> >
> inputDirectories=[hdfs://nameservice1/HiBench/huge/Join/Input/rankings]]])
> > : rowType = RecordType(VARCHAR(65535) pageurl, INTEGER pagerank):
> rowcount
> > = 1364388.0, cumulative cost = {1364388.0 rows, 1.0915104E8 cpu, 0.0 io,
> > 0.0 network, 0.0 memory}, id = 4994
> > >>>> --
> > >>>>
> > >>>> thanks for your regards.
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >
> > >
> > >
> > >
> > >
> >
> >
>

Re: Re: join fail

Reply via email to