Hi Olivier,
Here it is.
== Physical Plan ==
Aggregate false, [PartialGroup#155], [PartialGroup#155 AS is_bad#108,Coalesce(SUM(PartialCount#152L),0) AS count#109L,(CAST(SUM(PartialSum#153), DoubleType) / CAST(SUM(PartialCount#154L), DoubleType)) AS avg#110]
 Exchange (HashPartitioning [PartialGroup#155], 200), []
  Aggregate true, [meta#143[is_bad]], [meta#143[is_bad] AS PartialGroup#155,COUNT(1) AS PartialCount#152L,COUNT(nvar#145[var1]) AS PartialCount#154L,SUM(nvar#145[var1]) AS PartialSum#153]
   Project [meta#143,nvar#145]
    Filter ((date#147 >= 2014-04-01) && (date#147 <= 2014-04-30))
     PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at explain at <console>:32
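A possible workaround while the map-access issue is open (this is a sketch of the rewrite, not a tested fix for this cluster): pre-extract meta['is_bad'] and nvar['var1'] into flat columns in an inner query, so each wide map is read once per row instead of once per reference, then aggregate over the flat columns. The same aggregation is modeled below with SQLite standing in for Spark SQL; the table name and columns mirror the query above, and the flat schema represents the already-extracted values.

```python
import sqlite3

# Simulate the aggregation over columns that were already projected out of
# the maps (is_bad from meta, var1 from nvar). In Spark SQL the inner query
# would do that projection; here the table is simply flat from the start.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (is_bad INTEGER, var1 REAL, date TEXT)")
rows = [
    (0, 0.2, "2014-04-10"),
    (0, 0.1, "2014-04-15"),
    (1, -0.4, "2014-04-20"),
    (1, -0.3, "2014-05-01"),  # outside the date range, filtered out
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# Same shape as the original query: filter on date, group by is_bad,
# compute count and average.
result = conn.execute(
    """
    SELECT is_bad, COUNT(*) AS count, AVG(var1) AS avg
    FROM test
    WHERE date BETWEEN '2014-04-01' AND '2014-04-30'
    GROUP BY is_bad
    ORDER BY is_bad
    """
).fetchall()
print(result)
```

Whether the inner-projection rewrite actually helps depends on how the optimizer handles the map accesses, so it is worth checking the new explain output after the rewrite.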
Jianshi
On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot <[email protected]>
wrote:
> can you post the explain too ?
>
> Le mar. 12 mai 2015 à 12:11, Jianshi Huang <[email protected]> a
> écrit :
>
>> Hi,
>>
>> I have a SQL query on tables containing big Map columns (thousands of
>> keys). I found it to be very slow.
>>
>> select meta['is_bad'] as is_bad, count(*) as count, avg(nvar['var1']) as
>> avg
>> from test
>> where date between '2014-04-01' and '2014-04-30'
>> group by meta['is_bad']
>>
>> =>
>>
>> +---------+-----------+-----------------------+
>> | is_bad | count | avg |
>> +---------+-----------+-----------------------+
>> | 0 | 17024396 | 0.16257395850742645 |
>> | 1 | 179729 | -0.37626256661125485 |
>> | 2 | 28128 | 0.11674427263203344 |
>> | 3 | 116327 | -0.6398689187187386 |
>> | 4 | 87715 | -0.5349632960030563 |
>> | 5 | 169771 | 0.40812641191854626 |
>> | 6 | 542447 | 0.5238256418341465 |
>> | 7 | 160324 | 0.29442847034840386 |
>> | 8 | 2099 | -0.9165701665162977 |
>> | 9 | 3104 | 0.3845685004598235 |
>> +---------+-----------+-----------------------+
>> 10 rows selected (130.5 seconds)
>>
>>
>> The total number of rows is less than 20M. Why so slow?
>>
>> I'm running on Spark 1.4.0-SNAPSHOT with 100 executors, each with 4 GB of
>> RAM and 2 CPU cores.
>>
>> Looks like https://issues.apache.org/jira/browse/SPARK-5446 is still
>> open; when can we have it fixed? :)
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/