[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006571#comment-17006571
]
Jason Guo commented on SPARK-24906:
---
[~lio...@taboola.com]
Yes, estimating with sampl
[
https://issues.apache.org/jira/browse/SPARK-29031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930230#comment-16930230
]
Jason Guo commented on SPARK-29031:
---
[~lishuming] `Materialized column` is supported i
[
https://issues.apache.org/jira/browse/SPARK-29031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-29031:
--
Description:
Goals
* Add a new SQL grammar of Materialized column
* Implicitly rewrite SQL queries o
Jason Guo created SPARK-29031:
-
Summary: Materialized column to accelerate queries
Key: SPARK-29031
URL: https://issues.apache.org/jira/browse/SPARK-29031
Project: Spark
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Shepherd: (was: Dongjoon Hyun)
> SkewJoin--handle only skewed keys with broadcastjoin and other keys
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Shepherd: Dongjoon Hyun (was: Liang-Chi Hsieh)
> SkewJoin--handle only skewed keys with broadcastjoin
[
https://issues.apache.org/jira/browse/SPARK-27865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27865:
--
Shepherd: Dongjoon Hyun
> Spark SQL support 1:N sort merge bucket join without shuffle
> -
[
https://issues.apache.org/jira/browse/SPARK-27865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27865:
--
Summary: Spark SQL support 1:N sort merge bucket join without shuffle
(was: Spark SQL support 1:N sor
Jason Guo created SPARK-27865:
-
Summary: Spark SQL support 1:N sort merge bucket join
Key: SPARK-27865
URL: https://issues.apache.org/jira/browse/SPARK-27865
Project: Spark
Issue Type: New Featur
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (big_sk
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Shepherd: Liang-Chi Hsieh
> SkewJoin--handle only skewed keys with broadcastjoin and other keys with
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Summary: SkewJoin--handle only skewed keys with broadcastjoin and other
keys with normal join (was: S
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Attachment: sql.png
> SkewJoin hint
> -
>
> Key: SPARK-27792
>
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (big_sk
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (big_sk
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (tableA
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Attachment: time.png
skew join DAG.png
> SkewJoin hint
> -
>
>
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (tableA
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Attachment: SMJ tasks.png
> SkewJoin hint
> -
>
> Key: SPARK-27792
>
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Attachment: SMJ DAG.png
> SkewJoin hint
> -
>
> Key: SPARK-27792
>
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (tableA
[
https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-27792:
--
Description:
This feature is designed to handle data skew in Join
*Scenario*
* A big table (tableA
Jason Guo created SPARK-27792:
-
Summary: SkewJoin hint
Key: SPARK-27792
URL: https://issues.apache.org/jira/browse/SPARK-27792
Project: Spark
Issue Type: New Feature
Components: SQL
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571027#comment-16571027
]
Jason Guo commented on SPARK-25038:
---
[~hyukjin.kwon] Gotcha
I will create a PR for th
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Description:
When Spark SQL reads a large amount of data, it takes a long time (more than 10
minutes) to
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: (was: job start original.png)
> Accelerate Spark Plan generation when Spark SQL read l
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: (was: issue sql original.png)
> Accelerate Spark Plan generation when Spark SQL read l
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: job start original.png
job start optimized.png
issue sql or
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: issue sql original.png
> Accelerate Spark Plan generation when Spark SQL read large amount
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Description:
When Spark SQL reads a large amount of data, it takes a long time (more than 10
minutes) to
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: job start original.png
> Accelerate Spark Plan generation when Spark SQL read large amount
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: start.png
issue.png
> Accelerate Spark Plan generation when Spark SQL read
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: (was: start.png)
> Accelerate Spark Plan generation when Spark SQL read large amount o
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Attachment: (was: issue.png)
> Accelerate Spark Plan generation when Spark SQL read large amount o
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Description:
When Spark SQL reads a large amount of data, it takes a long time (more than 10
minutes) to
[
https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-25038:
--
Description:
When Spark SQL reads a large amount of data, it takes a long time (more than 10
minutes) to
Jason Guo created SPARK-25038:
-
Summary: Accelerate Spark Plan generation when Spark SQL read
large amount of data
Key: SPARK-25038
URL: https://issues.apache.org/jira/browse/SPARK-25038
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Summary: Adaptively set split size for columnar file to ensure the task
read data size fit expectation
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556413#comment-16556413
]
Jason Guo commented on SPARK-24906:
---
[~maropu] [~viirya] What do you think about thi
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972
]
Jason Guo edited comment on SPARK-24906 at 7/25/18 6:09 AM:
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972
]
Jason Guo edited comment on SPARK-24906 at 7/25/18 1:03 AM:
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972
]
Jason Guo commented on SPARK-24906:
---
Thanks [~maropu] and [~viirya] for your comments.
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Description:
For columnar files, such as, when Spark SQL reads the table, each split will be
128 MB by
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Attachment: image-2018-07-24-20-30-24-552.png
> Enlarge split size for columnar file to ensure the tas
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Attachment: image-2018-07-24-20-29-24-797.png
> Enlarge split size for columnar file to ensure the tas
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Attachment: image-2018-07-24-20-28-06-269.png
> Enlarge split size for columnar file to ensure the tas
[
https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Guo updated SPARK-24906:
--
Attachment: image-2018-07-24-20-26-32-441.png
> Enlarge split size for columnar file to ensure the tas
Jason Guo created SPARK-24906:
-
Summary: Enlarge split size for columnar file to ensure the task
read enough data
Key: SPARK-24906
URL: https://issues.apache.org/jira/browse/SPARK-24906
Project: Spark