[GitHub] [sedona] cgqfzy opened a new issue, #821: How to define geometry data types such as polygon and point Spark SQL when using sedona

2023-04-19 Thread via GitHub
cgqfzy opened a new issue, #821: URL: https://github.com/apache/sedona/issues/821 ### 1. start spark session local ```shell bin/spark-sql --jars /mypath/sedona-spark-shaded-3.0_2.12-1.4.0.jar,/mypath/sedona-viz-3.0_2.12-1.4.0.jar,/mypath/sedona-sql-3.0_2.12-1.4.0.jar,/mypath/sedona-com

Re: Sedona workers stuck in loop: DynamicIndexLookupJudgement: [xx, PID=xx] [Streaming shapes] Reached a milestone: xxxxxxx

2023-04-19 Thread Jia Yu
On Wed, Apr 19, 2023 at 6:30 PM Jia Yu wrote:
> Hi,
>
> This is likely caused by skewed data. To address that:
>
> (1) Try to increase the number of partitions in your two input
> DataFrames. For example, df = df.repartition(1000)
> (2) Try to switch the sides of the spatial join; this might improve

Re: Sedona workers stuck in loop: DynamicIndexLookupJudgement: [xx, PID=xx] [Streaming shapes] Reached a milestone: xxxxxxx

2023-04-19 Thread Jia Yu
Hi,
This is likely caused by skewed data. To address that:
(1) Try to increase the number of partitions in your two input DataFrames. For example, df = df.repartition(1000)
(2) Try to switch the sides of the spatial join; this might improve the join performance.
Rule of thumb: The spatial partitionin

Sedona workers stuck in loop: DynamicIndexLookupJudgement: [xx, PID=xx] [Streaming shapes] Reached a milestone: xxxxxxx

2023-04-19 Thread Zhang, Hanxi (ISED/ISDE)
Hello Sedona community, I am running a geospatial Sedona cluster on Amazon EMR. Specifically, my cluster is based on Spark 3.3.0 and Sedona 1.3.1-incubating. My cluster has 10 executor nodes, and each runs two executors based on my configuration. I am using the above cluster to run

[jira] [Commented] (SEDONA-276) Add support for Spark 3.4

2023-04-19 Thread Martin Andersson (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714098#comment-17714098 ] Martin Andersson commented on SEDONA-276: [~kontinuation] That sounds like a goo

[jira] [Commented] (SEDONA-276) Add support for Spark 3.4

2023-04-19 Thread Kristin Cowalcijk (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714052#comment-17714052 ] Kristin Cowalcijk commented on SEDONA-276: [~umartin] It is unnecessary to sepa

[jira] [Commented] (SEDONA-276) Add support for Spark 3.4

2023-04-19 Thread Martin Andersson (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713948#comment-17713948 ] Martin Andersson commented on SEDONA-276: [~kontinuation] It appears that Iceber

Re: Sprint Goals for Apache projects next week

2023-04-19 Thread Joana Simoes
Hi Jia, Thanks for volunteering to give a tutorial. Dinner is between 17:00 CEST and 18:00 CEST, so we could schedule you right after that (at 18:00 CEST). If you agree on the slot, you can just add an entry at the end of the mentor stream section

[jira] [Commented] (SEDONA-276) Add support for Spark 3.4

2023-04-19 Thread Martin Andersson (Jira)
[ https://issues.apache.org/jira/browse/SEDONA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713940#comment-17713940 ] Martin Andersson commented on SEDONA-276: During runtime, Sedona utilizes the lo