Qs on Dataset API -- groups of createXXXTempViews and XXXcheckpoint methods

2018-07-26 Thread Jacek Laskowski
Hi, I'd appreciate your help on the following two questions about Dataset API: 1. Why do Dataset methods: createTempView, createOrReplaceTempView, createGlobalTempView and createOrReplaceGlobalTempView not return a DataFrame? They seem to be neither actions nor transformations (and probably the r

offheap memory usage & netty configuration

2018-07-26 Thread Imran Rashid
I’ve been looking at where untracked memory is getting used in spark, especially offheap memory, and I’ve discovered some things I’d like to share with the community. Most of what I’ve learned has been about the way spark is using netty -- I’ll go into some more detail about that below. I’m also

Re: [DISCUSS][SQL] Control the number of output files

2018-07-26 Thread Reynold Xin
John, You want to create a ticket and submit a patch for this? If there is a coalesce hint, inject a coalesce logical node. Pretty simple. On Wed, Jul 25, 2018 at 2:48 PM John Zhuge wrote: > Thanks for the comment, Forest. What I am asking is to make whatever DF > repartition/coalesce function

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-07-26 Thread Reynold Xin
Seems reasonable at high level. I don't think we can use Expression's and SortOrder's in public APIs though. Those are not meant to be public and can break easily across versions. On Tue, Jul 24, 2018 at 9:26 AM Ryan Blue wrote: > The recently adopted SPIP to standardize logical plans requires

Re: [DISCUSS][SQL] Control the number of output files

2018-07-26 Thread John Zhuge
Filed https://issues.apache.org/jira/browse/SPARK-24940. Will upload a patch shortly. SPARK-20857 introduced a generic SQL Hint Framework in 2.2.0. On Thu, Jul 26, 2018 at 4:25 PM Reynold Xin wrote: > John, > > You want to create a ticket and submit a patch for this? If there is a > coalesce

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-07-26 Thread Ryan Blue
I don’t think that we want to block this work until we have a public and stable Expression. Like our decision to expose InternalRow, I think that while this option isn’t great, it at least allows us to move forward. We can hopefully replace it later. Also note that the use of Expression is in the

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-26 Thread Wenchen Fan
This seems fine to me. BTW Ryan Blue and I are working on some data source v2 stuff and hopefully we can get more things done with one more week. Thanks, Wenchen On Thu, Jul 26, 2018 at 1:14 PM Xingbo Jiang wrote: > Xiangrui and I are leading an effort to implement a highly desirable > feature