Re: DataSourceV2 write input requirements

2018-03-26 Thread Wenchen Fan
Yeah, it is for the read side only. I think for the write side, implementations can provide some options to allow users to set partitioning/ordering, or the data source has a natural partitioning/ordering that doesn't require any interface.
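The "options" route mentioned here could look roughly like the sketch below. This is a hypothetical illustration only: the format name and the option keys ("write.clusterBy", "write.sortBy") are invented for the sketch and are not part of any real Spark or data source API.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: the user hints the desired write layout through writer
// options that a DataSourceV2 sink could choose to interpret. The format name
// and option keys below are invented for illustration.
object WriteOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "2018-03-26"), (2, "2018-03-27")).toDF("id", "ts")

    df.write
      .format("com.example.datasource")  // hypothetical DataSourceV2 sink
      .option("write.clusterBy", "id")   // ask the sink to cluster output by id
      .option("write.sortBy", "ts")      // and sort within each partition by ts
      .mode("append")
      .save("/tmp/example-output")
  }
}
```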

Re: DataSourceV2 write input requirements

2018-03-26 Thread Patrick Woody
Hey Ryan, Ted, Wenchen, thanks for the quick replies. @Ryan - the sorting portion makes sense, but I think we'd have to ensure something similar to requiredChildDistribution in SparkPlan, where we also have the number of partitions, if we'd want to further report to SupportsReportPartitioning …
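For reference, the internal construct referred to here couples clustering expressions with an optional partition count. A minimal sketch against Spark 2.3's catalyst internals; the attribute name, type, and partition count are illustrative, and the exact constructor shape is an internal API that may differ across versions.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
import org.apache.spark.sql.types.StringType

// Sketch only: the internal ClusteredDistribution used by
// SparkPlan.requiredChildDistribution can carry a required number of
// partitions alongside the clustering expressions (Spark 2.3 internals).
object RequiredDistributionSketch {
  def main(args: Array[String]): Unit = {
    val key  = AttributeReference("key", StringType)()
    val dist = ClusteredDistribution(Seq(key), requiredNumPartitions = Some(200))
    println(s"cluster by ${dist.clustering.map(_.sql).mkString(", ")}, " +
      s"partitions = ${dist.requiredNumPartitions}")
  }
}
```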

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ted Yu
Hmm, Ryan seems to be right. Looking at sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java: import org.apache.spark.sql.sources.v2.reader.partitioning.Partitioning; ... Partitioning outputPartitioning();
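For context, the Partitioning returned by outputPartitioning() is how a reader tells Spark how its output is already laid out. A minimal sketch of one, written against the Spark 2.3-era v2 reader API; the class name and key column are made up for illustration.

```scala
import org.apache.spark.sql.sources.v2.reader.partitioning.{ClusteredDistribution, Distribution, Partitioning}

// Sketch: a Partitioning reporting that the source's output is clustered by a
// single key column. A reader implementing SupportsReportPartitioning would
// return an instance of this from outputPartitioning().
class KeyClusteredPartitioning(numParts: Int, keyColumn: String) extends Partitioning {
  override def numPartitions(): Int = numParts

  override def satisfy(distribution: Distribution): Boolean = distribution match {
    // Satisfied only when Spark asks for clustering on exactly our key column.
    case c: ClusteredDistribution => c.clusteredColumns.toSeq == Seq(keyColumn)
    case _ => false
  }
}
```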

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ryan Blue
Wenchen, I thought SupportsReportPartitioning was for the read side. Does it work with the write side as well?

Re: DataSourceV2 write input requirements

2018-03-26 Thread Wenchen Fan
Actually, clustering is already supported; please take a look at SupportsReportPartitioning. Ordering is not proposed yet; it might be similar to what Ryan proposed.

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ted Yu
Interesting. Should requiredClustering return a Set of Expressions? That way, we can determine the order of Expressions by looking at what requiredOrdering() returns.

Re: DataSourceV2 write input requirements

2018-03-26 Thread Ryan Blue
Hi Pat, thanks for starting the discussion on this; we’re really interested in it as well. I don’t think there is a proposed API yet, but I was thinking something like this: interface RequiresClustering { List requiredClustering(); } interface RequiresSort { List requiredOrdering(); } …
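Spelled out with element types, the proposal above might look like the following. The use of catalyst's Expression and SortOrder is an assumption on my part; the snippet in the thread only shows raw List return types.

```scala
import java.util.{List => JList}

import org.apache.spark.sql.catalyst.expressions.{Expression, SortOrder}

// Sketch of the write-side interfaces proposed in this thread. The element
// types are assumed; the mail only shows untyped Lists.
trait RequiresClustering {
  // Expressions the incoming data must be clustered (partitioned) by.
  def requiredClustering(): JList[Expression]
}

trait RequiresSort {
  // Sort order the incoming data must satisfy within each partition.
  def requiredOrdering(): JList[SortOrder]
}
```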

DataSourceV2 write input requirements

2018-03-26 Thread Patrick Woody
Hey all, I saw in some of the discussions around DataSourceV2 writes that we might have the data source inform Spark of requirements for the input data's ordering and partitioning. Has there been a proposed API for that yet? Even one level up, it would be helpful to understand how I should be thinking …

Re: Re: the issue about the + in column, can we support the string please?

2018-03-26 Thread 1427357...@qq.com
Hi, using concat is one way, but + is more intuitive and easier to understand.
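For reference, this is the behavior being discussed: concat works on string Columns today, while + resolves to numeric addition and yields nulls for non-numeric strings. A minimal, self-contained example with an illustrative DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit}

object ConcatVsPlus {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "b"), ("c", "d")).toDF("x", "y")

    // Works today: explicit concat of string columns.
    df.select(concat(col("x"), lit("_"), col("y")).as("joined")).show()

    // The + operator is numeric addition: both sides are cast to double,
    // so non-numeric strings come out as null rather than concatenated.
    df.select((col("x") + col("y")).as("plus")).show()

    spark.stop()
  }
}
```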