Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Reynold Xin
I filed a ticket: https://issues.apache.org/jira/browse/INFRA-17403. Please add your support there. On Tue, Dec 11, 2018 at 4:58 PM, Sean Owen <sro...@apache.org> wrote: > I asked on the original ticket at https://issues.apache.org/jira/browse/INFRA-17385 (

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Sean Owen
I asked on the original ticket at https://issues.apache.org/jira/browse/INFRA-17385 but got no follow-up. Go ahead and open a new INFRA ticket. On Tue, Dec 11, 2018 at 6:20 PM Reynold Xin wrote: > Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I want to put some pressure

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Hyukjin Kwon
Me too. I want to add some input as well, if that can be helpful. On Wed, 12 Dec 2018, 8:20 am Reynold Xin wrote: > Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I want to put some pressure myself there too. > > On Mon, Dec 10, 2018 at 9:51 AM, Sean Owen wrote: >> Agree,

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Reynold Xin
Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I want to put some pressure myself there too. On Mon, Dec 10, 2018 at 9:51 AM, Sean Owen <sro...@apache.org> wrote: > Agree, I'll ask on the INFRA ticket and follow up. That's a lot of extra noise.

Re: GitHub sync

2018-12-11 Thread Dongjoon Hyun
Now, it's recovered. Dongjoon. On Tue, Dec 11, 2018 at 2:15 PM Dongjoon Hyun wrote: > https://issues.apache.org/jira/browse/INFRA-17401 is filed. > > Dongjoon. > > On Tue, Dec 11, 2018 at 12:49 PM Dongjoon Hyun wrote: >> Hi, All. >> >> Currently, GitHub `spark:branch-2.4` is out of sync

Re: GitHub sync

2018-12-11 Thread Dongjoon Hyun
https://issues.apache.org/jira/browse/INFRA-17401 is filed. Dongjoon. On Tue, Dec 11, 2018 at 12:49 PM Dongjoon Hyun wrote: > Hi, All. > > Currently, GitHub `spark:branch-2.4` is out of sync (with two commits).

GitHub sync

2018-12-11 Thread Dongjoon Hyun
Hi, All. Currently, GitHub `spark:branch-2.4` is out of sync (with two commits). https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.4 https://github.com/apache/spark/commits/branch-2.4 I have already done the following. 1. Wait for the next commit. 2. Trigger
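The out-of-sync check described above can be approximated by comparing the branch head reported by each remote. The following is a minimal sketch, assuming the text output of `git ls-remote <url> refs/heads/branch-2.4` has been captured for both repositories; the function names `parse_ls_remote` and `in_sync` are hypothetical helpers, not part of any Apache or GitHub tooling, and the clone URLs are left out because the message only links to web views.

```python
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote` text output.

    Each non-empty line is expected to look like "<sha>\t<ref>".
    """
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, ref = line.split(None, 1)  # split on the first whitespace run
        refs[ref.strip()] = sha
    return refs


def in_sync(primary_output: str, mirror_output: str,
            ref: str = "refs/heads/branch-2.4") -> bool:
    """Return True only if both remotes report the same SHA for `ref`.

    A ref missing on either side counts as out of sync.
    """
    primary_sha = parse_ls_remote(primary_output).get(ref)
    mirror_sha = parse_ls_remote(mirror_output).get(ref)
    return primary_sha is not None and primary_sha == mirror_sha
```

This only tells you whether the two heads differ, not by how many commits; counting the missing commits would need a local clone with both remotes fetched.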

Re: proposal for expanded & consistent timestamp types

2018-12-11 Thread Li Jin
Of course. I added some comments in the doc. On Tue, Dec 11, 2018 at 12:01 PM Imran Rashid wrote: > Hi Li, > > thanks for the comments! I admit I had not thought very much about Python support, it's a good point. But I'd actually like to clarify one thing about the doc -- though it

[Apache Beam] Custom DataSourceV2 instantiation: parameter passing and Encoders

2018-12-11 Thread Etienne Chauchot
Hi Spark guys, I'm Etienne Chauchot and I'm a committer on the Apache Beam project. We have what we call runners. They are pieces of software that translate pipelines written using the Beam API into pipelines that use the native execution engine's API. Currently, the Spark runner uses the old RDD / DStream

Re: Self join

2018-12-11 Thread Jörn Franke
I don’t know your exact underlying business problem, but maybe a graph solution, such as Spark GraphX, would better meet your requirements. Usually self-joins are done to address some kind of graph problem (even if you would not describe it as such) and is for these kinds of problems much more

Re: proposal for expanded & consistent timestamp types

2018-12-11 Thread Imran Rashid
Hi Li, thanks for the comments! I admit I had not thought very much about Python support, it's a good point. But I'd actually like to clarify one thing about the doc -- though it discusses Java types, the point is actually about having support for these logical types at the SQL level. The doc

Re: Self join

2018-12-11 Thread Ryan Blue
Marco, Thanks for starting the discussion! I think it would be great to have a clear description of the problem and a proposed solution. Do you have anything like that? It would help bring the rest of us up to speed without reading different pull requests. Thanks! rb On Tue, Dec 11, 2018 at

Re: Pushdown in DataSourceV2 question

2018-12-11 Thread Ryan Blue
In v2, it is up to the data source to tell Spark that a pushed filter is satisfied, by returning the pushed filters that Spark should still run. You can indicate that a filter is handled by the source by not returning it to Spark. You can also show that a filter is used by the source by showing it in
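The contract described above can be illustrated with a small pure-Python model. This is only a sketch of the idea, not Spark code: the `Filter` and `ToySource` classes are hypothetical, and the method names are loosely modelled on the `pushFilters` / `pushedFilters` pair that, as I understand it, the v2 reader API exposes. The assumed policy (equality fully handled, `>` used on a best-effort basis, everything else left to Spark) is made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a pushed-down predicate."""
    column: str
    op: str        # e.g. "=", ">", "like"
    value: object


class ToySource:
    """Hypothetical source: fully evaluates "=", partially uses ">"."""

    def __init__(self) -> None:
        self._pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        """Accept filters and return the ones the engine must still run."""
        # Filters the source makes some use of (reported as pushed).
        self._pushed = [f for f in filters if f.op in ("=", ">")]
        # Only "=" is fully satisfied here, so everything else is returned
        # and would be re-evaluated by the engine after the scan.
        return [f for f in filters if f.op != "="]

    def pushed_filters(self) -> List[Filter]:
        """Report which filters the source is using."""
        return list(self._pushed)
```

In this model a filter can appear in both lists: the `>` filter is used by the source to skip data but is still returned, so the engine applies it again to guarantee correctness.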

Self join

2018-12-11 Thread Marco Gaido
Hi all, I'd like to bring to the attention of more people a problem which has been there for a long time, i.e., self joins. Currently, we have many problems with them. This has been reported several times to the community and seems to affect many people, but as of now no solution has been accepted for

Re: Pushdown in DataSourceV2 question

2018-12-11 Thread Noritaka Sekiyama
Hi, Thank you for responding to this thread. I'm really interested in this discussion. My original idea might be the same as what Alessandro said: introducing a mechanism by which Spark can communicate with the DataSource and get metadata showing whether pushdown is supported or not. I'm wondering if it