Re: Self join

2019-01-30 Thread Marco Gaido
Hi all, this thread got a bit stuck. Hence, if there are no objections, I'd go ahead with a design doc describing the solution/workaround I mentioned before. Any concerns? Thanks, Marco On Thu, Dec 13, 2018 at 18:15, Ryan Blue wrote: > Thanks for the extra context, Marco. I

Re: Self join

2018-12-13 Thread Ryan Blue
Thanks for the extra context, Marco. I thought you were trying to propose a solution. On Thu, Dec 13, 2018 at 2:45 AM Marco Gaido wrote: > Hi Ryan, My goal with this email thread is to discuss with the community if there are better ideas (as I was told many other people tried to address

Re: Self join

2018-12-13 Thread Marco Gaido
Hi Ryan, My goal with this email thread is to discuss with the community if there are better ideas (as I was told many other people tried to address this). I'd consider this as a brainstorming email thread. Once we have a good proposal, then we can go ahead with a SPIP. Thanks, Marco On

Re: Self join

2018-12-12 Thread Ryan Blue
Marco, I'm actually asking for a design doc that clearly states the problem and proposes a solution. This is a substantial change and probably should be an SPIP. I think that would be more likely to generate discussion than referring to PRs or a quick paragraph on the dev list, because the only

Re: Self join

2018-12-12 Thread Marco Gaido
Thank you all for your answers. @Ryan Blue sure, let me state the problem more clearly: imagine you have 2 dataframes with a common lineage (for instance one is derived from the other by some filtering or anything you prefer). And imagine you want to join these 2 dataframes. Currently, there is
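
To make the scenario above concrete, here is a minimal sketch in plain Python (not Spark code). It is a toy model built on an assumption: columns are resolved by an internal attribute id, and a frame derived by filtering keeps the ids of its parent. The names Attr, Frame, naive_join and with_fresh_ids are hypothetical and exist only for this illustration; they are not Spark APIs, and the re-identification step at the end is not necessarily the solution/workaround discussed in this thread.

```python
from dataclasses import dataclass
from itertools import count

_ids = count()  # source of unique attribute ids (toy assumption)


@dataclass(frozen=True)
class Attr:
    name: str
    attr_id: int


class Frame:
    """Toy table: a list of attributes plus rows stored as tuples."""

    def __init__(self, attrs, rows):
        self.attrs = list(attrs)
        self.rows = list(rows)

    @classmethod
    def create(cls, names, rows):
        return cls([Attr(n, next(_ids)) for n in names], rows)

    def __getitem__(self, name):
        # Return the attribute (column reference) with this name.
        return next(a for a in self.attrs if a.name == name)

    def filter(self, pred):
        # The derived frame reuses the parent's attributes: common lineage.
        names = [a.name for a in self.attrs]
        kept = [r for r in self.rows if pred(dict(zip(names, r)))]
        return Frame(self.attrs, kept)

    def with_fresh_ids(self):
        # Hypothetical helper: same data, brand-new attribute ids.
        return Frame([Attr(a.name, next(_ids)) for a in self.attrs], self.rows)


def naive_join(left, right, left_attr, right_attr):
    """Inner equi-join that resolves both attributes by id; first match wins."""
    combined = left.attrs + right.attrs

    def pos(attr):
        return next(i for i, a in enumerate(combined) if a.attr_id == attr.attr_id)

    li, ri = pos(left_attr), pos(right_attr)
    return [l + r for l in left.rows for r in right.rows if (l + r)[li] == (l + r)[ri]]


df = Frame.create(["id", "parent"], [(1, None), (2, 1), (3, 1)])
children = df.filter(lambda row: row["parent"] is not None)

# Intended: match df.id with children.parent. Both references resolve to the
# left side because the ids are shared, so the intended matches are lost.
ambiguous = naive_join(df, children, df["id"], children["parent"])

# df.id == children.id compares a column with itself: always true.
trivial = naive_join(df, children, df["id"], children["id"])

# With fresh ids on one side, the references are distinguishable again.
children2 = children.with_fresh_ids()
resolved = naive_join(df, children2, df["id"], children2["parent"])
```

In this toy model the ambiguous join returns no rows, the self-comparison degenerates into a cross product, and only the variant with fresh ids returns the two expected parent/child pairs.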

Re: Self join

2018-12-11 Thread Jörn Franke
I don’t know your exact underlying business problem, but maybe a graph solution, such as Spark GraphX, would better meet your requirements. Usually self-joins are done to address some kind of graph problem (even if you would not describe it as such) and is, for these kinds of problems, much more

Re: Self join

2018-12-11 Thread Ryan Blue
Marco, Thanks for starting the discussion! I think it would be great to have a clear description of the problem and a proposed solution. Do you have anything like that? It would help bring the rest of us up to speed without reading different pull requests. Thanks! rb On Tue, Dec 11, 2018 at

Self join

2018-12-11 Thread Marco Gaido
Hi all, I'd like to bring to the attention of more people a problem which has been around for a long time, i.e., self joins. Currently, we have many problems with them. This has been reported several times to the community and seems to affect many people, but as of now no solution has been accepted for

Re: [SQL] Self join with ArrayType columns problems

2015-01-28 Thread PierreB
Should I file a JIRA for this?