Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread ckhari4u
Wan, Thanks a lot! I see the issue now. Do we have any JIRAs open for the future work to be done on this?

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-13 Thread Sean Owen
The signatures and licenses look OK. Except for the missing k8s package, the contents look OK. Tests look pretty good with "-Phive -Phadoop-2.7 -Pyarn" on Ubuntu 17.10, except that KafkaContinuousSourceSuite seems to hang forever. That was just fixed and needs to get into an RC? Aside from the Blo

transformSchema method policy for "duplicated" column names

2018-01-13 Thread Alessandro Solimando
Hello everyone, after one month without any reply on Stack Overflow (https://stackoverflow.com/questions/47789265/inconsistency-in-handling-duplicate-names-in-dataframe-schema), I am posing the question here. Context: I am refactoring some code of mine, transforming Scala methods with a signature

Join Strategies

2018-01-13 Thread Marco Gaido
Hi dev, I have a question about how join strategies are defined. I see that CartesianProductExec is used only for InnerJoin, while for other kinds of joins BroadcastNestedLoopJoinExec is used. For reference: https://github.com/apache/spark/blob/cd9f49a2aed3799964976ead06080a0f7044a0c3/sql/core/src

Remove or rename? What does ResolvedDataSourceSuite test?

2018-01-13 Thread Jacek Laskowski
Hi, It looks like ResolvedDataSourceSuite [1] is a left-over (after ResolveDataSource?). If not to be deleted, ResolvedDataSourceSuite should surely be renamed. Correct? [1] https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.s

Re: Compiling Spark UDF at runtime

2018-01-13 Thread Michael Shtelma
Thanks! Yes, this would be an option of course. HDFS or Alluxio. Sincerely, Michael Shtelma On Fri, Jan 12, 2018 at 3:26 PM, Georg Heiler wrote: > You could store the jar in hdfs. Then even in yarn cluster mode your given > workaround should work. > Michael Shtelma schrieb am Fr. 12. Jan. 2018 u

Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread Wenchen Fan
A very simple example is sql("select create_map(1, 'a', 2, 'b')").union(sql("select create_map(2, 'b', 1, 'a')")).distinct. By definition, a map should not care about the order of its entries, so the above query should return one record. However, it returns 2 records before SPARK-19893. On Sat,
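
The point about entry order can be illustrated outside Spark. The following is a minimal plain-Python sketch, not Spark's implementation and not a description of what SPARK-19893 changed: it models each map value as a list of (key, value) entries and compares a naive distinct that is sensitive to entry order with one that normalizes the entries first. The function names are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

# A "map" modeled as an ordered list of (key, value) entries.
Entry = Tuple[int, str]
MapValue = List[Entry]


def distinct_by_entry_order(rows: List[MapValue]) -> List[MapValue]:
    """Deduplicate rows while treating entry order as significant (naive)."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row)  # order of entries is part of the identity
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def distinct_by_map_semantics(rows: List[MapValue]) -> List[MapValue]:
    """Deduplicate rows while ignoring entry order, as map equality implies."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row))  # sort entries so order no longer matters
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# The two maps from the example: same entries, different order.
rows = [
    [(1, "a"), (2, "b")],
    [(2, "b"), (1, "a")],
]

print(len(distinct_by_entry_order(rows)))    # 2: order-sensitive comparison
print(len(distinct_by_map_semantics(rows)))  # 1: order-insensitive comparison
```

Sorting the entries is just one way to get an order-independent key for this sketch; it assumes the keys and values are mutually comparable.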