[Arrow][Dremio]

2018-05-13 Thread xmehaut
Hello, I have some questions about Spark and Apache Arrow. So far, Arrow is only used for sharing data between Python and Spark executors instead of transmitting it through sockets. I'm currently studying Dremio as an interesting way to access multiple sources of data, and as a potential

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-13 Thread Jacek Laskowski
Hi, The exception message should be self-explanatory: it says that you cannot join two streaming Datasets. Support for this was added in Spark 2.3, if I'm not mistaken. Just to be sure that you are working with two streaming Datasets, can you show the query plan of the join query? Jacek On Sat, 12 May 2018,

Re: Measure performance time in some spark transformations.

2018-05-13 Thread Jörn Franke
Can’t you find this in the Spark UI or timeline server?
> On 13. May 2018, at 00:31, Guillermo Ortiz Fernández wrote:
>
> I want to measure how long different transformations in Spark take, such as map, joinWithCassandraTable and so on. Which one is the