Also, in your example, doesn't the temp view need to be accessed using the same SparkSession on the Scala side? Since I am not using a notebook, how can I get access to the same SparkSession in Scala?
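[A note for readers of the archive: the sharing that Burak's temp-view suggestion relies on is the getOrCreate pattern, where repeated builder calls inside one Spark application return the same session rather than new ones. The sketch below is a plain-Python stand-in (no Spark required) for that pattern; the `ToySession` class and its `temp_views` dict are hypothetical illustrations, not Spark APIs.]

```python
# Toy illustration of the getOrCreate pattern: repeated calls return the
# same shared session object, so a temp view registered through one handle
# is visible through every other handle in the same application.
# ToySession is a hypothetical stand-in, not a Spark class.

class ToySession:
    _active = None  # application-wide singleton, like the active SparkSession

    def __init__(self):
        self.temp_views = {}  # stand-in for the session's view catalog

    @classmethod
    def get_or_create(cls):
        if cls._active is None:
            cls._active = cls()
        return cls._active

# The "Python side" registers a view...
py_side = ToySession.get_or_create()
py_side.temp_views["tmp1"] = "streaming dataframe"

# ...and the "Scala side", calling getOrCreate in the same application,
# receives the same session and therefore sees the same view catalog.
scala_side = ToySession.get_or_create()
print(scala_side.temp_views["tmp1"])  # prints: streaming dataframe
```

The point of the sketch: outside a notebook, what matters is that both language sides run inside the same Spark application, so that each side's builder call resolves to the same underlying session and catalog.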
On Fri, Jul 28, 2017 at 3:17 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
> Thanks Burak.
>
> In a streaming context, would I need to do any state management for the temp views? For example, across sliding windows.
>
> Priyank
>
> On Fri, Jul 28, 2017 at 3:13 PM, Burak Yavuz <brk...@gmail.com> wrote:
>> Hi Priyank,
>>
>> You may register them as temporary tables to use across language boundaries.
>>
>> Python:
>> df = spark.readStream...
>> # Python logic
>> df.createOrReplaceTempView("tmp1")
>>
>> Scala:
>> val df = spark.table("tmp1")
>> df.writeStream
>>   .foreach(...)
>>
>> On Fri, Jul 28, 2017 at 3:06 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>>> TD,
>>>
>>> For a hybrid Python-Scala approach, what's the recommended way of handing off a DataFrame from Python to Scala? I would especially like to know in a streaming context.
>>>
>>> I am not using notebooks/Databricks. We are running it on our own Spark 2.1 cluster.
>>>
>>> Priyank
>>>
>>> On Wed, Jul 26, 2017 at 12:49 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>> We see that all the time. For example, in SQL, people can write their user-defined functions in Scala/Java and use them from SQL/Python/anywhere. That is the recommended way to get the best combination of performance and ease of use from non-JVM languages.
>>>>
>>>> On Wed, Jul 26, 2017 at 11:49 AM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>>>>> Thanks TD. I am going to try the Python-Scala hybrid approach by using Scala only for the custom Redis sink and Python for the rest of the app. I understand it might not be as efficient as writing the app purely in Scala, but unfortunately I am constrained on Scala resources. Have you come across other use cases where people have resorted to such a Python-Scala hybrid approach?
>>>>> Regards,
>>>>> Priyank
>>>>>
>>>>> On Wed, Jul 26, 2017 at 1:46 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>>>> Hello Priyank,
>>>>>>
>>>>>> Writing something purely in Scala/Java would be the most efficient. Even if we expose Python APIs that allow writing custom sinks in pure Python, it won't be as efficient as a Scala/Java foreach, as the data would have to cross the JVM/PVM boundary, which has significant overheads. So a Scala/Java foreach is always going to be the best option.
>>>>>>
>>>>>> TD
>>>>>>
>>>>>> On Tue, Jul 25, 2017 at 6:05 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>>>>>>> I am trying to write key-values to Redis using a DataStreamWriter object with the PySpark Structured Streaming APIs. I am using Spark 2.2.
>>>>>>>
>>>>>>> Since the foreach sink is not supported for Python (see <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach>), I am trying to find alternatives.
>>>>>>>
>>>>>>> One alternative is to write a separate Scala module only to push data into Redis using foreach; ForeachWriter <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter> is supported in Scala. BUT this doesn't seem like an efficient approach, and it adds deployment overhead because now I will have to support Scala in my app.
>>>>>>>
>>>>>>> Another approach is obviously to use Scala instead of Python, which is fine, but I want to make sure that I absolutely cannot use Python for this problem before I take that path.
>>>>>>>
>>>>>>> I would appreciate some feedback and alternative design approaches for this problem.
>>>>>>>
>>>>>>> Thanks.
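[A note for readers of the archive: the ForeachWriter contract discussed above is small (open / process / close, called by Spark per partition and epoch). The sketch below mirrors that lifecycle in plain Python against an in-memory dict standing in for a Redis client; `RedisLikeStore`, `KeyValueForeachWriter`, and the (key, value) row shape are hypothetical illustrations, not Spark or redis-py APIs.]

```python
# Hedged sketch of the ForeachWriter lifecycle for a key-value sink.
# An in-memory dict stands in for Redis so the example is self-contained.

class RedisLikeStore:
    """In-memory stand-in for a Redis connection (hypothetical)."""
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

class KeyValueForeachWriter:
    """Mirrors the open/process/close contract of Scala's ForeachWriter[Row]."""
    def __init__(self, store):
        self.store = store

    def open(self, partition_id, epoch_id):
        # Real code would open a Redis connection here. Returning True
        # tells the caller this partition/epoch should be processed.
        return True

    def process(self, row):
        # Each row is assumed to be a (key, value) pair.
        key, value = row
        self.store.set(key, value)

    def close(self, error):
        # Real code would release the connection here, whether or not
        # an error occurred.
        pass

# Simulate what the engine does for one partition of one micro-batch.
store = RedisLikeStore()
writer = KeyValueForeachWriter(store)
if writer.open(partition_id=0, epoch_id=0):
    for row in [("user:1", "alice"), ("user:2", "bob")]:
        writer.process(row)
writer.close(None)
print(store.data)  # prints: {'user:1': 'alice', 'user:2': 'bob'}
```

The design point TD makes still stands: when this contract runs in Scala, rows never cross the JVM/PVM boundary, which is why a Scala sink is the efficient choice even when the rest of the pipeline is Python.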