addPyFile error: NotADirectoryError: [Errno 20] Not a directory

2021-06-07 Thread Gourav Sengupta
Hello dear friends, I hope everyone is doing fine and staying safe. This query is for Spark 3.0.1. The following works:

> pyspark --py-files s3://gourav-bucket/spark_nlp_display-1.7-py3.7.egg
> >>> import sparknlp_display
> >>>

But when I start Python, and then create a Spark session then
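For comparison with the `pyspark --py-files` launch above, here is a minimal sketch of doing the same from a plain Python process: build a `SparkSession` first, then ship each dependency with `SparkContext.addPyFile`. It assumes `pyspark` is installed in that Python environment and that the cluster's Hadoop configuration can read `s3://` URIs. The helper names `split_py_files` and `session_with_py_files` are hypothetical. This is not a confirmed fix for the `NotADirectoryError` in the subject line, since the preview does not show the failing code.

```python
from typing import List


def split_py_files(value: str) -> List[str]:
    """Split a comma-separated, --py-files style value into clean paths.

    Hypothetical helper: empty entries and surrounding whitespace are dropped.
    """
    return [path.strip() for path in value.split(",") if path.strip()]


def session_with_py_files(py_files: str, app_name: str = "py-files-sketch"):
    """Create (or reuse) a SparkSession and add each file via addPyFile.

    Hypothetical helper. pyspark is imported lazily so this module can be
    loaded in an environment without Spark installed.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(app_name).getOrCreate()
    for path in split_py_files(py_files):
        # addPyFile ships the file to all future tasks on this SparkContext;
        # .zip and .egg files are also put on sys.path.
        spark.sparkContext.addPyFile(path)
    return spark
```

Usage would then be `spark = session_with_py_files("s3://gourav-bucket/spark_nlp_display-1.7-py3.7.egg")` followed by `import sparknlp_display`.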

Re: Petastorm vs horovod vs tensorflowonspark vs spark_tensorflow_distributor

2021-06-07 Thread Gourav Sengupta
Hi Sean, thank you so much for your kind response :) Regards, Gourav Sengupta

On Sat, Jun 5, 2021 at 8:00 PM Sean Owen wrote:
> All of these tools are reasonable choices. I don't think the Spark project
> itself has a view on what works best. These things do different things. For
> example

Re: class KafkaCluster related errors

2021-06-07 Thread Kiran Biswal
Hi Mich, Thanks a lot for your response. I am basically trying to get some older code (a streaming job that reads from Kafka) written for Spark 2.0.1 to work in 3.0.1. The specific area where I am having a problem (KafkaCluster) most likely has to do with getting/setting commit offsets in Kafka // Create message
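`KafkaCluster` belonged to the Kafka 0.8 integration, which is no longer shipped with Spark 3.0.x. One hedged alternative for *setting* offsets is the Structured Streaming Kafka source's `startingOffsets` option, which accepts a JSON map of topic, then partition, then offset, where `-2` means earliest and `-1` means latest. Note that this source does not commit offsets back to Kafka: progress is tracked in the query's checkpoint, and `startingOffsets` only applies when a query starts without one. The sketch below builds that JSON string; the helper name `starting_offsets_json` and the topic name `events` are hypothetical.

```python
import json
from typing import Dict

# Sentinel offsets understood by the Kafka source's startingOffsets JSON.
EARLIEST = -2
LATEST = -1


def starting_offsets_json(offsets: Dict[str, Dict[int, int]]) -> str:
    """Build a startingOffsets JSON string from {topic: {partition: offset}}.

    Hypothetical helper. Partition numbers become JSON object keys (strings),
    and keys are sorted so the output is deterministic.
    """
    payload = {
        topic: {str(partition): int(offset) for partition, offset in partitions.items()}
        for topic, partitions in offsets.items()
    }
    return json.dumps(payload, sort_keys=True)
```

The result would be passed as `.option("startingOffsets", starting_offsets_json({"events": {0: 23, 1: LATEST}}))` on a `spark.readStream.format("kafka")` reader.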

Re: class KafkaCluster related errors

2021-06-07 Thread Mich Talebzadeh
Hi Kiran, As you may be aware, createDirectStream is deprecated and you ought to use Spark Structured Streaming, especially since you are moving to version 3.0.1. If you still want to use DStreams, then that page seems to be correct. Looking at my old code I have import org.apache.spark.streaming._

Re: class KafkaCluster related errors

2021-06-07 Thread Mich Talebzadeh
Hi, Are you trying to read topics from Kafka in Spark 3.0.1? Have you checked the Spark 3.0.1 documentation? Integrating Spark with Kafka is pretty straightforward with 3.0.1 and higher. HTH
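As a rough illustration of the Structured Streaming route recommended in this thread, here is a minimal PySpark sketch that subscribes to Kafka topics. It assumes the job is launched with the Kafka connector on the classpath (for example `--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1`). The broker address, topic names, and the helper names `kafka_source_options` and `read_kafka_stream` are hypothetical.

```python
from typing import Dict, Iterable


def kafka_source_options(bootstrap_servers: str, topics: Iterable[str],
                         starting_offsets: str = "latest") -> Dict[str, str]:
    """Collect reader options for the Structured Streaming Kafka source.

    Hypothetical helper; the option names are those of the Kafka source.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        # "subscribe" takes a comma-separated list of topic names.
        "subscribe": ",".join(topics),
        # Only used when the query starts without an existing checkpoint.
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, bootstrap_servers: str, topics: Iterable[str]):
    """Return a streaming DataFrame with key and value decoded as strings."""
    options = kafka_source_options(bootstrap_servers, topics)
    return (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )
```

A query could then be started with `read_kafka_stream(spark, "broker1:9092", ["events"]).writeStream.format("console").option("checkpointLocation", "/tmp/checkpoints/events").start()`; the checkpoint directory is where the source records which offsets it has processed.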