Re: Problem in Spark-Kafka Connector

2017-12-27 Thread Sitakant Mishra
Hi, Kindly help me with this problem, for which I will be grateful. Thanks and Regards, Sitakanta Mishra On Tue, Dec 26, 2017 at 12:34 PM, Sitakant Mishra <sitakanta.mis...@gmail.com> wrote: > Hi, > > I am trying to connect my Spark cluster to a single Kafka topic which is running as a separate

Partition Dataframe Using UDF On Partition Column

2017-12-27 Thread Richard Primera
Greetings, In version 1.6.0, is it possible to write a partitioned dataframe into parquet format using a UDF on the partition column? I'm using pyspark. Let's say I have a dataframe with column `date`, of type string or int, which contains values such as `20170825`. Is it possible to def

Spark and neural networks

2017-12-27 Thread Esa Heikkinen
Hi, What would be the best way to use Spark and neural networks (especially RNN LSTM)? I think it would be possible by "tool"-combination: Pyspark + anaconda + pandas + numpy + keras + tensorflow + scikit. But what about scalability and usability by Spark (pyspark)? How compatible are da

Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

2017-12-27 Thread Geoff Von Allmen
I’ve tried it both ways. Uber jar gives me the following: - Caused by: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html If I only do minimal packaging and add org.apache.spark_spark-sq

Re: Apache Spark - Structured Streaming graceful shutdown

2017-12-27 Thread Eyal Zituny
Hi, if you're interested in stopping your Spark application externally, you will probably need a way to communicate with the Spark driver (which starts and holds a ref to the Spark context). This can be done by adding some code to the driver app, for example: - you can expose a rest api that st

Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

2017-12-27 Thread Eyal Zituny
Hi, it seems that you're missing the kafka-clients jar (and probably some other dependencies as well). How did you package your application jar? Does it include all the required dependencies (as an uber jar)? If it's not an uber jar you need to pass via the driver-class-path and the executor-class-