classpath entries with hdfs path

2018-12-12 Thread sandeep_katta
Hi All, I have a use case where some of the jars are on HDFS, and I want to include these jars in my driver classpath. If I pass them with --jars it works fine, but if I pass them using spark.driver.extraClassPath it fails. spark-sql --master yarn --jars hdfs://hacluster/tmp/testjar/* // Jars are loaded to the

Stream Stream joins with update and complete mode

2018-10-25 Thread sandeep_katta
As per the documentation http://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html#stream-stream-joins , only Append mode is supported: *As of Spark 2.3, you can use joins only when the query is in Append output mode. Other output modes are not yet supported.* But as per the
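For context, a minimal stream-stream join sketch (topic names, brokers, and column layout are assumptions, not from the thread); as the quoted documentation says, only Append output mode is accepted for such a query in Spark 2.3:

```scala
// Hypothetical sketch of a stream-stream inner join on two Kafka topics.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("joinSketch").getOrCreate()

val left = spark.readStream.format("kafka").
  option("kafka.bootstrap.servers", "vm3:21005,vm2:21005").
  option("subscribe", "left_topic").
  load().
  selectExpr("CAST(value AS STRING) AS leftKey", "timestamp AS leftTime")

val right = spark.readStream.format("kafka").
  option("kafka.bootstrap.servers", "vm3:21005,vm2:21005").
  option("subscribe", "right_topic").
  load().
  selectExpr("CAST(value AS STRING) AS rightKey", "timestamp AS rightTime")

val joined = left.join(right, expr("leftKey = rightKey"))

// "append" is the only output mode Spark 2.3 accepts here; asking for
// "update" or "complete" fails the query analysis with an AnalysisException.
joined.writeStream.
  outputMode("append").
  format("console").
  start()
```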

Re: Structured Streaming with Watermark

2018-10-18 Thread sandeep_katta
Now I have added the same aggregation query as below, but it still didn't filter. val lines_stream = spark.readStream. format("kafka"). option("kafka.bootstrap.servers", "vm3:21005,vm2:21005"). option("subscribe", "s1"). load(). withColumn("tokens", split('value, ",")).

Structured Streaming with Watermark

2018-10-18 Thread sandeep_katta
I am trying to test the watermark concept in structured streaming using the below program: import java.sql.Timestamp import org.apache.spark.sql.functions.{col, expr} import org.apache.spark.sql.streaming.Trigger val lines_stream = spark.readStream. format("kafka").
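A likely cause of the confusion in this thread: a watermark on its own does not filter rows; late data is only dropped by the stateful operator (e.g. an aggregation) that consumes the watermark. A minimal sketch along the lines of the quoted snippet, assuming topic s1 carries rows shaped like "eventTimeMillis,word":

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{split, window}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("watermarkSketch").getOrCreate()
import spark.implicits._

val lines_stream = spark.readStream.
  format("kafka").
  option("kafka.bootstrap.servers", "vm3:21005,vm2:21005").
  option("subscribe", "s1").
  load().
  withColumn("tokens", split('value, ",")).
  withColumn("eventTime", ('tokens(0).cast("long") / 1000).cast("timestamp")).
  withColumn("word", 'tokens(1))

// Rows more than 10 minutes behind the max observed eventTime are dropped
// here only because the windowed aggregation holds state keyed on eventTime.
val counts = lines_stream.
  withWatermark("eventTime", "10 minutes").
  groupBy(window('eventTime, "5 minutes"), 'word).
  count()

counts.writeStream.
  outputMode("update").
  format("console").
  trigger(Trigger.ProcessingTime("30 seconds")).
  start()
```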

Migrating from kafka08 client to kafka010

2018-08-02 Thread sandeep_katta
Hi All, Recently I started migrating code from kafka08 to kafka010. In 0.8 the *topics* argument controls how many partitions are consumed for each topic: def createStream( ssc: StreamingContext, zkQuorum: String, groupId: String, topics: Map[String, Int],
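A rough sketch of the 0.10 equivalent (broker addresses, group id, and topic names are placeholders). The 0.10 direct API has no per-topic partition count: each Kafka partition maps to one RDD partition, so consumption parallelism follows the topic's partitioning:

```scala
// Hypothetical migration sketch using the spark-streaming-kafka-0-10 API.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "vm3:21005,vm2:21005",   // assumed brokers
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "migrated-group",                 // replaces zkQuorum/groupId
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// ssc is the existing StreamingContext from the 0.8 code.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Set("topicA", "topicB"), kafkaParams)
)
```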

insertinto command fails if the location of the table is file

2018-06-21 Thread sandeep_katta
Let's say I create a table using the below command: create table csvTable . using CSV options (path "/user/data/customer.csv"); The Create Table command executes successfully irrespective of the presence of the file (/user/data/customer.csv). If I try to insert some rows into this table, it fails with the below
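A minimal reproduction sketch of the scenario described (the column list is assumed, since the thread elides it; the path is the one from the thread):

```scala
// CREATE TABLE succeeds even if /user/data/customer.csv does not exist.
spark.sql("""
  CREATE TABLE csvTable (id INT, name STRING)
  USING CSV
  OPTIONS (path "/user/data/customer.csv")
""")

// INSERT INTO then fails when the location resolves to a single file
// rather than a directory, because the writer expects a directory it
// can add new data files under.
spark.sql("INSERT INTO csvTable VALUES (1, 'alice')")
```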

[Spark-core] why Executors send HeartBeat to driver but not App Master

2018-03-02 Thread sandeep_katta
I want to attempt the *SPARK-23545* bug, so I have some questions regarding the design. I am analyzing the communications between App Master->Driver and Executor->Driver, and found that only Executors send HeartBeat to the Driver. As per the design, the Executor sends a HeartBeat to the Driver for every
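For reference, the executor-to-driver heartbeat period discussed above is configurable; a sketch of the relevant settings (values shown are the usual defaults, stated here as an assumption):

```scala
// spark.executor.heartbeatInterval must stay well below
// spark.network.timeout, or the driver will consider executors lost.
val conf = new org.apache.spark.SparkConf().
  set("spark.executor.heartbeatInterval", "10s").
  set("spark.network.timeout", "120s")
```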

[Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread sandeep_katta
In client mode the App Master and Driver are in different JVM processes, and the port opened by the Driver is vulnerable to flooding attacks as it does not close IDLE connections. I am thinking of fixing this issue using the below mechanism: 1. Expose a configuration to close the IDLE connections as

[Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-19 Thread sandeep_katta
SparkSubmit will open the port to communicate with the App Master and executors. This port does not close IDLE connections, so it is vulnerable to DoS attacks; I did telnet IP port and the connection was never closed. In order to fix this I tried to handle it in the *userEventTriggered* of
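A hypothetical sketch of the fix being proposed, using Netty's IdleStateHandler together with the userEventTriggered hook mentioned above (handler names and the timeout value are assumptions, not actual Spark code):

```scala
import io.netty.channel.{ChannelDuplexHandler, ChannelHandlerContext}
import io.netty.handler.timeout.{IdleState, IdleStateEvent}

// Closes any channel that has had no read/write activity for the
// all-idle window configured on the IdleStateHandler.
class CloseIdleConnections extends ChannelDuplexHandler {
  override def userEventTriggered(ctx: ChannelHandlerContext, evt: AnyRef): Unit =
    evt match {
      case e: IdleStateEvent if e.state == IdleState.ALL_IDLE =>
        ctx.close()
      case _ =>
        super.userEventTriggered(ctx, evt)
    }
}

// During channel initialization, something like:
//   pipeline.addLast(new IdleStateHandler(0, 0, 120)) // 120s all-idle
//   pipeline.addLast(new CloseIdleConnections)
```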