Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
I examined the code and found that the lazy val was added recently, in 2.2.0. 2017-09-07 14:34 GMT+08:00 ChenJun Zou : > thanks, > my mistake > > 2017-09-07 14:21 GMT+08:00 sujith chacko : > >> If your intention is to just view the logical plan in spark shell then I >> think you can follow the query whic

Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
thanks, my mistake 2017-09-07 14:21 GMT+08:00 sujith chacko : > If your intention is to just view the logical plan in spark shell then I > think you can follow the query which I mentioned in previous mail. In > spark 2.1.0 sessionState is a private member which you cannot access. > > Thanks. >

Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
I use spark-2.1.1. 2017-09-07 14:00 GMT+08:00 sujith chacko : > Hi, > may I know which version of spark you are using, in 2.2 I tried with > below query in spark-shell for viewing the logical plan and it's working > fine > > spark.sql("explain extended select * from table1") > > The above qu
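
For reference, a minimal sketch of inspecting a query's plan from spark-shell without touching spark.sessionState; the table name t1 is hypothetical, and explain(true) / queryExecution are public Dataset APIs:

    // t1 is a hypothetical table registered in the current session
    val df = spark.sql("select * from t1")
    df.explain(true)                          // prints parsed, analyzed, optimized and physical plans
    val logical = df.queryExecution.logical   // or grab the logical plan programmatically

    // Alternatively, the query quoted above:
    spark.sql("explain extended select * from t1").show(false)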

CSV write to S3 failing silently with partial completion

2017-09-06 Thread abbim
Hi all, My team has been experiencing a recurring, unpredictable bug where a write to CSV in S3 completes only partially on one partition of our Dataset. For example, in a Dataset of 10 partitions written to CSV in S3, we might see 9 of the partitions as 2.8 GB in size, but one of them as 1.6 GB. H
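
For context, a hedged sketch of the kind of partitioned CSV write being described; the bucket, path, options and partition count below are hypothetical, since the original job's settings are not shown in the thread:

    // Illustrative only: one part file is written per partition of ds
    ds.repartition(10)
      .write
      .option("header", "true")
      .mode("overwrite")
      .csv("s3a://my-bucket/output/csv")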

sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
Hi, when I use spark-shell to get the logical plan of a SQL query, an error occurs:

scala> spark.sessionState
:30: error: lazy value sessionState in class SparkSession cannot be accessed in org.apache.spark.sql.SparkSession
       spark.sessionState
             ^

But if I use spark-submit to access the

is it ok to have multiple SparkSessions in one Spark Structured Streaming app?

2017-09-06 Thread kant kodali
Hi All, I am wondering if it is ok to have multiple SparkSessions in one Spark Structured Streaming app? Basically, I want to create 1) a Spark session for reading from Kafka and 2) another Spark session for storing the mutations of a dataframe/dataset to a persistent table as I get the mutations f
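
Not an authoritative answer, but a minimal sketch of one common approach: create one SparkSession and derive a second with newSession(), which shares the underlying SparkContext but keeps its own SQL conf and temporary views (the names below are illustrative):

    import org.apache.spark.sql.SparkSession

    val readSession  = SparkSession.builder().appName("my-app").getOrCreate()
    val writeSession = readSession.newSession()   // shares the SparkContext; separate SQLConf and temp views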

[ANNOUNCE] Apache Bahir 2.2.0 Released

2017-09-06 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.2.0 which provides the following extensions for Apache Sp

[Meetup] Apache Spark and Ignite for IoT scenarios

2017-09-06 Thread Denis Magda
Folks, those craving mind food this weekend, come over to the meetup - Santa Clara, Sept 9, 9.30 AM: https://www.meetup.com/datariders/events/242523245/?a=socialmedia — Denis

spark metrics prefix in Graphite is duplicated

2017-09-06 Thread Mikhailau, Alex
Hi guys, When I set up my EMR cluster with Spark I add "*.sink.graphite.prefix": "$env.$namespace.$team.$app" to metrics.properties. The cluster comes up with the correct metrics.properties. Then I simply add a step to EMR with spark-submit, without any metrics namespace parameter. In my Graphite, Spar
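
Not a confirmed fix, but one knob worth checking is spark.metrics.namespace (a documented Spark property whose default is the app id), set explicitly on the job rather than relying only on the prefix in metrics.properties; the namespace value below is illustrative:

    import org.apache.spark.SparkConf

    // Illustrative placeholder value; pin the metrics namespace on the job itself
    val conf = new SparkConf()
      .set("spark.metrics.namespace", "prod.mynamespace.myteam.myapp")
    // equivalently on the command line: spark-submit --conf spark.metrics.namespace=...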

Non-time-based windows are not supported on streaming DataFrames/Datasets;;

2017-09-06 Thread kant kodali
Hi All, I get the following exception when I run the query below; not sure what the cause is: "Non-time-based windows are not supported on streaming DataFrames/Datasets;" My TIME_STAMP field is of string type. dataset .sqlContext() .sql("select count(distinct ID), sum(AMOUNT) from (s
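
Not shown in the thread, but a minimal Scala sketch of one way around this: cast the string column to a timestamp so the window is time-based. The column names ID, AMOUNT and TIME_STAMP come from the query above; the timestamp format, watermark and window size are assumptions, and approx_count_distinct stands in for count(distinct), which streaming aggregations may not accept:

    import org.apache.spark.sql.functions._

    // Assumed timestamp format; adjust to match the actual TIME_STAMP strings
    val withEventTime = dataset
      .withColumn("event_time", to_timestamp(col("TIME_STAMP"), "yyyy-MM-dd HH:mm:ss"))

    val agg = withEventTime
      .withWatermark("event_time", "1 hour")
      .groupBy(window(col("event_time"), "1 hour"))
      .agg(approx_count_distinct("ID"), sum("AMOUNT"))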

Spark Structured Streaming and compacted topic in Kafka

2017-09-06 Thread Olivier Girardot
Hi everyone, I'm aware of the issue regarding the direct stream 0.10 consumer in spark and compacted topics (cf. https://issues.apache.org/jira/browse/SPARK-17147). Is there any chance that spark structured-streaming kafka is compatible with compacted topics? Regards, -- *Olivier Girardot*

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Alonso Isidoro Roman
Hi, reading the official doc, I think you can do it this way:

import org.apache.spark.streaming.kafka._

val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, ka

Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Dan Dong
Hi All, I have a question about how to process multiple Kafka topics in a Spark 2.* program. My question is: how do I get the topic name from a message received from Kafka? E.g: .. val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, k
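
A hedged sketch using the spark-streaming-kafka-0-10 direct stream, where each element is a ConsumerRecord and record.topic tells you which topic the message came from; the broker address, group id and topic names are placeholders, and ssc is assumed to be an existing StreamingContext:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-group")

    val topics = Set("topicA", "topicB")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Each record carries the topic it was read from
    val byTopic = stream.map(record => (record.topic, record.value))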

Will an input event older than watermark be dropped?

2017-09-06 Thread 张万新
Hi, I've been investigating watermarks for some time. According to the guide, if we specify a watermark on the event time column, the watermark will be used to drop old state data. Then, taking a window-based count as an example, if an event whose time is older than the watermark arrives, it will simply be dropp
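
For concreteness, a minimal sketch of the pattern being asked about (the column name and durations are assumptions); the guide describes the watermark as a threshold for cleaning up old aggregation state, and events arriving later than it are expected to be dropped, though this is not strictly guaranteed in every case:

    import org.apache.spark.sql.functions._

    // events is a streaming Dataset with an eventTime timestamp column
    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"))
      .count()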

Re: Python vs. Scala

2017-09-06 Thread Conconscious
Just run this test yourself and check the results. While it runs, also check a worker with top.

Python:

import random

def inside(p):
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def estimate_pi(num_samples):
    count = sc.parallelize(xrange(0, num_samples)).filte
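
For comparison, a rough Scala equivalent of the same Monte Carlo Pi estimate, assuming the spark-shell sc handle and the same sampling idea as the Python snippet above:

    // Count random points falling inside the unit circle, then scale by 4
    def estimatePi(numSamples: Int): Double = {
      val count = sc.parallelize(0 until numSamples)
        .filter { _ =>
          val x = math.random
          val y = math.random
          x * x + y * y < 1
        }
        .count()
      4.0 * count / numSamples
    }

    println(estimatePi(10000000))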