Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
On Sun, Dec 18, 2016 at 2:26 AM, vaquar khan wrote: > select * from indexInfo; Hi Vaquar, I do not see a CF (column family) with the name indexInfo in any of the Cassandra databases. Thanks, Deepak

Re: What is the deployment model for Spark Streaming? A specific example.

2016-12-17 Thread Divya Gehlot
I am not a PySpark person, but from the errors I can tell that your Spark application is having memory issues. Are you collecting the results to the driver at any point, or have you configured too little memory for the nodes? And if you are using DataFrames, then there is an issue raised in Ji

Re: What is the deployment model for Spark Streaming? A specific example.

2016-12-17 Thread Russell Jurney
Anyone? This is for a book, so I need to figure this out. On Fri, Dec 16, 2016 at 12:53 AM Russell Jurney wrote: > I have created a PySpark Streaming application that uses Spark ML to > classify flight delays into three categories: on-time, slightly late, very > late. After an hour or so somethi

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
Super, that works! Thanks On Sunday, December 18, 2016, 11:28 AM, Yong Zhang wrote:

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Yong Zhang
Why don't you just return the struct you defined instead of an array? @Override public Row call(Double x, Double y) throws Exception { Row row = RowFactory.create(x, y); return row; } From: Richa

Kafka Spark structured streaming latency benchmark.

2016-12-17 Thread Prashant Sharma
Hi, the goal of my benchmark is to achieve an end-to-end latency lower than 100 ms and sustain it over time, by consuming from a Kafka topic and writing back to another Kafka topic using Spark. Since the job does no aggregation and does constant-time processing on each message, it appeared to me

Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
There are 8 worker nodes in the cluster. Thanks, Deepak On Dec 18, 2016 2:15 AM, "Holden Karau" wrote: > How many workers are in the cluster? > > On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma > wrote: > >> Hi All, >> I am iterating over a DataFrame's partitions using df.foreachPartition. >> Up

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
I tried to transform
root
 |-- latitude: double (nullable = false)
 |-- longitude: double (nullable = false)
 |-- name: string (nullable = true)
to:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: d

Re: Running Multiple Versions of Spark on the same cluster (YARN)

2016-12-17 Thread Koert Kuipers
Spark only needs to be present on the machine that launches it using spark-submit. On Sat, Dec 17, 2016 at 3:59 PM, Jorge Machado wrote: > Hi Tiago, > > thx for the update. Last question: does the spark-submit that you are > using need to be the same version on all YARN hosts? > Regards > > J

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Yong Zhang
The "[D" type means a double array. So this error simply means you have double[] data, but Spark needs to cast it to Double, as your schema defines. The error message clearly indicates that the data doesn't match the type specified in the schema. I wonder how you are so sure about your data
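For reference, [D is the JVM's binary class name for double[], which is why the exception text reads "ClassCastException: [D". The plain-Java sketch below involves no Spark at all, and its class and method names are hypothetical; it reproduces the same kind of failure by casting a double[] to Double, which loosely mirrors what happens when a row field declared as a double actually holds an array.

```java
public class DoubleArrayCastDemo {

    // The JVM reports the class of a double[] as "[D".
    static String doubleArrayClassName() {
        return new double[0].getClass().getName();
    }

    // Hypothetical stand-in for a row field that the schema declares as a double
    // but that actually holds a double[]; the cast to Double then fails.
    static String castFailureMessage() {
        Object field = new double[] {1.0, 2.0}; // what the row really contains
        try {
            Double d = (Double) field;          // what the schema promises
            return "no exception: " + d;
        } catch (ClassCastException e) {
            return e.getMessage();              // the message names "[D"
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleArrayClassName()); // prints [D
        System.out.println(castFailureMessage());
    }
}
```

The exact wording of the exception message differs between JDK versions, but it names [D in each case, so the fix is on the data side: either produce a Double per field or declare the field as an array type in the schema.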

Re: Regarding Connection Problem

2016-12-17 Thread Luciano Resende
On Fri, Dec 16, 2016 at 7:01 PM, Chintan Bhatt <chintanbhatt...@charusat.ac.in> wrote: > Hi, > I want to send continuous output (avg. temperature) generated from Node.js > to be stored on Hadoop and then retrieve it for visualization. > Please guide me on how to send the continuous output of Node.js to kaf

Re: Running Multiple Versions of Spark on the same cluster (YARN)

2016-12-17 Thread Jorge Machado
Hi Tiago, thx for the update. Last question: does the spark-submit that you are using need to be the same version on all YARN hosts? Regards Jorge Machado > On 17 Dec 2016, at 16:46, Tiago Albineli Motta wrote: > > Hi Jorge, > > Here we are using an Apache Hadoop installation, and

Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread vaquar khan
Hi Deepak, could you share the index information in your database? select * from indexInfo; Regards, Vaquar khan On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau wrote: > How many workers are in the cluster? > > On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma > wrote: > >> Hi All, >> I am iterating

Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Holden Karau
How many workers are in the cluster? On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma wrote: > Hi All, > I am iterating over a DataFrame's partitions using df.foreachPartition. > Upon each iteration over a row, I am initializing a DAO to insert the row into > Cassandra. > Each of these iterations takes a

foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
Hi All, I am iterating over a DataFrame's partitions using df.foreachPartition. Upon each iteration over a row, I am initializing a DAO to insert the row into Cassandra. Each of these iterations takes almost one and a half minutes to finish. In my workflow, this is part of an action and 100 partitions are bei
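A common suggestion for this pattern (not a conclusion reached in the thread itself) is to construct the DAO once per partition and reuse it for every row in that partition, rather than once per row, so that if DAO setup is the expensive part, its cost is paid once per partition. The sketch below is plain Java with no Spark or Cassandra dependency: a partition is simulated as an Iterator, the shape foreachPartition hands to its function, and CassandraDao is a hypothetical stub that only counts how many times it is constructed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PerPartitionDaoDemo {

    // Hypothetical stand-in for a DAO whose construction is expensive
    // (e.g. opening a Cassandra session). It only counts instances and inserts.
    static class CassandraDao implements AutoCloseable {
        static int instancesCreated = 0;
        int inserted = 0;

        CassandraDao() {
            instancesCreated++;
        }

        void insert(String row) {
            inserted++; // a real DAO would write the row here
        }

        @Override
        public void close() {
            // a real DAO would release its connection here
        }
    }

    // Pattern from the question: a new DAO for every row.
    static void perRow(Iterator<String> partition) {
        while (partition.hasNext()) {
            try (CassandraDao dao = new CassandraDao()) {
                dao.insert(partition.next());
            }
        }
    }

    // Suggested pattern: one DAO per partition, reused for all of its rows.
    static void perPartition(Iterator<String> partition) {
        try (CassandraDao dao = new CassandraDao()) {
            while (partition.hasNext()) {
                dao.insert(partition.next());
            }
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");

        CassandraDao.instancesCreated = 0;
        perRow(rows.iterator());
        System.out.println("per row: " + CassandraDao.instancesCreated + " DAOs");       // 4

        CassandraDao.instancesCreated = 0;
        perPartition(rows.iterator());
        System.out.println("per partition: " + CassandraDao.instancesCreated + " DAOs"); // 1
    }
}
```

In a real job, the body of perPartition would sit inside the function passed to df.foreachPartition, so the DAO is created on the executor that processes the partition instead of being serialized from the driver.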

Re: Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
data is good On Saturday, December 17, 2016 11:50 PM, "zjp_j...@163.com" wrote:

Re: Java to show struct field from a Dataframe

2016-12-17 Thread zjp_j...@163.com
I think the cause is invalid Double data; have you checked your data? zjp_j...@163.com From: Richard Xin Date: 2016-12-17 23:28 To: User Subject: Java to show struct field from a Dataframe let's say I have a DataFrame with schema of followings: root |-- name: string (nullable = true

Re: Running Multiple Versions of Spark on the same cluster (YARN)

2016-12-17 Thread Tiago Albineli Motta
Hi Jorge, here we are using an Apache Hadoop installation, and to run multiple versions we just need to change the spark-submit on the client to the Spark version we need ($SPARK_HOME/bin/spark-submit) and pass the correct Spark libs in the conf. For Spark 2.0.0: --conf spark.yarn.archive=

Java to show struct field from a Dataframe

2016-12-17 Thread Richard Xin
Let's say I have a DataFrame with the following schema:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)
df.show(); throws the following exception: java.lang.ClassCastException: [D

The spark hive udf can read broadcast the variables?

2016-12-17 Thread 李斌松
Can a Spark Hive UDF read broadcast variables?

theory question

2016-12-17 Thread kant kodali
Given a set of transformations, does Spark create multiple DAGs and pick one by some metric, such as a higher degree of concurrency, as the typical task graph model in parallel computing suggests? Or does it simply build one DAG by going through the transformations/task

Fwd: SparkLauncher does not return State/ID on a standalone cluster

2016-12-17 Thread Rahul Raj
I am unable to retrieve the state and ID of an application submitted to a standalone cluster. The job executes successfully on the cluster. The state was checked using: while(!handle.getState().isFinal()){ //print handle.getState() } When run locally, the state is reported correctly. Rega

Re: Dataset encoders for further types?

2016-12-17 Thread Michal Šenkýř
I actually already made a pull request adding support for arbitrary sequence types. https://github.com/apache/spark/pull/16240 There is still a little problem of Seq.toDS not working for those types (couldn't get implicits with multiple type parameters to resolve correctly) but createDataset

How to perform Join operation using JAVARDD

2016-12-17 Thread Sree Eedupuganti
I tried like this:
CrashData_1.csv:
CRASH_KEY CRASH_NUMBER CRASH_DATE CRASH_MONTH
2016899114 2016899114 01/02/2016 12:00:00 AM +
CrashData_2.csv:
CITY_NAME ZIPCODE CITY STATE
1945 704

Re: need help to have a Java version of this scala script

2016-12-17 Thread Richard Xin
Thanks for pointing me in the right direction, I have figured it out. On Saturday, December 17, 2016 5:23 PM, Igor Berman wrote: do you mind showing what you have in Java? In general $"bla" is col("bla") as soon as you import the appropriate function: import static org.apache.spark.sql.fu

Re: need help to have a Java version of this scala script

2016-12-17 Thread Igor Berman
Do you mind showing what you have in Java? In general, $"bla" is col("bla") as soon as you import the appropriate functions: import static org.apache.spark.sql.functions.callUDF; import static org.apache.spark.sql.functions.col; udf should be callUDF, e.g. ds.withColumn("localMonth", callUDF("toLocalMonth"