Re: Spark 2.1.1 Graphx graph loader GC overhead error

2017-07-11 Thread Aritra Mandal
yncxcw wrote > hi, > > I think if the OOM occurs before the computation begins, the input data is > probably too big to fit in memory. I remember that the graph data expands > when loading the input data into memory. And the scale of expansion is > pretty huge (based on my experiment on
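For what it's worth, when GraphLoader.edgeListFile blows up during load, the usual first steps are more executor memory and more input partitions (edgeListFile also takes a numEdgePartitions argument). A spark-defaults.conf sketch; the sizes below are placeholders to tune for your cluster, not recommendations:

```
spark.executor.memory     8g
spark.driver.memory       4g
spark.default.parallelism 400
```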

Limit the number of tasks submitted: spark.submit.tasks.threshold.enabled & spark.submit.tasks.threshold

2017-07-11 Thread 李斌松
Limit the number of tasks submitted to avoid tasks monopolizing resources, while guiding users to set reasonable conditions. [image: inline image 1] spark_submit_tasks_threshold.patch Description: Binary data - To

java IllegalStateException: unread block data Exception - setBlockDataMode

2017-07-11 Thread Kanagha
Hi, I am using Spark 2.0.2. I'm not sure what is causing this error to occur. Any inputs would be really helpful; I appreciate any help with this. Exception caught: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3,

Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions even after checkpointing the data

2017-07-11 Thread SRK
Hi, Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions even after checkpointing the data. Any idea as to why this happens? Is there a way that I can set a timeout to clear
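For context, reduceByKeyAndWindow with an inverse function updates each window incrementally instead of recomputing it: the new window value is the old value, plus the batch that entered, minus the batch that left. A minimal pure-Python sketch of that arithmetic (illustrative only, not the Spark API; `slide_window` is a made-up name):

```python
# Incremental windowed sum: the idea behind reduceByKeyAndWindow
# with an inverse function. Entering batches are added (reduceFunc),
# leaving batches are subtracted (invReduceFunc).

def slide_window(old_window_sum, entering_batch, leaving_batch):
    """Update a per-key windowed sum without rescanning the whole window."""
    new_sum = dict(old_window_sum)
    for key, value in entering_batch.items():   # reduceFunc side
        new_sum[key] = new_sum.get(key, 0) + value
    for key, value in leaving_batch.items():    # invReduceFunc side
        new_sum[key] = new_sum.get(key, 0) - value
    return new_sum

window = {"a": 10, "b": 4}
window = slide_window(window, entering_batch={"a": 3}, leaving_batch={"b": 4})
print(window)  # {'a': 13, 'b': 0}
```

Note how "b" drops to 0 but never disappears: with an inverse function the state for keys that have left the window lingers unless a filterFunc is supplied, which may be related to RDDs not being cleared.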

DataFrame --- join / groupBy-agg question...

2017-07-11 Thread muthu
I may be having a naive question on join / groupBy-agg. During the days of RDD, whenever I wanted to perform a. groupBy-agg, I used to say reduceByKey (of PairRDDFunctions) with an optional Partition-Strategy (which is a number of partitions or a Partitioner) b. join (of PairRDDFunctions) and its
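For reference, the RDD-era reduceByKey shuffles pairs by key to a chosen number of partitions and combines values per key. A small stdlib-only Python sketch of that behavior (conceptual only, no Spark involved; the function names are made up):

```python
def hash_partition(pairs, num_partitions):
    """Route (key, value) pairs to partitions by hash(key), like a HashPartitioner."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def reduce_by_key(pairs, reduce_fn, num_partitions=2):
    """Combine values per key within each partition, as reduceByKey would."""
    result = {}
    for partition in hash_partition(pairs, num_partitions):
        for key, value in partition:
            result[key] = reduce_fn(result[key], value) if key in result else value
    return result

data = [("a", 1), ("b", 2), ("a", 3)]
print(sorted(reduce_by_key(data, lambda x, y: x + y).items()))  # [('a', 4), ('b', 2)]
```

In the DataFrame API the closest analogue is df.groupBy("key").agg(...); there the shuffle partition count comes from spark.sql.shuffle.partitions (default 200) rather than an explicit Partitioner.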

Re: [ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Jean Georges Perrin
Awesome! Congrats! Can't wait!! jg > On Jul 11, 2017, at 18:48, Michael Armbrust wrote: > > Hi all, > > Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release > removes the experimental tag from Structured Streaming. In addition, this > release

[ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Michael Armbrust
Hi all, Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release removes the experimental tag from Structured Streaming. In addition, this release focuses on usability, stability, and polish, resolving over 1100 tickets. We'd like to thank our contributors and users for their

DataFrame --- join / groupBy-agg question...

2017-07-11 Thread Muthu Jayakumar
Hello there, I may be having a naive question on join / groupBy-agg. During the days of RDD, whenever I wanted to perform a. groupBy-agg, I used to say reduceByKey (of PairRDDFunctions) with an optional Partition-Strategy (which is a number of partitions or a Partitioner) b. join (of PairRDDFunctions)

Re: Testing another Dataset after ML training

2017-07-11 Thread Michael C. Kunkel
Greetings, Thanks for the communication. I attached the entire stack trace, which was output to the screen. I tried to use JavaRDD and LabeledPoint, then convert to Dataset, and I still get the same error as I did when I only used datasets. I am using the expected ml Vector. I tried it using

[Spark Streaming] - ERROR Error cleaning broadcast Exception

2017-07-11 Thread Nipun Arora
Hi All, I get the following error while running my spark streaming application, we have a large application running multiple stateful (with mapWithState) and stateless operations. It's getting difficult to isolate the error since spark itself hangs and the only error we see is in the spark log
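If the ContextCleaner is the bottleneck, one commonly tried knob is making cleanup non-blocking. A spark-defaults.conf sketch (the defaults and exact behavior are worth verifying against your Spark version before relying on this):

```
spark.cleaner.referenceTracking.blocking          false
spark.cleaner.referenceTracking.blocking.shuffle  false
```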

Re: Query via Spark Thrift Server returns wrong result.

2017-07-11 Thread Valentin Ursu
Apologies, wrong shortcuts in Gmail, and I managed to send the mail before I finished editing the query. I edited it below. On Tue, Jul 11, 2017 at 7:58 PM, Valentin Ursu < valentindaniel.u...@gmail.com> wrote: > Hello, > > Short description: A SQL query sent via Thrift server returns an >

Query via Spark Thrift Server returns wrong result.

2017-07-11 Thread Valentin Ursu
Hello, Short description: A SQL query sent via Thrift server returns an inexplicable response. Running the same (exact same) query inside Apache Zeppelin or submitting a job returns the correct result. Furthermore, a similar table returns the correct response in both cases. Details: I'm using

Re: Testing another Dataset after ML training

2017-07-11 Thread Riccardo Ferrari
Hmm, to me it feels like there's some data mismatch. Are you sure you're using the expected Vector (ml vs. mllib)? I am not sure you attached the whole exception, but you might find some more useful details there. Best, On Tue, Jul 11, 2017 at 3:07 PM, mckunkel wrote: > Im not

Re: Testing another Dataset after ML training

2017-07-11 Thread mckunkel
I'm not sure why I cannot subscribe so that everyone can view the conversation. Help? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Testing-another-Dataset-after-ML-training-tp28845p28846.html Sent from the Apache Spark User List mailing list archive at

Re: Testing another Dataset after ML training

2017-07-11 Thread Michael C. Kunkel
Greetings, I am 50/50 sure the data format is correct: if I split the data, the classifier works properly, but not if I introduce another dataset created identically to the one it was trained on. However, the creation of the data itself is in doubt, but I do not see any help on this subject with

Re: Testing another Dataset after ML training

2017-07-11 Thread Riccardo Ferrari
Hi, Are you sure you're feeding the correct data format? I found this conversation that might be useful: http://apache-spark-user-list.1001560.n3.nabble.com/Description-of-data-file-sample-libsvm-data-txt-td25832.html Best, On Tue, Jul 11, 2017 at 1:42 PM, mckunkel

Testing another Dataset after ML training

2017-07-11 Thread mckunkel
Greetings, Following the example on the Apache Spark page for Naive Bayes using Dataset https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes I want to predict the outcome of another set of