Re: Spark 2.1.1 Graphx graph loader GC overhead error

2017-07-11 Thread Aritra Mandal
yncxcw wrote > hi, > > I think if the OOM occurs before the computation begins, the input data is > probably too big to fit in memory. I remember that the graph data expands > when loading the input data into memory. And the scale of expansion is > pretty huge (based on my experiment on
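For what it's worth, when GraphLoader.edgeListFile blows up during load, the usual first steps are more executor memory and more input partitions (edgeListFile also takes a numEdgePartitions argument). A spark-defaults.conf sketch; the sizes below are placeholders to tune for your cluster, not recommendations:

```
spark.executor.memory     8g
spark.driver.memory       4g
spark.default.parallelism 400
```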

Limit the number of tasks submitted: spark.submit.tasks.threshold.enabled & spark.submit.tasks.threshold

2017-07-11 Thread 李斌松
Limit the number of tasks submitted to avoid tasks monopolizing resources, while guiding users to set reasonable conditions. [image: inline image 1] spark_submit_tasks_threshold.patch Description: Binary data - To

java IllegalStateException: unread block data Exception - setBlockDataMode

2017-07-11 Thread Kanagha
Hi, I am using Spark 2.0.2. I'm not sure what is causing this error to occur. Any inputs would be really helpful; I appreciate any help with this. Exception caught: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3,

Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions even after checkpointing the data

2017-07-11 Thread SRK
Hi, Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions even after checkpointing the data. Any idea as to why this happens? Is there a way that I can set a timeout to clear
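For context, reduceByKeyAndWindow with an inverse function updates each window incrementally instead of recomputing it: the new window value is the old value, plus the batch that entered, minus the batch that left. A minimal pure-Python sketch of that arithmetic (illustrative only, not the Spark API; `slide_window` is a made-up name):

```python
# Incremental windowed sum: the idea behind reduceByKeyAndWindow
# with an inverse function. Entering batches are added (reduceFunc),
# leaving batches are subtracted (invReduceFunc).

def slide_window(old_window_sum, entering_batch, leaving_batch):
    """Update a per-key windowed sum without rescanning the whole window."""
    new_sum = dict(old_window_sum)
    for key, value in entering_batch.items():   # reduceFunc side
        new_sum[key] = new_sum.get(key, 0) + value
    for key, value in leaving_batch.items():    # invReduceFunc side
        new_sum[key] = new_sum.get(key, 0) - value
    return new_sum

window = {"a": 10, "b": 4}
window = slide_window(window, entering_batch={"a": 3}, leaving_batch={"b": 4})
print(window)  # {'a': 13, 'b': 0}
```

Note how "b" drops to 0 but never disappears: with an inverse function the state for keys that have left the window lingers unless a filterFunc is supplied, which may be related to RDDs not being cleared.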

DataFrame --- join / groupBy-agg question...

2017-07-11 Thread muthu
I may be having a naive question on join / groupBy-agg. During the days of RDD, whenever I wanted to perform a. groupBy-agg, I used to say reduceByKey (of PairRDDFunctions) with an optional Partition-Strategy (which is a number of partitions or a Partitioner) b. join (of PairRDDFunctions) and its
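For reference, the RDD-era reduceByKey shuffles pairs by key to a chosen number of partitions and combines values per key. A small stdlib-only Python sketch of that behavior (conceptual only, no Spark involved; the function names are made up):

```python
def hash_partition(pairs, num_partitions):
    """Route (key, value) pairs to partitions by hash(key), like a HashPartitioner."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def reduce_by_key(pairs, reduce_fn, num_partitions=2):
    """Combine values per key within each partition, as reduceByKey would."""
    result = {}
    for partition in hash_partition(pairs, num_partitions):
        for key, value in partition:
            result[key] = reduce_fn(result[key], value) if key in result else value
    return result

data = [("a", 1), ("b", 2), ("a", 3)]
print(sorted(reduce_by_key(data, lambda x, y: x + y).items()))  # [('a', 4), ('b', 2)]
```

In the DataFrame API the closest analogue is df.groupBy("key").agg(...); there the shuffle partition count comes from spark.sql.shuffle.partitions (default 200) rather than an explicit Partitioner.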

Re: [ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Jean Georges Perrin
Awesome! Congrats! Can't wait!! jg > On Jul 11, 2017, at 18:48, Michael Armbrust wrote: > > Hi all, > > Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release > removes the experimental tag from Structured Streaming. In addition, this > release

[ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Michael Armbrust
Hi all, Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release removes the experimental tag from Structured Streaming. In addition, this release focuses on usability, stability, and polish, resolving over 1100 tickets. We'd like to thank our contributors and users for their

DataFrame --- join / groupBy-agg question...

2017-07-11 Thread Muthu Jayakumar
Hello there, I may be having a naive question on join / groupBy-agg. During the days of RDD, whenever I wanted to perform a. groupBy-agg, I used to say reduceByKey (of PairRDDFunctions) with an optional Partition-Strategy (which is a number of partitions or a Partitioner) b. join (of PairRDDFunctions)

Re: Testing another Dataset after ML training

2017-07-11 Thread Michael C. Kunkel
Greetings, Thanks for the communication. I attached the entire stack trace, which was output to the screen. I tried to use JavaRDD and LabeledPoint, then convert to Dataset, and I still get the same error as I did when I only used datasets. I am using the expected ml Vector. I tried it using

[Spark Streaming] - ERROR Error cleaning broadcast Exception

2017-07-11 Thread Nipun Arora
Hi All, I get the following error while running my spark streaming application, we have a large application running multiple stateful (with mapWithState) and stateless operations. It's getting difficult to isolate the error since spark itself hangs and the only error we see is in the spark log
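If the ContextCleaner is the bottleneck, one commonly tried knob is making cleanup non-blocking. A spark-defaults.conf sketch (the defaults and exact behavior are worth verifying against your Spark version before relying on this):

```
spark.cleaner.referenceTracking.blocking          false
spark.cleaner.referenceTracking.blocking.shuffle  false
```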

Re: Query via Spark Thrift Server returns wrong result.

2017-07-11 Thread Valentin Ursu
Apologies, wrong shortcuts in Gmail, and I managed to send the mail before I finished editing the query. I edited it below. On Tue, Jul 11, 2017 at 7:58 PM, Valentin Ursu < valentindaniel.u...@gmail.com> wrote: > Hello, > > Short description: A SQL query sent via Thrift server returns an >

Query via Spark Thrift Server returns wrong result.

2017-07-11 Thread Valentin Ursu
Hello, Short description: A SQL query sent via Thrift server returns an inexplicable response. Running the same (exact same) query inside Apache Zeppelin or submitting a job returns the correct result. Furthermore, a similar table returns the correct response in both cases. Details: I'm using

Re: Testing another Dataset after ML training

2017-07-11 Thread Riccardo Ferrari
Hmm, to me it feels like there's some data mismatch. Are you sure you're using the expected Vector (ml vs. mllib)? I am not sure you attached the whole exception, but you might find some more useful details there. Best, On Tue, Jul 11, 2017 at 3:07 PM, mckunkel wrote: > Im not

Re: Testing another Dataset after ML training

2017-07-11 Thread mckunkel
I'm not sure why I cannot subscribe so that everyone can view the conversation. Help? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Testing-another-Dataset-after-ML-training-tp28845p28846.html Sent from the Apache Spark User List mailing list archive at

Re: Testing another Dataset after ML training

2017-07-11 Thread Michael C. Kunkel
Greetings, I am 50/50 sure the data format is correct: if I split the data, the classifier works properly, but not if I introduce another dataset created identically to the one it was trained on. However, the creation of the data itself is in doubt, but I do not see any help on this subject with

Re: Testing another Dataset after ML training

2017-07-11 Thread Riccardo Ferrari
Hi, Are you sure you're feeding the correct data format? I found this conversation that might be useful: http://apache-spark-user-list.1001560.n3.nabble.com/Description-of-data-file-sample-libsvm-data-txt-td25832.html Best, On Tue, Jul 11, 2017 at 1:42 PM, mckunkel

Testing another Dataset after ML training

2017-07-11 Thread mckunkel
Greetings, Following the example on the Apache Spark page for Naive Bayes using Dataset https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes I want to predict the outcome of another set of