date:20170311

Re: Pyspark 2.1.0 weird behavior with repartition

2017-03-11 Thread Olivier Girardot

I kinda reproduced that with PySpark 2.1, also for Hadoop 2.6 and with Python 3.x. I'll look into it a bit more after I've fixed a few other issues regarding the salting of strings on the cluster. On 2017-01-30 20:19 GMT+01:00, Blaž Šnuderl wrote: > I am loading a simple text file using pyspark. Repartitio

Re: Which streaming platform is best? Kafka or Spark Streaming?

2017-03-11 Thread Gaurav Pandya

Thank you very much, guys. My question may have sounded a little bit off, but I was somewhat confused, so I wanted to get some expert advice on this. I will take a look at the links mentioned in the replies. I really appreciate your suggestions. These are the kind of answers I needed to clear my doubts. Have a nic

Re: How to improve performance of saveAsTextFile()

2017-03-11 Thread Yan Facai

How about increasing the RDD's partitions / rebalancing the data? On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud wrote: > How to improve performance of JavaRDD.saveAsTextFile("hdfs://…")? > This is taking over 30 minutes on a cluster of 10 nodes. > Running Spark on YARN. > > JavaRDD has 120 million e

Re: org.apache.spark.SparkException: Task not serializable

2017-03-11 Thread Yan Facai

For Scala, make your class Serializable, like this: ```class YourClass extends Serializable {}``` On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi Mina, > > can you paste your new code here, please? > I have run into this issue too, but I don't understand Ankur's idea. > > Thanks > Robin
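Spark serializes task closures with Java serialization, so one quick way to see whether a class will trigger "Task not serializable" is to round-trip an instance through `ObjectOutputStream` outside Spark. A minimal sketch, assuming plain Scala 2.12+ with no Spark dependency; `YourClass`, `NotSerializableClass`, and `SerializationCheck` are hypothetical names used only for illustration:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectInputStream, ObjectOutputStream}

// Hypothetical class following the suggestion above: it extends Serializable.
class YourClass(val factor: Int) extends Serializable {
  def scale(x: Int): Int = x * factor
}

// Hypothetical class WITHOUT Serializable, for comparison.
class NotSerializableClass(val factor: Int)

object SerializationCheck {
  // Writes the object to a byte array with Java serialization and reads it back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj) // throws NotSerializableException for non-serializable classes
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // True if the object survives a Java serialization round trip.
  def isSerializable(obj: AnyRef): Boolean =
    try { roundTrip(obj); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new YourClass(3)))            // true
    println(isSerializable(new NotSerializableClass(3))) // false
    println(roundTrip(new YourClass(3)).scale(2))        // 6
  }
}
```

Extending `Serializable` only helps if every field of the class is serializable too; a field holding a `StreamingContext`, for example, would still fail the round trip.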

Re: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext

2017-03-11 Thread ??????????

I think the vals you defined are only valid in the driver; you can try a broadcast variable. ---Original--- From: "lk_spark" Date: 2017/2/27 11:14:23 To: "user.spark"; Subject: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext Hi all: I want to extract some

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-11 Thread Dongjin Lee

Hello Chetan, Could you post some code? If I understood correctly, you are trying to save JSON like: { "first_name": "Dongjin", "last_name": null } rather than in the omitted form, like: { "first_name": "Dongjin" }, right? - Dongjin On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri wrote: > Hello Dev
