I partially reproduced that, also with PySpark 2.1 built for Hadoop 2.6 and
with Python 3.x.
I'll look into it a bit more after I've fixed a few other issues regarding
the salting of strings on the cluster.
2017-01-30 20:19 GMT+01:00 Blaž Šnuderl :
> I am loading a simple text file using pyspark. Repartitio
Thank you very much, guys. My question may sound a little bit off, but I was
somewhat confused, so I wanted to get some expert advice on this. I will take
a look at the links mentioned in the replies. I really appreciate your
suggestions. These are exactly the kinds of answers I needed to clear up my doubts.
Have a nic
How about increasing the number of partitions of the RDD, i.e. rebalancing the data?
On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud
wrote:
> How can I improve the performance of JavaRDD.saveAsTextFile("hdfs://…")?
> This is taking over 30 minutes on a cluster of 10 nodes.
> Running Spark on YARN.
>
> JavaRDD has 120 million e
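As a minimal sketch of the repartitioning suggestion, assuming the slow write comes from too few (or very uneven) partitions: `repartition(n)` does a full shuffle that spreads the records roughly evenly over `n` partitions, and `saveAsTextFile` then writes one part-file per partition in parallel. The object name, the local master, the partition count of 8 and the output path are all hypothetical; on YARN the master comes from `spark-submit`, and `n` would be chosen from the number of executor cores.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RepartitionBeforeSave {
  // Full shuffle that spreads the records roughly evenly over `numPartitions`
  // partitions, so the following write runs as `numPartitions` parallel tasks.
  def rebalance(rdd: RDD[String], numPartitions: Int): RDD[String] =
    rdd.repartition(numPartitions)

  def main(args: Array[String]): Unit = {
    // Local mode for illustration only; on YARN the master is set by spark-submit.
    val sc = new SparkContext(
      new SparkConf().setAppName("repartition-sketch").setMaster("local[4]"))
    try {
      // Hypothetical input that starts out in only 2 partitions.
      val records = sc.parallelize(1 to 100000, 2).map(i => s"record-$i")
      val balanced = rebalance(records, 8) // hypothetical partition count
      println(s"partitions: ${balanced.getNumPartitions}") // partitions: 8
      // One part-file per partition; fails if the output directory already exists.
      balanced.saveAsTextFile(args.headOption.getOrElse("/tmp/repartition-sketch-output"))
    } finally {
      sc.stop()
    }
  }
}
```

If the goal is only fewer output files, `coalesce(n)` avoids the shuffle but can only reduce the partition count. Whether either helps depends on where the time is actually spent: `saveAsTextFile` is an action, so the 30 minutes also include computing the whole upstream lineage, not just the HDFS write.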
For Scala, make your class Serializable, like this:
```scala
class YourClass extends Serializable {}
```
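To check outside Spark whether a class would pass, here is a minimal sketch using plain Java serialization, which is roughly what Spark's default closure serializer applies to everything a task closure captures. `PlainHelper`, `SerializableHelper`, `SerializationCheck` and `isJavaSerializable` are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical example classes.
class PlainHelper(val factor: Int)                           // does not extend Serializable
class SerializableHelper(val factor: Int) extends Serializable

object SerializationCheck {
  // Java-serializes `obj` into memory and reports whether that succeeded.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(new PlainHelper(2)))        // false
    println(isJavaSerializable(new SerializableHelper(2))) // true
  }
}
```

Extending `Serializable` only helps if every field of the class is serializable too: a field holding a `SparkContext` or `StreamingContext` would still fail, which is consistent with the `NotSerializableException` for `StreamingContext` reported in this thread.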
On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
> Hi Mina,
>
> Can you paste your new code here, please?
> I ran into this issue too but didn't understand Ankur's idea.
>
> thanks
> Robin
I think the vals you defined are only valid in the driver; you can try a
broadcast variable.
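A minimal sketch of the broadcast approach, assuming a plain `SparkContext` in local mode; in a streaming job the same `broadcast` call would go through `ssc.sparkContext`. `BroadcastSketch`, `lookup` and the sample lookup table are hypothetical. The point is that the task closure captures only the serializable `Broadcast` handle, never the context object itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  // Resolves codes to names on the executors via a broadcast lookup table.
  def lookup(sc: SparkContext, codes: Seq[String], table: Map[String, String]): Seq[String] = {
    // Sends one read-only copy of `table` to each executor.
    val tableBc = sc.broadcast(table)
    // The closure captures only the serializable Broadcast handle, not `sc`.
    val resolved = sc.parallelize(codes)
      .map(code => tableBc.value.getOrElse(code, "unknown"))
      .collect()
      .toSeq
    tableBc.unpersist() // drop the executor copies once they are no longer needed
    resolved
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]"))
    try {
      val names = lookup(sc, Seq("de", "fr", "xx"), Map("de" -> "Germany", "fr" -> "France"))
      println(names.mkString(",")) // Germany,France,unknown
    } finally {
      sc.stop()
    }
  }
}
```

Broadcasting does not by itself fix the `StreamingContext` error: the closures passed to transformations such as `map` must also stop referencing `ssc` (or any object that holds it).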
---Original---
From: "lk_spark"
Date: 2017/2/27 11:14:23
To: "user.spark";
Subject: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext
Hi all,
I want to extract some
Hello Chetan,
Could you post some code? If I understood correctly, you are trying to save
JSON like:
{
"first_name": "Dongjin",
"last_name": null
}
rather than in the form where the null field is omitted, like:
{
"first_name": "Dongjin"
}
right?
- Dongjin
On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri
wrote:
> Hello Dev