Hi Davis,
When I run your code in pyspark, I still get the same error:
>>> sc.parallelize(range(10)).map(lambda x: (x, str(x))).sortByKey().count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PipelinedRDD' object has no attribute 'sortByKey'
Is it the matter with
Hi Davis,
Thank you for your answer. This is my code; I think it is very similar to the
word count example in Spark:
lines = sc.textFile(sys.argv[2])
sie = lines.map(lambda l: (l.strip().split(',')[4], 1)).reduceByKey(lambda a, b: a + b)
sort_sie = sie.sortByKey(False)
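For anyone without a Spark shell handy, here is a pure-Python sketch of what that pipeline computes, assuming each input line is a CSV row whose 5th field (index 4) is the key being counted. The sample lines are made up for illustration; they are not from the actual input file.

```python
from collections import Counter

# Illustrative stand-in for sc.textFile(sys.argv[2])
lines = [
    "a,b,c,d,NY",
    "a,b,c,d,LA",
    "a,b,c,d,NY",
]

# map to (field, 1) pairs + reduceByKey(lambda a, b: a + b) == a frequency count
counts = Counter(l.strip().split(',')[4] for l in lines)

# sortByKey(False) sorts by KEY in descending order -- note this orders by the
# field value, not by the count; sort on the count (or use RDD.takeOrdered)
# if "most popular" is what you actually want.
by_key = sorted(counts.items(), key=lambda kv: kv[0], reverse=True)
by_count = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(by_key)    # [('NY', 2), ('LA', 1)]
print(by_count)  # [('NY', 2), ('LA', 1)]
```

Note the distinction: on this toy input the two orderings happen to coincide, but in general sorting by key and sorting by count give different top-10 lists.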
Thanks again.
Hi,
I am new to Spark. I tried to run a job like the wordcount example in
Python, but when I tried to get the top 10 most popular words in the file, I
got this message: AttributeError: 'PipelinedRDD' object has no attribute
'sortByKey'.
So my question is: what is the difference between a PipelinedRDD and a
regular RDD?
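For context, a PipelinedRDD is the subclass PySpark returns from map/filter: it fuses chained per-element transformations into one function that runs once over each partition, rather than materializing an intermediate result per step. Here is a minimal pure-Python sketch of that fusion idea; the `fuse` helper is illustrative only, not Spark's API.

```python
def fuse(*funcs):
    """Compose per-element functions into a single pass over a partition."""
    def run(partition):
        for item in partition:
            for f in funcs:
                item = f(item)
            yield item
    return run

# Two chained "map" steps, applied in one traversal of the data
pipeline = fuse(lambda x: x + 1, lambda x: x * 2)
print(list(pipeline(range(3))))  # [2, 4, 6]
```

A PipelinedRDD is still meant to behave like an RDD, so if sortByKey is missing on it, that suggests a PySpark version where the Python API had not yet implemented it; checking your Spark version (and upgrading if it is an early release) would be my first step.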