Re: spark is running extremely slow with larger data set, like 2G

2014-10-24 Thread Davies Liu
…more memory during shuffle (like groupBy()), which will increase the performance. …
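The setting Davies refers to, spark.python.worker.memory, caps how much a PySpark worker may buffer in memory during aggregation before spilling partial results to disk; raising it means fewer spills and less disk I/O during a shuffle. The mechanics can be illustrated with a tiny spill-to-disk aggregator (a hedged sketch in plain Python, not PySpark's actual shuffle code; `aggregate_with_spill` and `max_buffered` are hypothetical names standing in for the real machinery):

```python
import os
import pickle
import tempfile
from collections import defaultdict

def _fold(values, reduce_fn):
    # Fold a list of values into a single accumulator.
    acc = values[0]
    for v in values[1:]:
        acc = reduce_fn(acc, v)
    return acc

def aggregate_with_spill(pairs, reduce_fn, max_buffered=1000):
    """Aggregate (key, value) pairs, spilling reduced partial results
    to disk whenever the in-memory buffer holds max_buffered values.
    Illustrative only: spark.python.worker.memory plays the role of
    max_buffered (a byte limit rather than a count in real Spark)."""
    buffer = defaultdict(list)
    spills = []
    buffered = 0
    for key, value in pairs:
        buffer[key].append(value)
        buffered += 1
        if buffered >= max_buffered:
            # Reduce the buffer and spill the partial result to a temp file.
            partial = {k: _fold(vs, reduce_fn) for k, vs in buffer.items()}
            f = tempfile.NamedTemporaryFile(delete=False)
            pickle.dump(partial, f)
            f.close()
            spills.append(f.name)
            buffer = defaultdict(list)
            buffered = 0
    # Merge the remaining in-memory buffer with all spilled partials.
    result = {k: _fold(vs, reduce_fn) for k, vs in buffer.items()}
    for path in spills:
        with open(path, "rb") as f:
            partial = pickle.load(f)
        for k, v in partial.items():
            result[k] = reduce_fn(result[k], v) if k in result else v
        os.unlink(path)
    return result

counts = aggregate_with_spill(
    [("a", 1), ("b", 1), ("a", 1)], lambda x, y: x + y, max_buffered=2)
# counts == {"a": 2, "b": 1}
```

A larger buffer (larger spark.python.worker.memory) means fewer spill/merge rounds, which is exactly why raising it speeds up shuffle-heavy jobs.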

Re: spark is running extremely slow with larger data set, like 2G

2014-10-24 Thread xuhongnever
…between them? *spark.python.worker.memory*, *spark.executor.memory*, *spark.driver.memory* …
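For reference, the three settings live at different levels: spark.driver.memory is the JVM heap of the driver process, spark.executor.memory is the JVM heap of each executor, and spark.python.worker.memory (PySpark only, default 512m) is how much each Python worker process may use for aggregation before it spills to disk. A hedged sketch of setting all three at submit time (example values only; `your_job.py` is a hypothetical script name — note that spark.driver.memory must be set before the driver JVM starts, i.e. here or in spark-defaults.conf, not from inside the job):

```shell
# Example values only; tune to your cluster.
spark-submit \
  --conf spark.driver.memory=2g \
  --conf spark.executor.memory=4g \
  --conf spark.python.worker.memory=1g \
  your_job.py
```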

Re: spark is running extremely slow with larger data set, like 2G

2014-10-24 Thread Davies Liu
…"\t" + line[1]) > records.saveAsTextFile("file:///home/xzhang/data/result") …

Re: spark is running extremely slow with larger data set, like 2G

2014-10-24 Thread Akhil Das
> #records = records.sortByKey() > records = records.map(lambda line: line[0] + "\t" + line[1]) > records.saveAsTextFile("file:///home/xzhang/data/result") …

Re: spark is running extremely slow with larger data set, like 2G

2014-10-23 Thread xuhongnever
…"\t" + b) #print(records.count()) #records = records.sortByKey() records = records.map(lambda line: line[0] + "\t" + line[1]) records.saveAsTextFile("file:///home/xzhang/data/result") …
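Since the thread points at shuffle operations "like groupBy()" as the bottleneck, the usual remedy is to fold values as they arrive (reduceByKey-style) instead of materializing every value for a key (groupByKey-style). A plain-Python sketch of the difference (illustrative helper names, not PySpark's API):

```python
from collections import defaultdict

def group_by_key(pairs):
    """groupByKey-style: all values for a key are held in memory at once."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)  # list grows with the number of values per key
    return dict(groups)

def reduce_by_key(pairs, fn):
    """reduceByKey-style: values are folded as they arrive,
    keeping only one accumulator per key."""
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

pairs = [("a", 1), ("b", 2), ("a", 3)]
# group then sum: holds the full list [1, 3] for "a" before summing
sums_grouped = {k: sum(vs) for k, vs in group_by_key(pairs).items()}
# fold directly: never holds more than one accumulator per key
sums_folded = reduce_by_key(pairs, lambda x, y: x + y)
assert sums_grouped == sums_folded == {"a": 4, "b": 2}
```

In Spark the same contrast applies to rdd.groupByKey() versus rdd.reduceByKey(fn): the latter also combines values on the map side before the shuffle, so far less data is buffered and transferred.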

spark is running extremely slow with larger data set, like 2G

2014-10-23 Thread xuhongnever
…task won't finish in hours. I tried the input both from NFS and HDFS (screenshot: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n17152/48.png>). What could be the problem? …