Re: pyspark and hdfs file name

2014-11-14 Thread Oleg Ruchovets
Hi Davies. Thank you for the quick answer. I have code like this:

    sc = SparkContext(appName=TAD)
    lines = sc.textFile(sys.argv[1], 1)
    result = lines.map(doSplit).groupByKey().map(lambda (k,vc): traffic_process_model(k,vc))
    result.saveAsTextFile(sys.argv[2])

Can you please give short

pyspark and hdfs file name

2014-11-13 Thread Oleg Ruchovets
Hi, I am running a PySpark job. I need to serialize the final result to *HDFS in binary files* and to be able to give a *name for the output files*. I found this post: http://stackoverflow.com/questions/25293962/specifying-the-output-file-name-in-apache-spark but it explains how to do it using Scala.

Re: pyspark and hdfs file name

2014-11-13 Thread Davies Liu
One option may be to call the HDFS tools or an HDFS client to rename the files after saveAsXXXFile(). On Thu, Nov 13, 2014 at 9:39 PM, Oleg Ruchovets oruchov...@gmail.com wrote: Hi , I am running pyspark job. I need serialize final result to hdfs in binary files and having ability to give a name for output
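
The rename-after-save idea suggested above could look roughly like the sketch below. It is only an illustration, not something taken from the thread: the helper names plan_renames and rename_on_hdfs, the "traffic" prefix and the ".bin" suffix are all hypothetical, and it assumes the `hdfs` command-line tool is on the driver's PATH and that the job wrote the usual part-NNNNN files into the output directory.

```python
import posixpath
import subprocess


def plan_renames(part_files, output_dir, prefix, suffix=".bin"):
    """Pair each part-NNNNN path with a new name of the form <prefix>-NNNNN<suffix>.

    part_files is a list of HDFS paths supplied by the caller; how that list
    is obtained (an HDFS listing, a known partition count) is left open here.
    """
    plan = []
    for path in sorted(part_files):
        base = posixpath.basename(path)
        if not base.startswith("part-"):
            continue  # skip marker files such as _SUCCESS
        index = base[len("part-"):]
        new_name = prefix + "-" + index + suffix
        plan.append((path, posixpath.join(output_dir, new_name)))
    return plan


def rename_on_hdfs(plan):
    """Run `hdfs dfs -mv src dst` for each pair; raises if a move fails."""
    for src, dst in plan:
        subprocess.check_call(["hdfs", "dfs", "-mv", src, dst])
```

In a driver script this would run after the save call has returned, for example by passing the saved part-file paths and sys.argv[2] to plan_renames and then handing the result to rename_on_hdfs. Renaming only changes the file names; whether the contents are binary depends on which save method was used.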