Hi Csaba,

It sounds like the API you are looking for is sc.wholeTextFiles :)
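A minimal sketch of what that would look like (hedged: the HDFS paths are hypothetical, and the Spark calls assume a running SparkContext `sc`). wholeTextFiles returns one (filename, whole-file-content) pair per file, which is exactly the key/value layout you describe; that RDD can be saved straight out with saveAsSequenceFile. The runnable part below just illustrates the shape of the pairs in plain Python:

```python
import os
import tempfile

# Hedged sketch, not a definitive recipe -- assumes a running SparkContext
# `sc` and hypothetical HDFS paths:
#
#   rdd = sc.wholeTextFiles("hdfs:///path/to/logs")     # (filename, content) pairs
#   rdd.saveAsSequenceFile("hdfs:///path/to/logs-seq")  # keys = paths, values = text
#
# Plain-Python illustration of the (key, value) pairs wholeTextFiles yields:
tmp = tempfile.mkdtemp()
for name, text in [("a.log", "first file"), ("b.log", "second file")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

# dict of {full path: entire file content}, mirroring wholeTextFiles' output
pairs = {}
for name in os.listdir(tmp):
    path = os.path.join(tmp, name)
    with open(path) as f:
        pairs[path] = f.read()
```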
Cheers,

Holden :)

On Tuesday, October 28, 2014, Csaba Ragany <rag...@gmail.com> wrote:

> Dear Spark Community,
>
> Is it possible to convert text files (.log or .txt files) into
> sequencefiles in Python?
>
> Using PySpark I can create a parallelized file with
> rdd = sc.parallelize([('key1', 1.0)]) and I can save it as a sequencefile
> with rdd.saveAsSequenceFile(). But how can I put the whole content of my
> text files into the 'value' of 'key1'?
>
> I want a sequencefile where the keys are the filenames of the text files
> and the values are their content.
>
> Thank you for any help!
> Csaba

--
Cell : 425-233-8271