Call wholeTextFiles to read gzip files

2016-02-16 Thread Deepak Gopalakrishnan
Hello, I'm reading S3 files using wholeTextFiles(). My files are in gzip format, but the names of the files do not end with ".gz". I cannot force the names of these files to end with ".gz". Is there a way to specify the InputFormat as Gzip when using wholeTextFiles()? -- Regards, *Deepak Go

Re: Call wholeTextFiles to read gzip files

2016-02-16 Thread Ted Yu
Have you seen this thread? http://stackoverflow.com/questions/24402737/how-to-read-gz-files-in-spark-using-wholetextfiles On Tue, Feb 16, 2016 at 2:17 AM, Deepak Gopalakrishnan wrote: > Hello, > > I'm reading S3 files using wholeTextFiles() . My files are gzip format but > the names of the fil
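One possible workaround, sketched here in PySpark rather than taken from the thread, is to skip extension-based codec detection entirely: read each file as raw bytes with binaryFiles() and decompress it explicitly. The helper names gunzip_text and whole_gzip_text_files are hypothetical, as is the assumption that the files are UTF-8 text. Like wholeTextFiles(), this loads each file fully into memory, so it only suits files that fit on a single executor.

```python
import gzip


def gunzip_text(raw, encoding="utf-8"):
    """Decompress gzip bytes and decode them to a string.

    The UTF-8 default is an assumption; pass another encoding if needed.
    """
    return gzip.decompress(raw).decode(encoding)


def whole_gzip_text_files(sc, path, encoding="utf-8"):
    """Hypothetical stand-in for wholeTextFiles() on gzip data without ".gz" names.

    sc is an existing SparkContext. binaryFiles() yields (file_path, bytes)
    pairs regardless of the file extension, so each value is decompressed
    by hand instead of relying on a codec chosen from the file name.
    """
    return sc.binaryFiles(path).mapValues(lambda raw: gunzip_text(raw, encoding))
```

Calling whole_gzip_text_files(sc, path) with the same path that was passed to wholeTextFiles() should give an RDD of (file name, decompressed content) pairs.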

Re: SPARK_WORKER_MEMORY in Spark Standalone - conf.getenv vs System.getenv?

2016-02-16 Thread Igor Costa
Actually, answering the first question: "Is there a reason to use conf to read SPARK_WORKER_MEMORY rather than System.getenv, as for the other env vars?" You can use the properties file to change the amount. System.getenv would be bad when you have, for example, other things running on the JVM, which will cause

Re: Welcoming two new committers

2016-02-16 Thread Igor Costa
Congratulations Herman and Wenchen. On Tue, Feb 9, 2016 at 10:58 AM, Joseph Bradley wrote: > Congrats & welcome! > > On Mon, Feb 8, 2016 at 12:19 PM, Ram Sriharsha > wrote: > >> great job guys! congrats and welcome! >> >> On Mon, Feb 8, 2016 at 12:05 PM, Amit Chavan wrote: >> >>> Welcome. >>>

Re: Welcoming two new committers

2016-02-16 Thread Raffael Bottoli Schemmer
Congratulations Herman and Wenchen, 2016-02-16 20:45 GMT-02:00 Igor Costa : > Congratulations Herman and Wenchen. > > On Tue, Feb 9, 2016 at 10:58 AM, Joseph Bradley > wrote: > >> Congrats & welcome! >> >> On Mon, Feb 8, 2016 at 12:19 PM, Ram Sriharsha >> wrote: >> >>> great job guys! congrats

DataFrame API and Ordering

2016-02-16 Thread Maciej Szymkiewicz
I am not sure if I've missed something obvious, but as far as I can tell the DataFrame API doesn't provide clearly defined ordering rules, excluding NaN handling. Methods like DataFrame.sort or sql.functions like min / max provide only a general description. Discrepancy between functions.max (min) and Gr