Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-10 Thread Mingjie Tang
+1, it is useful.

On Sat, Dec 10, 2016 at 9:28 PM, Dongjoon Hyun wrote:
> Thank you for the opinion, Felix.
>
> Bests,
> Dongjoon.
>
> On Sat, Dec 10, 2016 at 11:00 AM, Felix Cheung wrote:
>> +1 I think it's useful to always have a pure SQL

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-10 Thread Dongjoon Hyun
Thank you for the opinion, Felix.

Bests,
Dongjoon.

On Sat, Dec 10, 2016 at 11:00 AM, Felix Cheung wrote:
> +1 I think it's useful to always have a pure SQL way and skip header for
> plain text / csv that lots of companies have.

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-10 Thread Felix Cheung
+1 I think it's useful to always have a pure SQL way and skip header for plain text / csv that lots of companies have.

From: Dongjoon Hyun
Sent: Friday, December 9, 2016 9:42:58 AM
To: Dongjin Lee; dev@spark.apache.org
Subject: Re: Question

Re: Document Similarity -Spark Mllib

2016-12-10 Thread Liang-Chi Hsieh
Hi Satyajit, I am not sure why you think DIMSUM cannot be applied to your use case. Or have you tried it and run into problems? In the paper [1], the authors mention that they concentrate on the regime where the number of rows is very large and the number of columns is not too large. But