Re: GPU Acceleration of Spark Logistic Regression and Other MLlib libraries

2016-01-22 Thread John Canny
Hi Rajesh, FYI, we are developing our own version of BIDMach integration with Spark, and achieving large gains over Spark MLlib for both CPU and GPU computation. You can find the project here: https://github.com/BIDData/BIDMach_Spark I'm not sure I follow your comment "However, I think

Re: GPU Acceleration of Spark Logistic Regression and Other MLlib libraries

2016-01-22 Thread Sam Halliday
Hi all, (I'm author of netlib-java) Interesting to see this discussion come to life again. JNI is quite limiting: pinning (or critical array access) essentially disables the GC for the whole JVM for the duration of the native call. I can justify this for CPU heavy tasks because frankly there

RE: Using CUDA within Spark / boosting linear algebra

2016-01-22 Thread Kazuaki Ishizaki
Hi Allen, Thank you for your feedback. An API to launch GPU kernels with JCuda is our first step. The purpose of releasing our prototype is to get feedback. In the future, we may use other wrappers instead of JCuda. We would very much appreciate it if you would suggest or propose APIs to effectively

Re: Spark SQL: Avoid shuffles when data is already partitioned on disk

2016-01-22 Thread Takeshi Yamamuro
My bad, thanks. On Fri, Jan 22, 2016 at 4:34 PM, Reynold Xin wrote: > The original email was asking about data partitioning (Hive style) for > files, not in memory caching. > > > On Thursday, January 21, 2016, Takeshi Yamamuro > wrote: > >> You mean

RE: Using CUDA within Spark / boosting linear algebra

2016-01-22 Thread Kazuaki Ishizaki
Hi Alexander, The goal of our columnar storage is to effectively drive GPUs in Spark. One of the important items is to effectively and easily enable highly-tuned GPU libraries such as BIDMach. We will enable BIDMach with our columnar storage. On the other hand, it is not an easy task to scale BIDMach with
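As background on why a columnar layout suits GPU processing: it stores each field contiguously, so a kernel can stream one dense array instead of striding through row records. A toy Python illustration of the row-to-column conversion (illustrative only, not the actual BIDMach_Spark storage format; all names are made up):

```python
def to_columnar(rows, fields):
    """Convert a list of row dicts into one contiguous list per field.

    GPU kernels (and vectorized CPU code) prefer this layout because
    each field can be copied and processed as a single flat buffer.
    """
    return {f: [row[f] for row in rows] for f in fields}

rows = [
    {"label": 1.0, "feature": 0.5},
    {"label": 0.0, "feature": 1.5},
]
cols = to_columnar(rows, ["label", "feature"])
print(cols["feature"])  # [0.5, 1.5]
```

In a real system the per-field lists would be contiguous native buffers handed to the GPU in one transfer, rather than Python lists.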

Using distinct count in over clause

2016-01-22 Thread 汪洋
Hi, Do we support distinct count in the over clause in Spark SQL? I ran a query like this: select a, count(distinct b) over ( order by a rows between unbounded preceding and current row) from table limit 10 Currently, it returns an error saying: expression 'a' is neither present in the group by,
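The window in the query describes a running distinct count: for each row, ordered by a, count the distinct values of b seen from the first row through the current row. A minimal plain-Python sketch of those semantics (not Spark; the function name and data are illustrative):

```python
def running_distinct_count(rows):
    """For each (a, b) row, ordered by a, return (a, n) where n is the
    number of distinct b values seen up to and including that row."""
    seen = set()
    out = []
    for a, b in sorted(rows, key=lambda r: r[0]):
        seen.add(b)
        out.append((a, len(seen)))
    return out

rows = [(1, "x"), (2, "y"), (3, "x"), (4, "z")]
print(running_distinct_count(rows))  # [(1, 1), (2, 2), (3, 2), (4, 3)]
```

Note the third row: "x" was already seen, so the count stays at 2. This statefulness across the frame is what makes count(distinct ...) harder for a SQL engine to support as a window function than plain count.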

Re: Using distinct count in over clause

2016-01-22 Thread 汪洋
I don't think that can be right. > On Jan 22, 2016, at 4:53 PM, 汪洋 wrote: > > Hi, > > Do we support distinct count in the over clause in Spark SQL? > > I ran a query like this: > > select a, count(distinct b) over ( order by a rows between unbounded > preceding and current row) from

Re: Spark 1.6.1

2016-01-22 Thread BrandonBradley
I'd like more complete Postgres JDBC support for ArrayType before the next release. Some array types are still broken in 1.6.0. It would save me much time. Please see SPARK-12747 @ https://issues.apache.org/jira/browse/SPARK-12747 Cheers! Brandon Bradley