Is SparkSQL optimizer aware of the needed data after the query?

2015-03-02 Thread Wail
Dear all, I'm just curious about how sophisticated the query optimizer is. Can the optimizer evaluate what comes after the SQL query? Maybe it's a stupid question, but here is an example to show the case: From the Spark SQL example: val teenagers = sqlContext.sql("SELECT * FROM people WHERE age >= 13 AND age <
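The following is a minimal sketch of the distinction behind the question. It assumes Spark 2.x or later with `SparkSession`, not the `sqlContext` used in the 2015 thread. The `Person` class, its `city` column, and the sample rows are hypothetical. Catalyst can only prune columns it can see in the logical plan. A projection written in the SQL is visible to it. A Scala closure applied to the result afterwards is opaque, so the optimizer cannot infer from it which columns are really needed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type standing in for the "people" table; not from the thread.
case class Person(name: String, age: Int, city: String)

object ProjectionExample {
  // Registers a small in-memory "people" view (sample rows are made up).
  def registerPeople(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(Person("Ann", 15, "Oslo"), Person("Bob", 30, "Rome"), Person("Cid", 17, "Lima"))
      .toDF()
      .createOrReplaceTempView("people")
  }

  // The projection is part of the SQL, so the optimizer can see that `city` is never needed.
  def namesViaSql(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val names = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    names.explain(true) // prints the parsed, analyzed, optimized, and physical plans
    names.as[String].collect().sorted
  }

  // Same answer, but the column choice lives inside a Scala closure that the optimizer
  // treats as a black box, so SELECT * still has to produce `city` for every row.
  def namesViaClosure(spark: SparkSession): Array[String] =
    spark.sql("SELECT * FROM people WHERE age >= 13 AND age <= 19")
      .rdd
      .map(row => row.getString(0)) // column 0 is `name` in the Person schema
      .collect()
      .sorted

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("projection-example").master("local[*]").getOrCreate()
    registerPeople(spark)
    println(namesViaSql(spark).mkString(", "))
    println(namesViaClosure(spark).mkString(", "))
    spark.stop()
  }
}
```

Comparing the `explain` output of the two queries is one way to see which columns each plan actually reads.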

Re: How to create a Row from a List or Array in Spark using Scala

2015-03-02 Thread Dirceu Semighini Filho
You can use the parallelize method: val data = List(Row(1, 5, "vlr1", 10.5), Row(2, 1, "vl3", 0.1), Row(3, 8, "vl3", 10.0), Row(4, 1, "vl4", 1.0)) val rdd = sc.parallelize(data) Here I'm using a list of Rows, but you could use it with a list of any other kind of object, like this: val x =
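The thread's subject asks how to build a `Row` from a List or Array. The following is a small sketch that combines `Row.fromSeq` with the `sc.parallelize` approach from the reply. It assumes a Spark version whose `org.apache.spark.sql.Row` companion provides `fromSeq` (recent releases do). The object name, app name, and sample values are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

object RowsFromLists {
  // Turns one List (or an Array converted with .toSeq) of values into a single Row.
  def toRow(values: Seq[Any]): Row = Row.fromSeq(values)

  // Distributes a local collection of Rows as an RDD, as in the reply above.
  def toRdd(sc: SparkContext, rows: Seq[Row]): RDD[Row] = sc.parallelize(rows)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rows-from-lists").setMaster("local[*]"))
    // Sample values only; each inner List becomes one Row.
    val raw: List[List[Any]] = List(List(1, 5, "vlr1", 10.5), List(2, 1, "vl3", 0.1))
    val fromArray: Row = toRow(Array[Any](3, 8, "vl3", 10.0).toSeq)
    val rdd = toRdd(sc, raw.map(toRow) :+ fromArray)
    rdd.collect().foreach(println)
    sc.stop()
  }
}
```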

Re: spark-ec2 default to Hadoop 2

2015-03-02 Thread Shivaram Venkataraman
FWIW there is a PR open to add support for Hadoop 2.4 to spark-ec2 scripts at https://github.com/mesos/spark-ec2/pull/77 -- But it hasn't received enough review or testing to be merged. Thanks Shivaram On Sun, Mar 1, 2015 at 11:49 PM, Sean Owen wrote: > I agree with that. My anecdotal impression

Re: Using CUDA within Spark / boosting linear algebra

2015-03-02 Thread Xiangrui Meng
On Fri, Feb 27, 2015 at 12:33 PM, Sam Halliday wrote: > Also, check the JNILoader output. > > Remember, for netlib-java to use your system libblas all you need to do is > set up libblas.so.3 like any native application would expect. > > I haven't ever used the cublas "real BLAS" implementation, so
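Alongside the JNILoader output, the implementation that netlib-java actually selected can also be checked from the JVM side. This is a minimal sketch that assumes netlib-java (`com.github.fommil.netlib`) is on the classpath. The expected class names in the comment, `NativeSystemBLAS` for the system `libblas.so.3`, `NativeRefBLAS`, or the pure-JVM `F2jBLAS` fallback, reflect how netlib-java is commonly described rather than anything confirmed in this thread.

```scala
import com.github.fommil.netlib.BLAS

object WhichBlas {
  // Returns the class name of the BLAS implementation netlib-java loaded at runtime,
  // e.g. com.github.fommil.netlib.NativeSystemBLAS when the system libblas.so.3 was picked up,
  // or com.github.fommil.netlib.F2jBLAS when it fell back to the pure-JVM implementation.
  def implementationName(): String = BLAS.getInstance().getClass.getName

  def main(args: Array[String]): Unit = println(implementationName())
}
```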

RE: Using CUDA within Spark / boosting linear algebra

2015-03-02 Thread Ulanov, Alexander
Hi Xiangrui, Thanks for the link. I am currently trying to use nvblas. It seems that the netlib wrappers are implemented against the C-BLAS interface, and nvblas does not provide C-BLAS. I wonder how it is going to work. I'll keep you updated. Alexander -----Original Message----- From: Xiangrui Meng [mailto:

Re: spark-ec2 default to Hadoop 2

2015-03-02 Thread Nicholas Chammas
I might take a look at that PR if we get around to doing some perf testing of Spark on various resource managers. On Mon, Mar 2, 2015 at 12:22 PM, Shivaram Venkataraman wrote: FWIW there is a PR open to add support for Hadoop 2.4 to spark-ec2 scripts > at https://github.com/mesos/spark-ec2/pull/77 -- Bu

PSA: Link to files at fixed version

2015-03-02 Thread Nicholas Chammas
*TL;DR*: Hit y on any file page on GitHub to update the URL to a permanent link. Many of you probably already know this. Here’s a handy tip for the rest. So you’re on GitHub and you want to link to a file in an email, PR, or JIRA report. Or better yet, you want to link to some specific lines in a
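Pressing y rewrites the URL so that the branch name (such as master) is replaced by the full commit SHA. Because of that, the link keeps pointing at the same content after the file changes. The following hypothetical Scala helper builds the same pinned form by hand. Its `owner`, `repo`, `sha`, and `path` arguments are placeholders, not values from the post, and `#L10-L20` is GitHub's line-range anchor.

```scala
object GitHubPermalink {
  // Builds a URL pinned to one commit, so later edits to the file cannot shift the target lines.
  // All arguments are caller-supplied placeholders (hypothetical helper, not a GitHub API).
  def permalink(owner: String, repo: String, sha: String, path: String,
                lines: Option[(Int, Int)] = None): String = {
    val base = s"https://github.com/$owner/$repo/blob/$sha/$path"
    lines match {
      case Some((start, end)) if start == end => s"$base#L$start"   // single line
      case Some((start, end))                 => s"$base#L$start-L$end" // line range
      case None                               => base               // whole file
    }
  }

  def main(args: Array[String]): Unit =
    println(permalink("apache", "spark", "0123abc", "README.md", Some((10, 20))))
}
```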

RE: Using CUDA within Spark / boosting linear algebra

2015-03-02 Thread Ulanov, Alexander
Thanks, Sam, for the suggestion! I should try doing this. Now I suspect that netlib-java, linked with cuBLAS at runtime, falls back to the CBLAS library on my system, which is ATLAS. If I remove ATLAS, netlib (linked with cuBLAS) fails with the message "undefined symbol: cblas_dgemm".
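One way to check whether a given BLAS setup can serve dgemm at all is a tiny matrix multiplication through netlib-java. The following is a minimal smoke test that assumes netlib-java is on the classpath. Whether this call ends up in `cblas_dgemm` depends on the native wrapper in use, so treat that link to the error message as an assumption. Matrices are column-major, as BLAS expects.

```scala
import com.github.fommil.netlib.BLAS

object DgemmSmokeTest {
  // Computes C = A * B with netlib-java's dgemm, where A is m x k, B is k x n,
  // and all arrays are stored column-major (leading dimensions m, k, m).
  def multiply(a: Array[Double], b: Array[Double], m: Int, k: Int, n: Int): Array[Double] = {
    require(a.length == m * k && b.length == k * n, "matrix sizes do not match m, k, n")
    val c = new Array[Double](m * n)
    BLAS.getInstance().dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)
    c
  }

  def main(args: Array[String]): Unit = {
    // A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], written column by column.
    val c = multiply(Array(1.0, 3.0, 2.0, 4.0), Array(5.0, 7.0, 6.0, 8.0), 2, 2, 2)
    println(c.mkString(", ")) // column-major [[19, 22], [43, 50]]
  }
}
```

If the native library lacks the required symbols, the failure shows up on this call rather than at startup, which makes it a convenient check after swapping BLAS libraries.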