date:20150122

Spark performance gains for small queries

2015-01-22 Thread Saumitra Shahapure (Vizury)

Hello, We were comparing the performance of some of our production Hive queries between Hive and Spark. We compared Hive (0.13) + Hadoop (1.2.1) against both Spark 0.9 and 1.1. We could see that the performance gains have been good in Spark. We tried a very simple query, select count(*) from T where col

Re: spark 1.1.0 (w/ hadoop 2.4) vs aws java sdk 1.7.2

2015-01-22 Thread William-Smith

I have had the same issue while using HttpClient from AWS EMR Spark Streaming to post to a Node.js server. I have found ... using ClassLoader.getResource("org/apache/http/client/HttpClient") that the class is being loaded from the spark-assembly-1.1.0-hadoop2.4.0.jar. That in itself is not t
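The check described in the message above (asking a class loader where a class file comes from) can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the class name `WhichJar` and the helper `locate` are hypothetical, and the sketch assumes the lookup uses the thread context class loader and appends the `.class` suffix to the resource path.

```java
import java.net.URL;

public class WhichJar {
    // Hypothetical helper: returns the URL of the .class resource that the
    // context class loader resolves for className, or null if none is found.
    static URL locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        // Default to the class discussed in the thread; override via argument.
        String name = args.length > 0 ? args[0] : "org.apache.http.client.HttpClient";
        URL url = locate(name);
        System.out.println(name + " -> " + (url == null ? "not found on classpath" : url));
    }
}
```

When the class is found inside a jar, the printed URL typically includes the jar's path, which shows which copy of a library wins when several jars bundle the same class.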

Re: query planner design doc?

2015-01-22 Thread Michael Armbrust

Here is the initial design document for Catalyst: https://docs.google.com/document/d/1Hc_Ehtr0G8SQUg69cmViZsMi55_Kf3tISD9GPGU5M1Y/edit Strategies (many of which are in SparkStrategies.scala) are the part that creates the physical operators from a Catalyst logical plan. These operators have execu

query planner design doc?

2015-01-22 Thread Nicholas Murphy

Hi, quick question: is there a design doc (or something more than “look at the code”) for the query planner for Spark SQL (i.e., the component that takes…Catalyst?…operator trees and translates them into Spark operations)? Thanks, Nick

Re: Are there any plans to run Spark on top of Succinct

2015-01-22 Thread Dean Wampler

Interesting. I was wondering recently if anyone has explored working with compressed data directly. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe @deanwampler

Re: KNN for large data set

2015-01-22 Thread DEVAN M.S.

Thanks Xiangrui Meng, will try this. I also found https://github.com/kaushikranjan/knnJoin. Will this work with double data? Can we find the z-value of *Vector(10.3,4.5,3,5)*? On Thu, Jan 22, 2015 at 12:25 AM, Xiangrui Meng wrote: > For large datasets, you need hashing in order to

Are there any plans to run Spark on top of Succinct

2015-01-22 Thread Mick Davies

http://succinct.cs.berkeley.edu/wp/wordpress/ Looks like a really interesting piece of work that could dovetail well with Spark. I have been trying recently to optimize some queries I have running on Spark on top of Parquet but the support from Parquet for predicate push down especially for dict
