deterministic D-Stream model

2014-11-01 Thread forough
Hi all, I have 2 questions: 1. I found out that the meaning of "deterministic" in the D-Stream model is: given a particular input, it will always produce the same output, which is necessary for recomputation via lineage. Is that right? 2. Why does Spark Streaming need to replicate input data to make it deterministic
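The determinism property asked about in question 1 can be illustrated with a toy lineage recomputation, a plain-Python sketch (not Spark's actual APIs; all names here are hypothetical): because the recorded transformations are deterministic, re-running them on the same input reproduces a lost partition exactly.

```python
# Toy illustration: a "partition" is described by its lineage -- the input
# block plus the deterministic transformations applied to it. If the computed
# partition is lost, re-applying the same functions to the same input
# reproduces it exactly; no copy of the *output* needs to be kept.

def build_lineage(input_block, transforms):
    """Record the input and the deterministic ops; nothing is cached."""
    return {"input": input_block, "transforms": transforms}

def compute(lineage):
    """(Re)compute the partition from its lineage."""
    data = lineage["input"]
    for fn in lineage["transforms"]:
        data = [fn(x) for x in data]
    return data

lineage = build_lineage([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
first = compute(lineage)      # original computation -> [3, 5, 7]
recovered = compute(lineage)  # "recomputation" after a simulated loss
assert first == recovered     # deterministic => identical result
```

Note the sketch assumes the *input* block is still available; raw received input cannot itself be recomputed from lineage, which is the context for question 2 about replicating input data.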

Re: Surprising Spark SQL benchmark

2014-11-01 Thread arthur.hk.c...@gmail.com
Hi Kay, Thank you so much for your update!! Looking forward to the shared code from AMPLab. As a member of the Spark community, I really hope I can help run TPC-DS on SparkSQL. At the moment, I am trying the 22 TPC-H queries on SparkSQL 1.1.0 with Hive 0.12 and Hive 0.13.1, respectively

Re: Surprising Spark SQL benchmark

2014-11-01 Thread RJ Nowling
Two thoughts here: 1. The real flaw with the sort benchmark was that Hadoop wasn't run on the same hardware. Given the advances in networking (availability of 10 Gb Ethernet) and disks (SSDs) since the Hadoop benchmarks it was compared to, it's an apples-to-oranges comparison. Without that, it

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
Good points raised. Some comments. Re: #1 It seems like there is a misunderstanding of the purpose of the Daytona Gray benchmark. The purpose of the benchmark is to see how fast you can sort 100 TB of data (technically, your sort rate during the operation) using *any* hardware or software

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
Kay, Is this effort related to the existing AMPLab Big Data Benchmark that covers Spark, Redshift, Tez, and Impala? Nick On Friday, October 31, 2014, Kay Ousterhout <k...@eecs.berkeley.edu> wrote: There's been an effort in the AMPLab at Berkeley to set up a shared codebase that makes it easy to run

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Kay Ousterhout
Hi Nick, No -- we're doing a much more constrained thing of just trying to get things set up to easily run TPC-DS on SparkSQL (which involves generating the data, storing it in HDFS, getting all the queries in the right format, etc.). Cloudera does have a repo here:

Changes to Spark's networking subsystem

2014-11-01 Thread Patrick Wendell
== Short version == A recent commit replaces Spark's networking subsystem with one based on Netty rather than raw sockets. Users running off of master can disable this change by setting spark.shuffle.blockTransferService=nio. We will be testing with this during the QA period for Spark 1.2. The new
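As a usage note, the opt-out flag quoted above could be set persistently rather than per-job; a sketch of the corresponding `conf/spark-defaults.conf` entry (the property name and value are exactly those given in the announcement):

```
# conf/spark-defaults.conf -- revert master builds to the old NIO-based
# block transfer service instead of the new Netty-based one
spark.shuffle.blockTransferService   nio
```

The same setting can equivalently be passed at submit time via `--conf spark.shuffle.blockTransferService=nio`.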