Re: Should spark-ec2 get its own repo?

2015-08-02 Thread Nicholas Chammas
On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote: I am considering porting some of this to a more general spark-cloud launcher, including google/aliyun/rackspace. It shouldn't be hard at all given the current approach for setup/install. FWIW, there are already some tools

What are 'Buckets' referred in Spark Core code

2015-08-02 Thread Haseeb
Hi all, I am a newbie trying to understand Spark internals. There is some entity referred to as 'buckets' in many places in the Spark Core code, but I am having a hard time understanding what it is: it is only mentioned in code comments, and I didn't come across any data structure that referred to it or any class for

Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread cheez
Do we have a data structure that corresponds to buckets in shuffle? That is, if we wanted to explore the 'content' of these buckets in the shuffle phase, can we do that? If yes, how?

Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread Reynold Xin
There are two usages of buckets in Spark core. The first is in the histogram used to perform sorting. Basically, we build an approximate histogram of the data in order to decide how to partition the data for sorting. Each bucket is a range in the histogram. The second is used in shuffle, where
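
To illustrate the first usage described above (histogram buckets used to decide how to partition data for sorting), here is a minimal sketch in plain Python. It is not Spark's implementation: the function names `bucket_bounds`, `bucket_for`, and `range_partition`, the sample size, and the use of a uniform random sample are all assumptions made for illustration. The idea is to sample the data, take evenly spaced split points from the sorted sample as an approximate histogram, and send each key to the bucket whose range contains it.

```python
import bisect
import random


def bucket_bounds(sample, num_buckets):
    """Pick num_buckets - 1 split points from a sample (an approximate histogram)."""
    ordered = sorted(sample)
    step = len(ordered) / num_buckets
    return [ordered[int(step * i)] for i in range(1, num_buckets)]


def bucket_for(key, bounds):
    """Return the index of the range bucket that `key` falls into."""
    return bisect.bisect_right(bounds, key)


def range_partition(data, num_buckets, sample_size=100, seed=0):
    """Assign every key to a range bucket derived from a random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    bounds = bucket_bounds(sample, num_buckets)
    buckets = [[] for _ in range(num_buckets)]
    for key in data:
        buckets[bucket_for(key, bounds)].append(key)
    return bounds, buckets


# Shuffled input; each bucket covers one contiguous key range.
data = list(range(1000))
random.Random(1).shuffle(data)
bounds, buckets = range_partition(data, 4)

# Sorting each bucket independently and concatenating yields a global sort.
sorted_all = [k for b in buckets for k in sorted(b)]
```

Because the bucket ranges do not overlap and are ordered, each bucket can be sorted on its own and the results concatenated. The sample only affects how evenly the keys are spread across buckets, not the correctness of the final order.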

Master JIRA ticket for tracking Spark 1.5.0 configuration renames, defaults changes, and configuration deprecation

2015-08-02 Thread Josh Rosen
To help us track planned / finished configuration renames, defaults changes, and configuration deprecation for the upcoming 1.5.0 release, I have created https://issues.apache.org/jira/browse/SPARK-9550. As you make configuration changes or think of configurations that need to be audited, please

Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-02 Thread james
Thank you for your reply! Do you mean that currently, if I want to use this Tungsten feature, I have to set the sort shuffle manager (spark.shuffle.manager=sort), right? However, I saw a slide deck, "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal", published at Spark Summit 2015, and it