Re: new Catalyst/SQL component merged into master

2014-03-20 Thread Heiko Braun
Congrats! That's a really impressive and useful addition to Spark. I just recently discovered a similar feature in pandas and really enjoyed using it. Regards, Heiko > On 21.03.2014 at 02:11, Reynold Xin wrote: > > Hi All, > > I'm excited to announce a new module in Spark (SPARK-1251). A

Re: Spark AMI

2014-03-20 Thread Patrick Wendell
It has a bunch of packages installed on it for various Spark dependencies (libfortran, numpy, scipy) and some helpful tools (dstat, iotop). On Thu, Mar 20, 2014 at 10:21 AM, Reynold Xin wrote: > It's mostly stock CentOS installation with some scripts. > > > > > On Thu, Mar 20, 2014 at 2:53 AM, Us

Re: Spark 0.9.1 release

2014-03-20 Thread Bhaskar Dutta
Thank You! We plan to test out 0.9.1 on YARN once it is out. Regards, Bhaskar On Fri, Mar 21, 2014 at 12:42 AM, Tom Graves wrote: > I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running > on YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as > submitting user - J

Please subscribe me

2014-03-20 Thread twinkle sachdeva
Please subscribe me.

Re: new Catalyst/SQL component merged into master

2014-03-20 Thread Michael Armbrust
Hi Everyone, I'm very excited about merging this new feature into Spark! We have a lot of cool things in the pipeline, including: porting Shark's in-memory columnar format to Spark SQL, code generation for expression evaluation, and improved support for complex types in Parquet. I would love to h
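The in-memory columnar format mentioned above can be sketched in plain Python (this is only an illustrative model, not Spark's actual implementation): each field is stored contiguously, so evaluating an expression over one column touches only that column's values.

```python
# Illustrative sketch of a columnar layout: pivot row tuples into
# per-field columns, then evaluate a predicate against a single column.

def to_columnar(rows, fields):
    """Pivot a list of row tuples into a dict of per-field columns."""
    columns = {f: [] for f in fields}
    for row in rows:
        for f, v in zip(fields, row):
            columns[f].append(v)
    return columns

rows = [("a", 1), ("b", 2), ("c", 3)]
cols = to_columnar(rows, ["key", "value"])

# Evaluating "value > 1" scans only the 'value' column.
mask = [v > 1 for v in cols["value"]]
hits = [k for k, keep in zip(cols["key"], mask) if keep]
print(hits)  # ['b', 'c']
```

The payoff of this layout is that a scan over one column never deserializes the others, which is what makes columnar caching attractive for analytical queries.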

new Catalyst/SQL component merged into master

2014-03-20 Thread Reynold Xin
Hi All, I'm excited to announce a new module in Spark (SPARK-1251). After an initial review we've merged this into Spark as an alpha component to be included in Spark 1.0. This new component adds some exciting features, including: - schema-aware RDD programming via an experimental DSL - native Parq
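The "schema-aware" idea can be illustrated with a loose, plain-Python analogy (this is not the actual Spark SQL DSL): records carry named fields instead of being opaque tuples, so queries can be written against field names.

```python
# Loose analogy for schema-aware records: a query like
# "SELECT name WHERE age >= 18" written against named fields.
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

people = [Person("Ann", 35), Person("Bob", 17), Person("Cal", 52)]

adults = [p.name for p in people if p.age >= 18]
print(adults)  # ['Ann', 'Cal']
```

Knowing the schema up front is also what lets a SQL layer plan and optimize queries rather than treating every record as an opaque blob.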

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Thanks Tom, After I looked more at this patch I don't see how this could have regressed behavior for any users (it seems like it only pertains to warnings and instructions). So maybe the user mistook this patch for a different issue. https://github.com/apache/incubator-spark/pull/553/files - Pat

Re: Spark 0.9.1 release

2014-03-20 Thread Tom Graves
Thanks for the heads up, saw that and will make sure that is resolved before pulling into 0.9. Unless I'm missing something, they should just use sc.addJar to distribute the jar rather than relying on SPARK_YARN_APP_JAR. Tom On Thursday, March 20, 2014 3:31 PM, Patrick Wendell wrote: Hey

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Hey Tom, > I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on > YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as submitting > user - JIRA in. The pyspark one I would consider more of an enhancement so > might not be appropriate for a point release. Some

Re: Spark 0.9.1 release

2014-03-20 Thread Tom Graves
I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and  [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA in.  The pyspark one I would consider more of an enhancement so might not be appropriate for a point release.  [SPARK-1053] Shoul

Re: Spark 0.9.1 release

2014-03-20 Thread Bhaskar Dutta
It will be great if "SPARK-1101: Umbrella for hardening Spark on YARN" can get into 0.9.1. Thanks, Bhaskar On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das wrote: > Hello everyone, > > Since the release of Spark 0.9, we have received a number

Re: Largest input data set observed for Spark.

2014-03-20 Thread Andrew Ash
Understood of course. Did the data fit comfortably in memory or did you experience memory pressure? I've had to do a fair amount of tuning when under memory pressure in the past (0.7.x) and was hoping that the handling of this scenario is improved in later Spark versions. On Thu, Mar 20, 2014 a

Re: Largest input data set observed for Spark.

2014-03-20 Thread Henry Saputra
Reynold, just curious, did you guys run it in AWS? - Henry On Thu, Mar 20, 2014 at 11:08 AM, Reynold Xin wrote: > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes - > I didn't count the size of the uncompressed data, but I am guessing it is > somewhere between 200TB to 700

Re: Largest input data set observed for Spark.

2014-03-20 Thread Reynold Xin
I'm not really at liberty to discuss details of the job. It involves some expensive aggregated statistics, and took 10 hours to complete (mostly bottlenecked by network & I/O). On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman < suren.hira...@velos.io> wrote: > Reynold, > > How complex w

Re: Largest input data set observed for Spark.

2014-03-20 Thread Surendranauth Hiraman
Reynold, How complex was that job (I guess in terms of number of transforms and actions) and how long did that take to process? -Suren On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin wrote: > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes - > I didn't count the size of

Re: Largest input data set observed for Spark.

2014-03-20 Thread Reynold Xin
Actually we just ran a job with 70TB+ compressed data on 28 worker nodes - I didn't count the size of the uncompressed data, but I am guessing it is somewhere between 200TB and 700TB. On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani wrote: > All, > What is the largest input data set y'all have com

Re: Spark AMI

2014-03-20 Thread Reynold Xin
It's mostly a stock CentOS installation with some scripts. On Thu, Mar 20, 2014 at 2:53 AM, Usman Ghani wrote: > Is there anything special about the Spark AMIs or are they just stock > CentOS installations? >

Spark AMI

2014-03-20 Thread Usman Ghani
Is there anything special about the Spark AMIs or are they just stock CentOS installations?

Re: how to config worker HA

2014-03-20 Thread qingyang li
I think I found the answer: apply(flags: Int, replication: Int): StorageLevel 2014-03-20 17:00 GM
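The `apply(flags: Int, replication: Int)` signature found above suggests a storage level is a set of option bits plus a replication count. A hypothetical sketch of that pattern in plain Python (the bit positions here are illustrative, not Spark's actual layout):

```python
# Illustrative bitmask model of a storage level: option flags packed
# into an int, plus a replication count. Flag positions are made up.
USE_DISK = 1 << 0
USE_MEMORY = 1 << 1
DESERIALIZED = 1 << 2

def make_storage_level(flags, replication):
    """Decode a flags int + replication count into named options."""
    return {
        "use_disk": bool(flags & USE_DISK),
        "use_memory": bool(flags & USE_MEMORY),
        "deserialized": bool(flags & DESERIALIZED),
        "replication": replication,
    }

# e.g. an in-memory, deserialized level replicated to 2 nodes:
level = make_storage_level(USE_MEMORY | DESERIALIZED, 2)
print(level)
```

Setting the replication count above 1 is the piece relevant to the worker-HA question in this thread: a replicated storage level keeps a second copy of each cached block on another node.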

Re: how to config worker HA

2014-03-20 Thread qingyang li
Can someone help me? 2014-03-12 21:26 GMT+08:00 qingyang li : > in addition: > on this site: > https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#hadoop-datasets > , > I find an RDD can be stored using a different storage level on the web, > and also find StorageLevel's attribute

Re: Announcing the official Spark Job Server repo

2014-03-20 Thread andy petrella
Heya, That's cool you've already hacked something for this in the scripts! I have a related question: how would it actually work? I mean, to have this Job Server fault-tolerant using Marathon, I would guess that it will need to be itself a Mesos framework, and able to publish its resource needs.

[spark-streaming] Is this LocalInputDStream useful to someone?

2014-03-20 Thread Pascal Voitot Dev
Hi guys, In my recent blog post (http://mandubian.com/2014/03/08/zpark-ml-nio-1/), I needed an InputDStream helper looking like NetworkInputDStream to be able to push my data into a DStream in an async way. But I didn't want the remoting aspect, as my data source runs locally and nowhere else
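The pattern described above, a local push-based source feeding a stream consumer asynchronously with no remoting involved, can be sketched in plain Python using an in-process queue (hypothetical names, not the proposed LocalInputDStream code):

```python
# Sketch of an async local push source: a producer thread pushes items
# into an in-process queue; the consumer drains them as they arrive.
import queue
import threading

SENTINEL = object()

def producer(q):
    # Push data into the "stream" from a local source.
    for i in range(5):
        q.put(i)
    q.put(SENTINEL)  # signal end of stream

q = queue.Queue()
t = threading.Thread(target=producer, args=(q,))
t.start()

received = []
while True:
    item = q.get()
    if item is SENTINEL:
        break
    received.append(item)
t.join()
print(received)  # [0, 1, 2, 3, 4]
```

The point of the sketch is the decoupling: the producer never blocks on the consumer's pace, which is the async push behavior the post wanted without paying for a network receiver.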

Re: Largest input data set observed for Spark.

2014-03-20 Thread ligq
-- Original -- From: "Usman Ghani" Date: Thu, Mar 20, 2014 03:23 PM To: "user"; "dev" Subject: Largest input data set observed for Spark. All, What is the largest input data set y'all have come across that has been successfully proce

Largest input data set observed for Spark.

2014-03-20 Thread Usman Ghani
All, What is the largest input data set y'all have come across that has been successfully processed in production using Spark? Ballpark?