Hi,
I filed an issue, please take a look:
https://issues.apache.org/jira/browse/SPARK-12233
It definitely can be reproduced.
https://issues.apache.org/jira/browse/SPARK-12231
This is my first time creating a JIRA ticket. Is it filed properly?
Thanks
On Tue, Dec 8, 2015 at 9:59 PM, Reynold Xin wrote:
> Can you create a JIRA ticket for this? Thanks.
>
>
> On Tue, Dec 8, 2015 at 5:25 PM, Chang Ya-Hsuan wrote:
>
>> spark version: spark-1.5.2-bin-hadoop2.6
I'd also like to make it a requirement that Spark 2.0 have a stable
DataFrame and Dataset API - we should not leave these APIs experimental in
the 2.0 release. We already know of at least one breaking change we need to
make to DataFrames; now's the time to make any other changes we need to
stabilize them.
Hi Disha,
Which use case do you have in mind that would require model parallelism? It
would need a large number of weights, so that they could not fit into the
memory of a single machine. For example, multilayer perceptron topologies
that are used for speech recognition have up to 100M weights. Pr
Hi Alexander,
Thanks for your response. Can you suggest ways to incorporate model
parallelism in MLPC? I am trying to do the same in Spark. I got hold of
your post
http://apache-spark-developers-list.1001551.n3.nabble.com/Model-parallelism-with-RDD-td13141.html
where you have divided the weight matrix
Hi Disha,
The multilayer perceptron classifier in Spark implements data parallelism.
Best regards, Alexander
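The distinction Alexander draws can be sketched outside Spark with a toy linear model (a hedged illustration, not MLPC's actual code): in data parallelism every worker holds the full weight vector and a shard of the data, and the per-shard gradients are summed on the driver; in model parallelism the weights themselves would be split across machines.

```python
# Toy illustration of data parallelism (not Spark's MLPC implementation):
# each "worker" holds the full weight vector and a shard of the data, and
# the driver sums the per-shard gradients. With a linear model and squared
# loss, the sum of shard gradients equals the full-batch gradient.

def gradient(w, shard):
    """Squared-loss gradient of a 1-D linear model over one data shard."""
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g

w = 0.5
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
shards = [data[:2], data[2:]]   # two "workers", each with part of the data

full_grad = gradient(w, data)                  # single-machine result
summed = sum(gradient(w, s) for s in shards)   # data-parallel aggregate

assert abs(full_grad - summed) < 1e-9
print(summed)
```

Model parallelism only becomes necessary when, as noted above, the weight vector itself no longer fits on one machine, so it must be partitioned rather than replicated.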
From: Disha Shrivastava [mailto:dishu@gmail.com]
Sent: Tuesday, December 08, 2015 12:43 AM
To: dev@spark.apache.org; Ulanov, Alexander
Subject: Data and Model Parallelism in MLPC
Hi,
I w
Interesting. As long as Spark's dependencies don't change that often, the
same caches could save "from scratch" build time over many months of Spark
development. Is that right?
On Tue, Dec 8, 2015 at 12:33 PM Josh Rosen wrote:
> @Nick, on a fresh EC2 instance a significant chunk of the initial b
An update: the vote fails due to the -1. I'll post another RC as soon as
we've resolved these issues. In the meantime I encourage people to
continue testing and post any problems they encounter here.
On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai wrote:
> -1
>
> Two blocker bugs have been found af
I will echo Steve L's comment about having zinc running (with --nailed).
That provides at least a 2X speedup; sometimes without it Spark simply
does not build for me.
2015-12-08 9:33 GMT-08:00 Josh Rosen :
> @Nick, on a fresh EC2 instance a significant chunk of the initial build
> time might be
@Nick, on a fresh EC2 instance a significant chunk of the initial build
time might be due to artifact resolution + downloading. Putting
pre-populated Ivy and Maven caches onto your EC2 machine could shave a
decent chunk of time off that first build.
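One way to sketch that suggestion (the paths and the archive-then-restore workflow are assumptions, not a documented Spark procedure): snapshot the local Ivy and Maven caches into one archive and unpack it on the fresh instance before its first build. The snippet below simulates this with temp directories standing in for the two machines' home directories; in practice the archive would be copied over with scp or baked into the AMI.

```python
# Sketch: snapshot ~/.ivy2/cache and ~/.m2/repository so a fresh EC2
# instance's first build skips artifact downloads. Temp dirs stand in
# for the dev machine's and the instance's $HOME.
import pathlib
import shutil
import tempfile

dev_home = pathlib.Path(tempfile.mkdtemp())   # stand-in for the dev machine
ec2_home = pathlib.Path(tempfile.mkdtemp())   # stand-in for the EC2 instance

# Populate a fake cache the way the real cache directories would look.
(dev_home / ".ivy2" / "cache").mkdir(parents=True)
(dev_home / ".m2" / "repository").mkdir(parents=True)
(dev_home / ".ivy2" / "cache" / "stub.jar").write_text("stub")

# "Snapshot" both caches into one archive rooted at the home directory.
scratch = pathlib.Path(tempfile.mkdtemp())
archive = shutil.make_archive(str(scratch / "dep-caches"), "gztar",
                              root_dir=dev_home, base_dir=".")

# "Restore" on the instance: unpack into its home before the first build.
shutil.unpack_archive(archive, ec2_home)
print((ec2_home / ".ivy2" / "cache" / "stub.jar").exists())
```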
On Tue, Dec 8, 2015 at 9:16 AM, Nicholas Chammas
Thanks for the tips, Jakob and Steve.
It looks like my original approach is the best for me since I'm installing
Spark on newly launched EC2 instances and can't take advantage of
incremental compilation.
Nick
On Tue, Dec 8, 2015 at 7:01 AM Steve Loughran
wrote:
> On 7 Dec 2015, at 19:07, Jakob
When I join two tables, I find that one table has a data skew problem, and
the skewed value of the field is null. So I want to filter out the nulls
before the inner join, like this:
a.key is skewed and the skewed value is null
Change
"select * from a join b on a.key = b.key"
to
"select * from a jo
Can you create a JIRA ticket for this? Thanks.
On Tue, Dec 8, 2015 at 5:25 PM, Chang Ya-Hsuan wrote:
> spark version: spark-1.5.2-bin-hadoop2.6
> python version: 2.7.9
> os: ubuntu 14.04
>
> code to reproduce error
>
> # write.py
>
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
On 7 Dec 2015, at 19:07, Jakob Odersky
mailto:joder...@gmail.com>> wrote:
make-distribution and the second code snippet both create a distribution from a
clean state. They therefore require that every source file be compiled and that
takes time (you can maybe tweak some settings or use a newer
spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: ubuntu 14.04
code to reproduce error
# write.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')
#
Probably it is because I ran "./dev/change-scala-version.sh 2.11" after
importing these projects in IntelliJ. I reimported the projects later and
it works fine now.
Closing this thread. Thanks
From: wei@kaiyuandao.com
Sent: 2015-12-07 16:43
To: dev
Subject: mlib compilation errors
hi, when I
Hi,
I would like to know if the implementation of MLPC in the latest released
version of Spark (1.5.2) implements model parallelism and data
parallelism as done in the DistBelief model implemented by Google
http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_netw