Problem embedding GaussianMixtureModel in a closure

2015-12-31 Thread Tomasz Fruboes
Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. In order to avoid problems with a local variable (the obtained GMM) being overwritten in each pass of the loop I'm doing the following: ##
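The excerpt above is cut off before the poster's code, so the following is not that code. It is a minimal plain-Python sketch (no Spark involved) of the general hazard the message describes: a closure created in a loop looks up the loop's local variable when it is called, not when it is defined. `FakeModel`, `make_late_bound` and `make_bound_now` are hypothetical names used only for illustration; how this interacts with Spark's lazy evaluation and closure serialization is not covered here.

```python
class FakeModel:
    """Hypothetical stand-in for a fitted model; not a Spark class."""

    def __init__(self, shift):
        self.shift = shift

    def predict(self, x):
        return x + self.shift


def make_late_bound():
    funcs = []
    for shift in range(3):
        model = FakeModel(shift)
        # The lambda looks up `model` when it is called, so every
        # function ends up using the model from the last iteration.
        funcs.append(lambda x: model.predict(x))
    return funcs


def make_bound_now():
    funcs = []
    for shift in range(3):
        model = FakeModel(shift)
        # A default argument is evaluated at definition time, so each
        # function keeps the model of its own iteration.
        funcs.append(lambda x, m=model: m.predict(x))
    return funcs


late = [f(10) for f in make_late_bound()]
bound = [f(10) for f in make_bound_now()]
print(late)   # [12, 12, 12]
print(bound)  # [10, 11, 12]
```

Binding through a default argument (or a small factory function) is one common way to freeze the current value; it is shown here as an option, not as the approach taken in the thread.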

Re: Problem embedding GaussianMixtureModel in a closure

2016-01-04 Thread Tomasz Fruboes
Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>: Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. In order to avoid problems with a local variable (the obtained GMM) being overwritten i

Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-19 Thread Tomasz Fruboes
Dear Experts, we have a Spark cluster (standalone mode) in which the master and workers are started from the root account. Everything runs correctly up to the point where we try operations such as dataFrame.select("name", "age").save(ofile, "parquet") or rdd.saveAsPickleFile(ofile), wh

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes
r-x access rights Cheers, Tomasz On 19.05.2015 at 23:56, Davies Liu wrote: It surprises me, could you list the owner information of /mnt/lustre/bigdata/med_home/tmp/test19EE/ ? On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes wrote: Dear Experts, we have a Spark cluster (standalone mode

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes
orkers are running. I couldn't find many references to this variable, but at least Yarn and Mesos take it into account when spawning executors. Chances are that standalone mode also does it. iulian On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes <tomasz.frub...@fuw.edu.pl> wrote:

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-21 Thread Tomasz Fruboes
5 at 23:08, Davies Liu wrote: Could you file a JIRA for this? The executor should run under the user who submits a job, I think. On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes wrote: Thanks for the suggestion. I have tried playing with it, sc.sparkUser() gives me the expected user name, but it doe

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes
Hi, it looks like you are writing to a local filesystem. Could you try writing to a location visible to all nodes (master and workers), e.g. an NFS share? HTH, Tomasz On 21.05.2015 at 17:16, rroxanaioana wrote: Hello! I just started with Spark. I have an application which counts words in a fi

Re: Union of many RDDs taking a long time

2015-06-29 Thread Tomasz Fruboes
Hi Matt, is there a reason you need to call coalesce in every loop iteration? Most likely it forces Spark to do lots of unnecessary shuffles. Also, for a really large number of inputs this approach can cause problems due to too many nested RDD.union calls. A safer approach is to call union from SparkCon
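The reply above is truncated, so the following is only a toy model in plain Python (no Spark) of why repeated pairwise unions nest deeply while a single union over the whole list does not. It assumes the reply refers to one union over all inputs, which PySpark offers as `SparkContext.union`. `Node`, `union_all` and `depth` are hypothetical names that track parent links only.

```python
from functools import reduce


class Node:
    """Hypothetical stand-in for an RDD: records only its parents."""

    def __init__(self, parents=()):
        self.parents = tuple(parents)

    def union(self, other):
        # Pairwise union: the result has exactly two parents.
        return Node((self, other))


def union_all(nodes):
    # Flat union: a single node whose parents are all the inputs.
    return Node(nodes)


def depth(node):
    # Longest chain of parent links, computed iteratively so the
    # measurement itself cannot hit Python's recursion limit.
    best, stack = 0, [(node, 0)]
    while stack:
        current, d = stack.pop()
        best = max(best, d)
        stack.extend((p, d + 1) for p in current.parents)
    return best


inputs = [Node() for _ in range(1000)]
nested = reduce(lambda a, b: a.union(b), inputs)
flat = union_all(inputs)
print(depth(nested))  # 999
print(depth(flat))    # 1
```

In this model the chain grows by one level per pairwise union, whereas the flat union stays one level deep regardless of the number of inputs.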