Dear All,
I'm trying to implement a procedure that iteratively updates an RDD
using results from GaussianMixtureModel.predictSoft. To avoid
problems with the local variable (the obtained GMM) being overwritten in
each pass of the loop, I'm doing the following:
##
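The code itself is cut off in this digest, but the pitfall described (a local variable being overwritten on each loop pass) is usually Python's late-binding closure behavior rather than anything Spark-specific. A minimal, Spark-free sketch of the problem and the usual fix (binding the current value through a default argument); the numbers stand in for the per-iteration model:

```python
# Late binding: every closure created in the loop sees the *final*
# value of `model`, not the value it had when the closure was created.
funcs = []
for model in [1, 2, 3]:
    funcs.append(lambda x: x * model)
print([f(10) for f in funcs])  # -> [30, 30, 30]

# Fix: bind the current value at definition time via a default argument.
funcs = [lambda x, m=model: x * m for model in [1, 2, 3]]
print([f(10) for f in funcs])  # -> [10, 20, 30]
```

The same idea applies when the captured object is a GMM produced inside the loop: fix the value at closure-creation time (e.g. with a default argument) before the closure is shipped to the executors.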
Dear Experts,
we have a Spark cluster (standalone mode) in which the master and workers
are started from the root account. Everything runs correctly up to the point
where we try operations such as
dataFrame.select("name", "age").save(ofile, "parquet")
or
rdd.saveAsPickleFile(ofile)
, wh
r-x access rights
Cheers,
Tomasz
On 19.05.2015 at 23:56, Davies Liu wrote:
It surprises me, could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
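For anyone wanting to check the same thing programmatically rather than with `ls -ld`, a small stdlib-only sketch (Unix-only, since it uses the `pwd` and `grp` modules; it inspects a throwaway temp directory, not the path from the thread):

```python
import grp
import os
import pwd
import stat
import tempfile

def describe(path):
    """Owner, group and an ls-style mode string for a path."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return owner, group, stat.filemode(st.st_mode)

d = tempfile.mkdtemp()
os.chmod(d, 0o555)      # r-x for owner/group/other, as in the thread
print(describe(d)[2])   # -> dr-xr-xr-x
os.chmod(d, 0o700)      # restore write access so cleanup can succeed
os.rmdir(d)
```

A mode of `dr-xr-xr-x` on the output directory would explain a failing save: the writing user can list and enter it but not create files in it.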
On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
wrote:
Dear Experts,
we have a Spark cluster (standalone mode
orkers
are running. I couldn't find many references to this variable, but at
least YARN and Mesos take it into account when spawning executors.
Chances are that standalone mode also does it.
iulian
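As a hypothetical illustration of the lookup order being discussed (the SPARK_USER environment variable first, then the OS user running the process; the function name and the value "alice" are made up for the example):

```python
import getpass
import os

def effective_spark_user():
    # Prefer the SPARK_USER environment variable, falling back to the
    # user running the current process.
    return os.environ.get("SPARK_USER", getpass.getuser())

os.environ["SPARK_USER"] = "alice"  # hypothetical value
print(effective_spark_user())       # -> alice
del os.environ["SPARK_USER"]
print(effective_spark_user() == getpass.getuser())  # -> True
```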
On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
mailto:tomasz.frub...@fuw.edu.pl>> wrote:
5 at 23:08, Davies Liu wrote:
Could you file a JIRA for this?
The executor should run under the user who submits the job, I think.
On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes
wrote:
Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives
me the expected user name, but it doe
Hi,
it looks like you are writing to a local filesystem. Could you try writing
to a location visible to all nodes (master and workers), e.g. an NFS share?
HTH,
Tomasz
On 21.05.2015 at 17:16, rroxanaioana wrote:
Hello!
I just started with Spark. I have an application which counts words in a
fi
Hi Matt,
is there a reason you need to call coalesce on every loop iteration? Most
likely it forces Spark to do lots of unnecessary shuffles. Also, for a
really large number of inputs this approach can lead to problems due to too
many nested RDD.union calls. A safer approach is to call union from
SparkContext
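To make the lineage argument concrete without a running cluster, here is a toy, Spark-free model of an RDD whose "lineage" is recorded as a nested tuple. Pairwise union in a loop (like repeated RDD.union) grows the tree one level per call, while a single flat union over all inputs (the SparkContext.union pattern) keeps it at depth 2; the class and helpers are invented for the illustration:

```python
from functools import reduce

class ToyRDD:
    """Toy stand-in for an RDD that records its lineage as a tuple."""
    def __init__(self, data, lineage=("source",)):
        self.data = list(data)
        self.lineage = lineage

    def union(self, other):
        # Pairwise union: nests the lineage one level deeper per call.
        return ToyRDD(self.data + other.data,
                      ("union", self.lineage, other.lineage))

def flat_union(rdds):
    # Analogue of SparkContext.union: one node over all parents at once.
    data = [x for r in rdds for x in r.data]
    return ToyRDD(data, ("union", *(r.lineage for r in rdds)))

def depth(node):
    return 1 if node == ("source",) else 1 + max(depth(c) for c in node[1:])

parts = [ToyRDD([i]) for i in range(100)]
nested = reduce(lambda a, b: a.union(b), parts)  # left-deep chain
flat = flat_union(parts)                         # one wide node
print(depth(nested.lineage), depth(flat.lineage))  # -> 100 2
```

Both variants produce the same data; only the shape of the dependency graph differs, which is why the flat form avoids the deep-recursion problems the loop form runs into.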