I run "spark-sql --master yarn --deploy-mode client -f 'SQLs' " in shell,
The application is stuck when the AM is down and restart in other nodes. It
seems the driver wait for the next sql. Is this a bug?In my opinion,Either
the application execute the failed sql or exit with a failure when the
I run "spark-sql --master yarn --deploy-mode client -f 'SQLs' " in shell,
The application is stuck when the AM is down and restart in other nodes. It
seems the driver wait for the next sql. Is this a bug?In my opinion,Either
the application execute the failed sql or exit with a failure when the
Hi all,
We are getting the following exception, and it somehow blocks the parent
thread from proceeding further.
17/11/14 16:50:09 SPARK_APP WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
17/11/14 16:50:17 SPARK_APP WA
Hi,
Thank you all.
Here is my requirement: I have a DataFrame that contains a list of rows
retrieved from an Oracle table.
I need to iterate over the DataFrame, fetch each record, and call a common
function, passing it a few parameters.
The issue I am facing is that I am not able to call the common function.
JavaRDD pe
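
For what it's worth, here is a minimal sketch of what I think you are after.
The JDBC options, column names, and commonFunction are placeholders I made
up, not your actual code:

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IterateOracleRows {

    // Stand-in for the "common function"; the parameters are assumed.
    private static void commonFunction(String id, String name) {
        System.out.println(id + " -> " + name);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("iterate-oracle-rows")
                .getOrCreate();

        // Placeholder JDBC settings; substitute your Oracle URL, table,
        // and credentials.
        Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
                .option("dbtable", "MY_TABLE")
                .option("user", "scott")
                .option("password", "tiger")
                .load();

        // foreach runs on the executors, so everything the lambda touches
        // must be serializable; a static helper like commonFunction is safe.
        df.foreach((ForeachFunction<Row>) row ->
                commonFunction(row.getAs("ID"), row.getAs("NAME")));

        spark.stop();
    }
}

If the function must run on the driver (say, it uses a non-serializable
client), collect first with df.collectAsList() and loop locally, but only
for small result sets.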
In my experience with Spark applications (almost all Spark SQL), when there
is a complete node failure in the cluster, jobs that have shuffle blocks on
that node fail completely after 4 task retries. It seems that data lineage
didn't work. What's more, our applications use multiple SQL statements for
da
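
As an aside, while the lineage question is sorted out: the 4-retry limit
being hit here is spark.task.maxFailures (default 4), and stage-level
retries after fetch failures are capped by spark.stage.maxConsecutiveAttempts
(also default 4). A sketch of raising both; the values are illustrative only:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class RetryTuning {
    public static void main(String[] args) {
        // Illustrative values; tune to your own failure patterns.
        SparkConf conf = new SparkConf()
                .setAppName("retry-tuning")
                .set("spark.task.maxFailures", "8")              // per-task retries (default 4)
                .set("spark.stage.maxConsecutiveAttempts", "8"); // stage retries on fetch failure

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // ... run the SQL statements ...
        spark.stop();
    }
}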
Hi ,All. i am confused of how can i keep a sparkContext alive. Just in
the situation that we write a sql query on a web and backend we init a
sparkContext then submit the spark jobs. However the question is everytime
we run the query string,spark with request the resources from yarn.It is
pai
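
One common pattern, sketched below under the assumption that a single JVM
backend serves the web requests: build the SparkSession once and reuse it
for every query, so YARN resources are requested only at startup. (The Spark
Thrift Server is another option if JDBC access is acceptable.) The class and
method names here are illustrative:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class SharedSpark {
    private static volatile SparkSession session;

    private SharedSpark() {}

    // Lazily create one long-lived session and reuse it for every query.
    public static SparkSession get() {
        if (session == null) {
            synchronized (SharedSpark.class) {
                if (session == null) {
                    session = SparkSession.builder()
                            .appName("web-sql-backend")
                            .master("yarn")
                            .getOrCreate();
                }
            }
        }
        return session;
    }

    // Called from the web handler for each submitted query string.
    public static Dataset<Row> runQuery(String sql) {
        return get().sql(sql);
    }
}

Combined with dynamic allocation, idle executors are handed back to YARN
between queries, but the application stays registered, so a new query does
not pay the full startup cost again.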
Ah, yes, I missed that part
it's `spark.local.dir`
spark.local.dir (default: /tmp) - Directory to use for "scratch" space in
Spark, including map output files and RDDs that get stored on disk. This
should be on a fast, local disk in your system. It can also be a
comma-separated list of multiple directories on
I do think that there is an option to set the temporary shuffle location to
a particular directory. While working with EMR I set it to /mnt1/. Let me
know in case you are not able to find it.
On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob wrote:
> This code generates files under /tmp...blockmgr...
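
To make this concrete, a sketch of pointing scratch space at specific
mounts. The paths are examples only, and note that on YARN the
NodeManager's local dirs (LOCAL_DIRS) override spark.local.dir, so there
you would change yarn.nodemanager.local-dirs instead:

import org.apache.spark.sql.SparkSession;

public class LocalDirExample {
    public static void main(String[] args) {
        // Example mounts; spark.local.dir accepts a comma-separated list
        // of fast local disks. The blockmgr-* and shuffle spill files then
        // land under these directories instead of /tmp (standalone/local
        // modes; on YARN this is overridden by LOCAL_DIRS).
        SparkSession spark = SparkSession.builder()
                .appName("local-dir-example")
                .config("spark.local.dir", "/mnt1/spark,/mnt2/spark")
                .getOrCreate();
        spark.stop();
    }
}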
Hi All,
Please help me with this error:
17/12/20 11:07:16 INFO executor.CoarseGrainedExecutorBackend: Started
daemon with process name: 19581@ddh-dev-dataproc-sw-hdgx
17/12/20 11:07:16 INFO util.SignalUtils: Registered signal handler for TERM
17/12/20 11:07:16 INFO util.SignalUtils: Registered sig
The probability of a complete node failure is low. I would rely on data
lineage and accept the reprocessing overhead. Another option would be to
write to a distributed FS, but that will drastically reduce the speed of
all your jobs.
On 20 Dec 2017 at 11:23, "chopinxb" wrote:
> Yes,shuffle service was already st
Yes, the shuffle service was already started on each NodeManager. What I
mean by a node failure is that the machine is down: every service on it,
including the NodeManager process, is down. So in this case the shuffle
service is no longer helpful.
In your case, you need to externalize the shuffle files to a component
outside of your Spark cluster to make them persist after a Spark worker's
death.
https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
2017-12-20 10:46 GMT+01:00 chopinxb :
> In my use
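
For reference, once the aux service (spark_shuffle) is wired into each
NodeManager as described at the link above, the Spark side boils down to
two settings. A sketch:

import org.apache.spark.sql.SparkSession;

public class ExternalShuffleExample {
    public static void main(String[] args) {
        // With the YARN shuffle service deployed, executors can come and
        // go while the NodeManager keeps serving their shuffle files.
        // As noted in this thread, that only helps while the node itself
        // is alive; surviving whole-machine loss needs off-node storage.
        SparkSession spark = SparkSession.builder()
                .appName("external-shuffle-example")
                .config("spark.shuffle.service.enabled", "true")
                .config("spark.dynamicAllocation.enabled", "true")
                .getOrCreate();
        spark.stop();
    }
}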
In my use case, I run Spark in yarn-client mode with dynamicAllocation
enabled. When a node shuts down abnormally, my Spark application fails
because tasks fail to fetch shuffle blocks from that node 4 times. Why
doesn't Spark leverage Alluxio (a distributed in-memory filesystem) to write
sh