Hadoop core comprises HDFS (the storage layer), MapReduce (the parallel
execution framework) and YARN (the resource manager).
Spark can use YARN in either cluster or client mode and can use HDFS for
temporary or permanent storage. As HDFS is available and accessible on
all nodes, Spark can take advantage of it.
Submitting a Spark application with spark-submit on Hadoop (cluster mode) is
what I mean by executing on Hadoop.
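For example, something like the following, submitted in cluster mode (the
class name, jar and HDFS paths below are just made-up placeholders):

    // Submitted to YARN in cluster mode, e.g.:
    //   spark-submit --master yarn --deploy-mode cluster \
    //     --class example.MyApp my-app.jar
    import org.apache.spark.sql.SparkSession

    object MyApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("MyApp").getOrCreate()
        // read input from HDFS, process it, write the result back to HDFS
        val df = spark.read.parquet("hdfs:///user/sam/input")   // placeholder path
        df.groupBy("key").count()                               // placeholder column
          .write.mode("overwrite").parquet("hdfs:///user/sam/output")
        spark.stop()
      }
    }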
On Mon, Jan 24, 2022 at 6:00 PM, Sean Owen wrote:
> I am still not understanding what you mean by "executing on Hadoop". Spark
> does not use Hadoop for execution. Probably can't answer until
I mean that the DAG order is somehow altered when executing on Hadoop.
On Mon, Jan 24, 2022 at 5:17 PM, Sean Owen wrote:
> Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you
> mean data? data is read as-is. There is typically no guarantee about
> ordering of data in files but
Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you
mean data? Data is read as-is. There is typically no guarantee about the
ordering of data in files, but you can order data. Still not sure what
specifically you are worried about here, but I don't think the kind of
thing you're describing can happen.
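For example, if the output order matters, sort it explicitly; a quick sketch,
assuming an existing SparkSession named spark (paths and column names are
made up):

    // row order when reading files is not guaranteed across partitions
    val df = spark.read.parquet("hdfs:///data/events")

    // may come back in any order:
    df.show()

    // deterministic order, regardless of how the files are laid out:
    df.orderBy("eventTime", "id")
      .write.mode("overwrite").parquet("hdfs:///data/events_sorted")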
I am aware of that, but when the chunks of code are returned to Spark
from Hadoop (after processing), could they come back out of order? Could
this ever happen?
On Mon, Jan 24, 2022 at 4:14 PM, Sean Owen wrote:
> Hadoop does not run Spark programs, Spark does. How or why would
>
Hadoop does not run Spark programs, Spark does. How or why would anything
modify the byte code? No.
On Mon, Jan 24, 2022, 9:07 AM sam smith wrote:
> My point is could Hadoop go wrong about one Spark execution ? meaning that
> it gets confused (given the concurrent distributed tasks) and
My point is: could Hadoop go wrong with a Spark execution? Meaning that
it gets confused (given the concurrent distributed tasks) and then adds a
wrong instruction to the program, or maybe executes an instruction out of
order (shuffling the order of execution by executing previous instructions
again)?
Not clear what you mean here. A Spark program is a program, so what are the
alternatives here? Program execution order is still program execution
order. You are not guaranteed anything about the order of concurrent tasks.
Failed tasks can be re-executed, so they should be idempotent. I think the
answer is no.
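To make the retry point concrete, a small sketch (assuming a SparkSession
named spark; the path is made up): a side effect such as an accumulator
update inside a transformation can be applied more than once if a task is
retried, whereas deriving the same number from the data itself stays
deterministic.

    val sc = spark.sparkContext
    val badLines = sc.longAccumulator("badLines")

    val cleaned = sc.textFile("hdfs:///data/raw")
      .map { line =>
        if (line.isEmpty) badLines.add(1)   // may be applied again on task retry
        line.trim
      }
    cleaned.count()   // badLines.value is only approximate under retries

    // deterministic: derive the count from the data itself
    val emptyLines = sc.textFile("hdfs:///data/raw").filter(_.isEmpty).count()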
Hello guys,
I hope my question does not sound weird, but could a Spark execution on a
Hadoop cluster give a different output than the program actually specifies?
I mean by that: the execution order getting messed up by Hadoop, or an
instruction being executed twice?
Thanks for your enlightenment