Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Mich Talebzadeh
Hadoop core comprises HDFS (the storage layer), MapReduce (the parallel execution engine), and YARN (the resource manager). Spark can use YARN in either cluster or client mode, and can use HDFS for temporary or permanent storage. Since HDFS is available and accessible on all nodes, Spark can take advantage

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
Submitting a Spark application with spark-submit on Hadoop (cluster mode); that's what I mean by executing on Hadoop. On Mon, Jan 24, 2022 at 18:00, Sean Owen wrote: > I am still not understanding what you mean by "executing on Hadoop". Spark > does not use Hadoop for execution. Probably can't answer until

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
I mean that the DAG order is somehow altered when executing on Hadoop. On Mon, Jan 24, 2022 at 17:17, Sean Owen wrote: > Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you > mean data? data is read as-is. There is typically no guarantee about > ordering of data in files but

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you mean data? Data is read as-is. There is typically no guarantee about the ordering of data in files, but you can order data. Still not sure what specifically you are worried about here, but I don't think the kind of thing you're
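The point about ordering can be shown with a plain-Python analogy (not Spark code): concurrent tasks may complete in any order, but the caller can impose a deterministic order on the collected results, just as a Spark job can sort its output explicitly.

```python
# Analogy, not Spark: parallel tasks finish in nondeterministic order,
# but sorting the collected results makes the final output deterministic.
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time

def task(i):
    time.sleep(random.uniform(0, 0.05))  # simulate uneven task runtimes
    return i

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, i) for i in range(8)]
    # completion order varies from run to run
    completion_order = [f.result() for f in as_completed(futures)]

print(sorted(completion_order))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

The same holds in Spark: partition processing order is unspecified, but the computed values are correct, and an explicit sort (e.g. `orderBy`) fixes the output ordering when it matters.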

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
I am aware of that, but whenever the chunks of code are returned to Spark from Hadoop (after processing), could they come back out of order? Could this ever happen? On Mon, Jan 24, 2022 at 16:14, Sean Owen wrote: > Hadoop does not run Spark programs, Spark does. How or why would >

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Hadoop does not run Spark programs, Spark does. How or why would something, what, modify the bytecode? No. On Mon, Jan 24, 2022, 9:07 AM sam smith wrote: > My point is could Hadoop go wrong about one Spark execution ? meaning that > it gets confused (given the concurrent distributed tasks) and

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
My point is: could Hadoop go wrong about a Spark execution? Meaning that it gets confused (given the concurrent distributed tasks) and then adds a wrong instruction to the program, or executes an instruction out of its proper order (shuffling the order of execution by executing previous

Re: Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread Sean Owen
Not clear what you mean here. A Spark program is a program, so what are the alternatives here? Program execution order is still program execution order. You are not guaranteed anything about the order of concurrent tasks. Failed tasks can be re-executed, so they should be idempotent. I think the answer is
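The idempotency point above can be sketched in plain Python (an illustration, not Spark internals): if a failed task is retried, an overwrite-style task leaves the same final state, while an append-style task duplicates its effect.

```python
# Sketch: why re-executed tasks should be idempotent.
# Writing a result by key (overwrite) is safe to retry; appending is not.
results = {}

def idempotent_task(key, value):
    results[key] = value        # overwrite: re-running yields the same state

appended = []

def non_idempotent_task(value):
    appended.append(value)      # append: re-running duplicates the effect

for attempt in range(2):        # simulate one retry after a task failure
    idempotent_task("part-0", 42)
    non_idempotent_task(42)

print(results)   # {'part-0': 42}  -- unchanged by the retry
print(appended)  # [42, 42]        -- duplicated by the retry
```

This is why speculative execution and task retries in Spark are safe for pure transformations, but side effects inside tasks (e.g. writes to an external system) need to be idempotent.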

Spark execution on Hadoop cluster (many nodes)

2022-01-24 Thread sam smith
Hello guys, I hope my question does not sound weird, but could a Spark execution on a Hadoop cluster give a different output than the program actually specifies? I mean by that: could the execution order be messed up by Hadoop, or an instruction be executed twice? Thanks for your enlightenment