By "executing on Hadoop" I mean spark-submitting a Spark application to a Hadoop cluster (cluster mode).
On Mon, Jan 24, 2022 at 6:00 PM, Sean Owen <sro...@gmail.com> wrote:

> I am still not understanding what you mean by "executing on Hadoop". Spark does not use Hadoop for execution. Probably can't answer until this is cleared up.
>
> On Mon, Jan 24, 2022 at 10:57 AM sam smith <qustacksm2123...@gmail.com> wrote:
>
>> I mean that the DAG order is somehow altered when executing on Hadoop.
>>
>> On Mon, Jan 24, 2022 at 5:17 PM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you mean data? Data is read as-is. There is typically no guarantee about the ordering of data in files, but you can order data. Still not sure what specifically you are worried about here, but I don't think the kind of thing you're contemplating can happen, no.
>>>
>>> On Mon, Jan 24, 2022 at 9:28 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>
>>>> I am aware of that, but whenever the chunks of code are returned to Spark from Hadoop (after processing), could they be returned out of order? Could this ever happen?
>>>>
>>>> On Mon, Jan 24, 2022 at 4:14 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Hadoop does not run Spark programs, Spark does. How or why would something, what, modify the byte code? No.
>>>>>
>>>>> On Mon, Jan 24, 2022, 9:07 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>>>
>>>>>> My point is: could Hadoop go wrong about one Spark execution? Meaning that it gets confused (given the concurrent distributed tasks) and then adds a wrong instruction to the program, or executes an instruction out of order (shuffling the order of execution by executing earlier ones when it shouldn't)? For example, before finishing and returning the results from one node, it returns the results of another node in the wrong way.
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 3:31 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>>> Not clear what you mean here. A Spark program is a program, so what are the alternatives here? Program execution order is still program execution order. You are not guaranteed anything about the order of concurrent tasks. Failed tasks can be re-executed, so tasks should be idempotent. I think the answer is 'no', but I'm not sure what you are thinking of here.
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 7:10 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>>
>>>>>>>> I hope my question does not sound weird, but could a Spark execution on a Hadoop cluster give different output than what the program actually specifies? I mean by that: could the execution order be messed up by Hadoop, or an instruction be executed twice?
>>>>>>>>
>>>>>>>> Thanks for your enlightenment
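The point Sean makes above (concurrent tasks may finish in any order, but the program's output is still deterministic as long as each task is a pure function of its input and results are reassembled by partition, not by completion order) can be sketched in plain Python. This is not Spark code, just an illustration of the scheduling idea; the function and variable names are invented for the example.

```python
# A plain-Python sketch (not Spark itself) of the point above:
# worker tasks complete in a nondeterministic order, but because
# each task depends only on its own input partition and results
# are collected by partition index, the final output is identical
# on every run.
from concurrent.futures import ThreadPoolExecutor
import random
import time

def process_partition(index, rows):
    # Simulate a task whose running time varies, so the order in
    # which tasks *finish* differs from run to run.
    time.sleep(random.uniform(0, 0.05))
    return index, [r * 2 for r in rows]

partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_partition, i, part)
               for i, part in enumerate(partitions)]
    # Keyed by partition index, not by completion order.
    results = dict(f.result() for f in futures)

# Reassemble in partition order: deterministic despite the
# nondeterministic completion order of the workers.
output = [row for i in range(len(partitions)) for row in results[i]]
print(output)  # always [2, 4, 6, 8, 10, 12, 14, 16]
```

The same reasoning is why a failed-and-retried task must be idempotent: re-running `process_partition` on the same partition yields the same `(index, rows)` pair, so a retry cannot change the final result.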