Re: Spark execution on Hadoop cluster (many nodes)

sam smith Mon, 24 Jan 2022 08:57:27 -0800

I mean that the DAG order is somehow altered when executing on Hadoop.

On Mon, Jan 24, 2022 at 5:17 PM, Sean Owen <sro...@gmail.com> wrote:


> Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you
> mean data? Data is read as-is. There is typically no guarantee about the
> ordering of data in files, but you can order the data. I'm still not sure
> what specifically you are worried about here, but I don't think the kind of
> thing you're contemplating can happen, no.
>
> On Mon, Jan 24, 2022 at 9:28 AM sam smith <qustacksm2123...@gmail.com>
> wrote:
>
>> I am aware of that, but when the chunks of code are returned to Spark
>> from Hadoop (after processing), could they come back out of order?
>> Could this ever happen?
>>
>> On Mon, Jan 24, 2022 at 4:14 PM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> Hadoop does not run Spark programs; Spark does. How or why would
>>> something (and what?) modify the byte code? No.
>>>
>>> On Mon, Jan 24, 2022, 9:07 AM sam smith <qustacksm2123...@gmail.com>
>>> wrote:
>>>
>>>> My point is: could Hadoop go wrong during a Spark execution? Meaning
>>>> that it gets confused (given the concurrent distributed tasks) and then
>>>> adds a wrong instruction to the program, or executes an instruction
>>>> out of order (shuffling the order of execution by re-executing
>>>> previous ones when it shouldn't)? For example, before finishing and
>>>> returning the results from one node, it returns the results of another
>>>> in the wrong way.
>>>>
>>>> On Mon, Jan 24, 2022 at 3:31 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Not clear what you mean here. A Spark program is a program, so what
>>>>> are the alternatives here? Program execution order is still program
>>>>> execution order. You are not guaranteed anything about the order of
>>>>> concurrent tasks. Failed tasks can be re-executed, so they should be
>>>>> idempotent. I think the answer is 'no', but I'm not sure what you are
>>>>> thinking of here.
>>>>>
>>>>> On Mon, Jan 24, 2022 at 7:10 AM sam smith <qustacksm2123...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I hope my question does not sound weird, but could a Spark execution
>>>>>> on a Hadoop cluster give a different output than the program itself
>>>>>> would produce? By that I mean: could the execution order be messed up
>>>>>> by Hadoop, or an instruction be executed twice?
>>>>>>
>>>>>> Thanks for your enlightenment
>>>>>>
>>>>>
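
The points made in the replies above (no guaranteed order among concurrent tasks, data can be ordered explicitly, and re-executed tasks should be idempotent) can be illustrated with a small sketch. This is only an analogy using Python's standard library thread pool, not Spark's or Hadoop's API; the names square_partition and run_job are hypothetical and chosen for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def square_partition(partition_id, values):
    """A pure (idempotent) task: the same input always gives the same output."""
    time.sleep(random.uniform(0.0, 0.01))  # simulate uneven task duration
    return partition_id, [v * v for v in values]


def run_job(partitions):
    """Run one task per partition concurrently and return ordered results."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square_partition, pid, vals)
                   for pid, vals in enumerate(partitions)]
        # Tasks may finish in any order; nothing here depends on that order.
        for future in as_completed(futures):
            pid, squared = future.result()
            results[pid] = squared
    # Impose the order explicitly instead of relying on completion order.
    return [results[pid] for pid in sorted(results)]


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]
first_run = run_job(partitions)
second_run = run_job(partitions)  # re-running the same tasks changes nothing
```

Because each task is a pure function of its input and the final order is imposed by partition id, the output does not depend on which task happens to finish first or on whether a task is run again.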
