By "executing on Hadoop" I mean spark-submitting a Spark application to a Hadoop cluster (cluster mode).
On Mon, Jan 24, 2022 at 6:00 PM, Sean Owen <sro...@gmail.com> wrote:

> I am still not understanding what you mean by "executing on Hadoop". Spark does not use Hadoop for execution. Probably can't answer until this is cleared up.
>
> On Mon, Jan 24, 2022 at 10:57 AM sam smith <qustacksm2123...@gmail.com> wrote:
>
>> I mean that the DAG order is somehow altered when executing on Hadoop.
>>
>> On Mon, Jan 24, 2022 at 5:17 PM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> Code is not executed by Hadoop, nor passed through Hadoop somehow. Do you mean data? Data is read as-is. There is typically no guarantee about the ordering of data in files, but you can order data. Still not sure what specifically you are worried about here, but I don't think the kind of thing you're contemplating can happen, no.
>>>
>>> On Mon, Jan 24, 2022 at 9:28 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>
>>>> I am aware of that, but whenever the chunks of code are returned to Spark from Hadoop (after processing), could they be returned out of order? Could this ever happen?
>>>>
>>>> On Mon, Jan 24, 2022 at 4:14 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Hadoop does not run Spark programs, Spark does. How or why would something, what, modify the byte code? No.
>>>>>
>>>>> On Mon, Jan 24, 2022, 9:07 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>>>
>>>>>> My point is: could Hadoop go wrong about one Spark execution? Meaning that it gets confused (given the concurrent distributed tasks) and then adds a wrong instruction to the program, or executes an instruction out of order (shuffling the order of execution by executing earlier ones when it shouldn't)? For example, before finishing and returning the results from one node, it returns the results of another node in the wrong way.
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 3:31 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>>> Not clear what you mean here. A Spark program is a program, so what are the alternatives here? Program execution order is still program execution order. You are not guaranteed anything about the order of concurrent tasks. Failed tasks can be re-executed, so tasks should be idempotent. I think the answer is 'no', but I'm not sure what you are thinking of here.
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 7:10 AM sam smith <qustacksm2123...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>>
>>>>>>>> I hope my question does not sound weird, but could a Spark execution on a Hadoop cluster give different output than what the program actually specifies? I mean by that: could the execution order be messed up by Hadoop, or an instruction be executed twice?
>>>>>>>>
>>>>>>>> Thanks for your enlightenment
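The point Sean makes above (concurrent tasks may finish in any order, but the program's output is still deterministic as long as each task is a pure function of its input and results are reassembled by partition, not by completion order) can be sketched in plain Python. This is not Spark code, just an illustration of the scheduling idea; the function and variable names are invented for the example.

```python
# A plain-Python sketch (not Spark itself) of the point above:
# worker tasks complete in a nondeterministic order, but because
# each task depends only on its own input partition and results
# are collected by partition index, the final output is identical
# on every run.
from concurrent.futures import ThreadPoolExecutor
import random
import time

def process_partition(index, rows):
    # Simulate a task whose running time varies, so the order in
    # which tasks *finish* differs from run to run.
    time.sleep(random.uniform(0, 0.05))
    return index, [r * 2 for r in rows]

partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_partition, i, part)
               for i, part in enumerate(partitions)]
    # Keyed by partition index, not by completion order.
    results = dict(f.result() for f in futures)

# Reassemble in partition order: deterministic despite the
# nondeterministic completion order of the workers.
output = [row for i in range(len(partitions)) for row in results[i]]
print(output)  # always [2, 4, 6, 8, 10, 12, 14, 16]
```

The same reasoning is why a failed-and-retried task must be idempotent: re-running `process_partition` on the same partition yields the same `(index, rows)` pair, so a retry cannot change the final result.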