Hadoop core comprises HDFS (the distributed storage layer), MapReduce (the
parallel execution engine) and YARN (the resource manager).

Spark can use YARN in either cluster or client mode, and can use HDFS for
temporary or permanent storage. Because HDFS is available and accessible on
all nodes, Spark can take advantage of that. Spark performs MapReduce-style
processing in memory rather than on disk, which speeds up queries by an
order of magnitude. In that sense Spark is just an application running on
Hadoop, and not much more.
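To make the above concrete, a typical launch of a Spark application on a
YARN cluster, reading from and writing to HDFS, might look like the sketch
below. The class name, jar location, resource sizes and HDFS paths are all
placeholders for illustration, not values from this thread:

```shell
# Hypothetical spark-submit invocation: YARN cluster mode, HDFS input/output.
# All names and paths below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  --class com.example.MyApp \
  hdfs:///apps/myapp.jar \
  hdfs:///data/input hdfs:///data/output
```

In cluster mode the driver itself runs inside a YARN container, so the HDFS
paths must be visible to every node, which is exactly why HDFS being
accessible cluster-wide matters here.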

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 24 Jan 2022 at 17:22, sam smith <qustacksm2123...@gmail.com> wrote:

> spark-submit a Spark application on Hadoop (cluster mode); that's what I
> mean by executing on Hadoop
>
> On Mon, 24 Jan 2022 at 18:00, Sean Owen <sro...@gmail.com> wrote:
>
>> I am still not understanding what you mean by "executing on Hadoop".
>> Spark does not use Hadoop for execution. Probably can't answer until this
>> is cleared up.
>>
>> On Mon, Jan 24, 2022 at 10:57 AM sam smith <qustacksm2123...@gmail.com>
>> wrote:
>>
>>> I mean the DAG order is somehow altered when executing on Hadoop.
>>>
>>> On Mon, 24 Jan 2022 at 17:17, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Code is not executed by Hadoop, nor passed through Hadoop somehow. Do
>>>> you mean data? Data is read as-is. There is typically no guarantee about
>>>> the ordering of data in files, but you can order data yourself. Still
>>>> not sure what specifically you are worried about here, but I don't think
>>>> the kind of thing you're contemplating can happen, no.
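The "no ordering guarantee, but you can impose one" point can be shown
outside Spark with plain Python (an analogy, not Spark's API): rows may
arrive from distributed files in any order, and only an explicit sort
guarantees the order you need:

```python
# Rows as they might arrive from several files/partitions: no defined order.
rows = [("b", 2), ("c", 3), ("a", 1)]

# An explicit sort is the only way to guarantee an ordering.
ordered_rows = sorted(rows, key=lambda r: r[0])
print(ordered_rows)  # [('a', 1), ('b', 2), ('c', 3)]
```

In Spark the same idea applies: unless you sort explicitly, the order in
which records appear is an implementation detail you should not rely on.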
>>>>
>>>> On Mon, Jan 24, 2022 at 9:28 AM sam smith <qustacksm2123...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am aware of that, but whenever the chunks of work are returned to
>>>>> Spark from Hadoop (after processing), could they be returned out of
>>>>> order? Could this ever happen?
>>>>>
>>>>> On Mon, 24 Jan 2022 at 16:14, Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> Hadoop does not run Spark programs, Spark does. How or why would
>>>>>> something modify the byte code? No.
>>>>>>
>>>>>> On Mon, Jan 24, 2022, 9:07 AM sam smith <qustacksm2123...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My point is: could Hadoop go wrong about one Spark execution?
>>>>>>> Meaning that it gets confused (given the concurrent distributed
>>>>>>> tasks) and then adds a wrong instruction to the program, or executes
>>>>>>> an instruction out of order (running later ones before earlier ones,
>>>>>>> when it shouldn't)? For example, before finishing and returning the
>>>>>>> results from one node, it returns the results of another node in the
>>>>>>> wrong way.
>>>>>>>
>>>>>>> On Mon, 24 Jan 2022 at 15:31, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Not clear what you mean here. A Spark program is a program, so what
>>>>>>>> are the alternatives here? Program execution order is still program
>>>>>>>> execution order. You are not guaranteed anything about the order of
>>>>>>>> concurrent tasks. Failed tasks can be re-executed, so tasks should
>>>>>>>> be idempotent. I think the answer is 'no', but I'm not sure what you
>>>>>>>> are thinking of here.
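Sean's point about task ordering can be illustrated outside Spark. The
sketch below is plain Python, not Spark code: it mimics a driver submitting
partition tasks to a worker pool. The tasks may *complete* in any order,
but the driver keys results by partition index, so the final output is
deterministic as long as each task is a pure function of its input:

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(index, data):
    # A pure, idempotent task: re-running it yields the same result,
    # which is why re-execution after failure is safe.
    return index, sum(data)

partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Tasks run concurrently and may finish in any order ...
    futures = [pool.submit(process_partition, i, p)
               for i, p in enumerate(partitions)]
    # ... but results are keyed by partition index, so completion
    # order cannot change the final answer.
    results = dict(f.result() for f in futures)

ordered = [results[i] for i in range(len(partitions))]
print(ordered)  # [3, 7, 11, 15]
```

The analogy to Spark: scheduling order of concurrent tasks is not
guaranteed, but the driver reassembles partition results deterministically,
so nondeterministic completion order does not produce a "wrong" program
output.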
>>>>>>>>
>>>>>>>> On Mon, Jan 24, 2022 at 7:10 AM sam smith <
>>>>>>>> qustacksm2123...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello guys,
>>>>>>>>>
>>>>>>>>> I hope my question does not sound weird, but could a Spark
>>>>>>>>> execution on a Hadoop cluster give different output than the
>>>>>>>>> program actually specifies? I mean by that: could the execution
>>>>>>>>> order be messed up by Hadoop, or an instruction be executed twice?
>>>>>>>>> Thanks for your enlightenment
>>>>>>>>>
>>>>>>>>
