Hi Wei-Chiu,

You can look at Dask [1]. It works with HDFS [2] and also integrates
with YARN [3].

1 - https://dask.org
2 - http://docs.dask.org/en/latest/remote-data-services.html
3 - http://yarn.dask.org/en/latest/
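
On the original question about a simple Python MapReduce example, here is
a minimal word-count sketch in the Hadoop Streaming style, where the mapper
and reducer read lines on stdin and write tab-separated key/value lines on
stdout. The file name wordcount.py, the function names, and the "map" /
"reduce" command-line argument are my own assumptions for illustration;
they do not come from the linked tutorial or the Hadoop docs.

```python
import sys
from itertools import groupby


def map_words(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(sorted_records):
    """Reducer: sum counts of consecutive records sharing a key.

    Assumes the input is already sorted by key, which is what the
    shuffle/sort phase provides between map and reduce.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: 'python wordcount.py map' or
# 'python wordcount.py reduce'; does nothing if no stage is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = map_words if sys.argv[1] == "map" else reduce_counts
    for out in stage(sys.stdin):
        print(out)
```

You can try it locally without a cluster by letting sort stand in for the
shuffle phase:

    cat input.txt | python wordcount.py map | sort | python wordcount.py reduce

On a cluster, the same two commands would be passed to the Hadoop Streaming
jar as the mapper and reducer; the streaming documentation has the exact
invocation for your version.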

Thanks,
Hari


On Sun, 16 Jun 2019, 23:31 Wei-Chiu Chuang, <weic...@apache.org> wrote:

> Thanks Artem,
> Looks interesting. I honestly didn't know what the Hadoop Streaming API
> is used for.
> Here are more references:
> https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
>
> I think it brings up another question: how do we treat Python as a
> first-class citizen? Especially for data science use cases, Python is
> *the* language.
> For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS,
> but Hadoop does not ship a Python client.
> I see a number of Python libraries that support webhdfs. It's not clear to
> me how well they perform, and if they support more advanced features like
> encryption/Kerberos.
>
> The NFS gateway is a possibility. Fuse-dfs is another option. But we know
> they don't work at scale, and the community seems to have lost steam on
> improving NFS/fuse-dfs.
>
> Thoughts?
>
> On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com>
> wrote:
>
>>
>> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>>
>> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com>
>> wrote:
>>
>>> Please let me know where I can find a good, simple example of MapReduce
>>> Python code running on Hadoop, such as a tutorial.
>>>
>>> Thank you
>>>
>>>
>>>
