Thanks Artem,
Looks interesting. I honestly didn't know what the Hadoop Streaming API was
used for.
Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
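
For the original question, a minimal sketch of a Streaming word count in
Python might look like the following. Streaming pipes input lines to the
mapper on stdin and hands the reducer tab-separated records sorted by key.
The file name wordcount.py and the "map"/"reduce" argument are my own
convention here, not something Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Assumes the input is sorted by key, which is how Streaming
    delivers records to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run as "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

It can be tried locally without a cluster by piping a text file through
"python3 wordcount.py map", then sort, then "python3 wordcount.py reduce".
On a cluster the same script would be passed to the streaming jar through
its -files, -mapper, -reducer, -input and -output options.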

I think this brings up another question: how do we treat Python as a
first-class citizen? Especially for data science use cases, Python is *the*
language.
For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS,
but Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to
me how well they perform, or whether they support more advanced features
like encryption/Kerberos.
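
For context, those libraries are essentially wrappers around the WebHDFS
REST API, where each operation is an HTTP request against
/webhdfs/v1/<path>?op=<OPERATION>. A rough sketch of building such a URL
with only the standard library is below. The host name and user are
made-up placeholders, and the sketch covers only simple (user.name)
authentication, so it says nothing about the Kerberos or encryption cases.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL for one operation on an HDFS path.

    params holds extra query parameters, e.g. {"user.name": "alice"}
    for simple (non-Kerberos) authentication.
    """
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder host; 9870 is the default NameNode HTTP port in Hadoop 3.
url = webhdfs_url("namenode.example.com", 9870, "/user/alice",
                  "LISTSTATUS", {"user.name": "alice"})
print(url)
```

Fetching that URL (for instance with urllib.request.urlopen) should return
a JSON directory listing, assuming WebHDFS is enabled on the cluster.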

The NFS gateway is a possibility. Fuse-dfs is another option. But we know
they don't work at scale, and the community seems to have lost steam on
improving NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com> wrote:

>
> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>
> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com>
> wrote:
>
>> Please let me know where I can find a good/simple example of mapreduce
>> Python code running on Hadoop. Like tutorial or sth.
>>
>> Thank you
>>
>>
>>
