Thanks Artem,

Looks interesting. I honestly didn't know what the Hadoop Streaming API is used for. Here is another reference:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
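For anyone else new to Streaming, here is a minimal word-count sketch of the kind of Python job it runs. It is only an illustration, not code from the tutorial or the docs: the file name wc.py, the function names map_lines/reduce_lines and the "map"/"reduce" argument are all made up for this example. It assumes Streaming's default record format of one tab-separated key/value pair per line, and that the reducer sees its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each word. Input must already be sorted by key."""
    # Skip blank lines, then split each record into (word, count).
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical entry point: "python3 wc.py map" or "python3 wc.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

It can be tried locally without a cluster with something like: cat input.txt | python3 wc.py map | sort | python3 wc.py reduce. On a cluster, the same script would be passed to the streaming jar through its -files, -mapper, -reducer, -input and -output options; the jar location depends on the installation.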
I think this brings up another question: how do we treat Python as a first-class citizen? Especially for data science use cases, Python is *the* language. For example, we have Java and C and (in Hadoop 3.2) C++ clients for HDFS, but Hadoop does not ship a Python client. I see a number of Python libraries that support WebHDFS. It's not clear to me how well they perform, or whether they support more advanced features like encryption/Kerberos.

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they don't work at scale, and the community seems to have lost steam on improving NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com> wrote:

> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>
> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com> wrote:
>
>> Please let me know where I can find a good/simple example of mapreduce
>> Python code running on Hadoop. Like tutorial or sth.
>>
>> Thank you