Thanks @Xiaoyu Yao <x...@cloudera.com> for giving us a great status update on Ozone!
We had a pretty large group yesterday. Here are my notes for your reference:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing

11/6/2019

~20 contributors joined the discussion: Weichiu, Xiaoyu, Chen, Haihua, haiyang, hexiaoqiao, Hui, Jinglun, Li, Lisheng, Oliver, sibyl.lv, Sammi, Yisheng, aiphago, Dazhuang, haicai and many others.

Xiaoyu led the discussion of Ozone: object store for big data workloads. Topics covered: what and why; feature set; current development, with 0.4 features (security) and 0.5 features (HA); future roadmap, with scale and stability improvements. Decommissioning support is in progress.

Questions:
1. Python client implementation: S3 or RPC?
   1. Sammi: Tencent is preparing to introduce Ozone. Use case 1: Hive. Use case 2: data science use cases, small files. Requires a Python client.
2. Ozone GA timeline.
3. How does the client read: is OM involved in reading data? Ans: No. The client accesses the DataNode directly.
4. What metadata do OM and SCM maintain?
5. When can Ozone be used in a production environment? Ans: wait for GA, and for benchmarks running workloads like TPC-DS.
6. Performance comparison between HDFS and Ozone. Ans: Ozone uses RocksDB as the persistent store for metadata, and RocksDB requires optimization and tuning.
7. Ozone uses the Raft replication protocol. What if it replicates more than 3 copies? Would the leader become the bottleneck? Ans: a multi-Raft project is under way that addresses this problem.
8. Rename? Ozone has a flat hierarchy. Does that mean rename is an O(n) operation? Ans: Ozone plans to support hierarchy.
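
To illustrate question 8, here is a minimal sketch of why renaming a "directory" can be O(n) when keys live in a flat namespace. It is a toy in-memory model in Python, not Ozone's actual implementation; the dict-based store and the `rename_prefix` helper are hypothetical names used only for illustration.

```python
def rename_prefix(keys, old_prefix, new_prefix):
    """Rename every key under old_prefix in a flat key -> value store.

    With no real directory objects, a "directory rename" has to rewrite
    each key that shares the prefix, so the cost grows with the number
    of keys under it. Returns the number of keys rewritten.
    """
    # Collect matches first so the dict is not mutated while iterating.
    to_move = [k for k in keys if k.startswith(old_prefix)]
    for k in to_move:
        keys[new_prefix + k[len(old_prefix):]] = keys.pop(k)
    return len(to_move)


# Hypothetical flat store: full path-like strings are just opaque keys.
store = {
    "warehouse/t1/part-0": b"a",
    "warehouse/t1/part-1": b"b",
    "warehouse/t2/part-0": b"c",
}
moved = rename_prefix(store, "warehouse/t1/", "warehouse/t1_renamed/")
print(moved, sorted(store))
```

In a hierarchical namespace the same rename could, in principle, update a single directory entry instead of every key beneath it, which is presumably the motivation behind the planned hierarchy support.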