Hi Dave,

Let me answer your questions:

1. The RocksDB state backend always stores its working state on local disks for speed. Backups (the checkpoints) are written to HDFS or any other distributed file system. On YARN, the local data directory is configured automatically.
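To make that concrete, here is a minimal sketch against the Flink 1.2 API (it assumes the flink-statebackend-rocksdb dependency is on the classpath; the class name and the HDFS URI are just placeholders taken from your mail):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbSetup {  // hypothetical example class
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            // In-flight state lives in RocksDB on each TaskManager's local
            // disks; only checkpoints go to the HDFS URI given here.
            env.setStateBackend(
                new RocksDBStateBackend("hdfs://namenode:40010/flink/checkpoints"));
            env.enableCheckpointing(60_000); // checkpoint every 60 seconds
            // ... build your job on env, then call env.execute(...)
        }
    }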
2. You need to manually configure ZooKeeper in the Flink configuration for HA, even on YARN. See here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#example-highly-available-yarn-session (I've also put a minimal flink-conf.yaml sketch at the very bottom of this mail, below your quoted message.)

3. Okay, I hope I'm not adding a third opinion here. I would not recommend setting HADOOP_CLASSPATH unless something is not working. Does your job have any external Hadoop dependencies that need to be available at runtime? If not, just go with the standard settings.

4. Again, as long as everything works smoothly, you can stick to the bundled vanilla Hadoop versions. The problem is that Hadoop vendors (in this case Hortonworks) ship patched Hadoop releases with custom changes, which can lead to incompatibilities between the client (Flink) and the server (Hadoop). The situation has gotten much better in the last year, so as long as everything works, you don't need to worry.

Let me know if there's more you want to know about putting Flink into production.

Regards,
Robert

On Fri, Mar 10, 2017 at 3:38 PM, Torok, David <david_to...@comcast.com> wrote:
> Hi,
>
> Forgive me if parts of this question have been answered before, but I'd
> like help in resolving some bits of confusion from the documentation and
> the fact that I haven't been able to find a good example anywhere for an
> enterprise-style setup. If anyone has a sample HA / YARN / ZK / RocksDB
> configuration, could you share?
>
> We are currently using Flink 1.2.0 and Hortonworks (an older version,
> 2.2.9, based on Hadoop 2.6.0). We're trying a small sample cluster with 9
> YARN client nodes.
>
> 1. We have large state and large time-windows and therefore want to
> use RocksDB as our state backend. Is it a typical or best practice that
> RocksDB stores to local disk for speed, and the checkpoints go to HDFS
> for recovery / HA? Or is everything in HDFS? From my understanding of
> the docs, "The RocksDBStateBackend holds in-flight data in a RocksDB
> <http://rocksdb.org/> database that is (per default) stored in the
> TaskManager data directories"... (is this set automatically via YARN?)...
> and the checkpoint directory is set via "state.backend.fs.checkpointdir:
> hdfs://namenode:40010/flink/checkpoints" or dynamically, e.g. new
> RocksDBStateBackend(statepath).
>
> 2. It's unclear to me whether YARN automatically provides Flink with the
> ZooKeeper information, or whether I also need to set the ZooKeeper info
> in flink-conf.yaml... the examples seem to imply that the ZK information
> might only be used if you start your own ZooKeeper rather than it already
> existing. Do I need to set it up for HA via YARN?
>
> 3. I've seen some conflicting information about including
> HADOOP_CLASSPATH: some say there are many conflicts with Flink libraries,
> whereas others say it's important for resolving various deserialization
> errors at runtime.
>
> 4. Someone suggested that we build Flink from source ourselves against
> the Hortonworks distribution; I'm really hoping that's not necessary.
>
> Appreciate any info as we learn how to productionize our Flink clusters!
>
> Best Regards
>
> Dave
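P.S. Here is the flink-conf.yaml sketch mentioned under point 2. It uses the key names from the Flink 1.2 docs; the ZooKeeper quorum hosts and the HDFS path are placeholders you'd adapt to your cluster, and some HA keys were renamed in later releases, so double-check against the docs page linked above if you upgrade:

    # Enable ZooKeeper-based HA
    high-availability: zookeeper
    # Your existing ZooKeeper ensemble (placeholder hosts)
    high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    # Where JobManager metadata is persisted for recovery (placeholder path)
    high-availability.zookeeper.storageDir: hdfs:///flink/recovery
    # Root znode under which Flink stores its coordination data
    high-availability.zookeeper.path.root: /flink
    # On YARN, also allow the ApplicationMaster to be restarted on failure
    yarn.application-attempts: 10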