Hi all. I am new to Flink. I just installed Flink on a cluster and have started reading the documentation to understand Flink's internals. After reading some of it, I have a few questions. I have prior experience with Storm and Heron, so I am relating their mechanisms to my questions to better understand Flink.
1. Parallelism: Can I specify worker parallelism explicitly, as in Storm?

2. Records: Can I think of a "record" in Flink as roughly the same as a Tuple in Storm? A Tuple in Storm carries the "real data" plus metadata (e.g., stream type, source id, and so on).

3. Partitioning: How does partitioning (e.g., shuffling, mapping) work internally? In Storm, each worker keeps a table of (worker id) -> (TCP info of the next workers); after the partition function runs, the Tuple is forwarded to the next hop based on that table. Does Flink work the same way?

4. Failure detection: How does Flink detect faults in the case of a dead worker or a machine failure? According to the documentation, the JobManager checks the liveness of TaskManagers with heartbeat messages. In Storm, the supervisor (which I think is similar to the TaskManager) first detects a dead worker via heartbeats and restarts it locally; for a machine failure, Nimbus (which I think is similar to the JobManager) detects the failure from the supervisor's heartbeats and reschedules all of that machine's workers onto other machines. How does Flink handle these cases?

5. Exactly-once delivery: Flink uses checkpointing and record replay for exactly-once delivery. Record replay needs a replayable source such as a message queue (e.g., Kafka), and Kafka sends and receives data over TCP. So I wonder: if the data source does not use TCP (e.g., IoT sensors), what is the general solution for record replay? For example, source workers could be directly connected to several inputs (e.g., IoT sensors), although I realize that is not a typical deployment.

6. Cycles: Flink supports cycles. However, according to the documentation, the tasks forming a cycle act as a regular dataflow source and sink respectively, yet they are collocated on the same physical instance so they can share an in-memory buffer and thus implement the loopback stream transparently. What happens if the number of workers forming the cycle is high? It would be hard to place them all on the same physical machine.

Thanks,
Junguk
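P.S. To make question 3 concrete, here is a toy sketch (plain Java, no Storm/Flink dependency; all task ids and endpoints are made up) of the routing-table mechanism I have in mind from Storm, where a hash partitioner picks the next hop per record key:

```java
import java.util.Map;

public class RoutingSketch {
    // task id -> "host:port" of the worker running that task (hypothetical values)
    static final Map<Integer, String> routes = Map.of(
            0, "10.0.0.1:6121",
            1, "10.0.0.2:6121",
            2, "10.0.0.3:6121");

    // Hash partitioning: the same key is always routed to the same downstream task.
    static String nextHop(String key) {
        int task = Math.floorMod(key.hashCode(), routes.size());
        return routes.get(task);
    }

    public static void main(String[] args) {
        // Forwarding a record: look up the TCP endpoint of its target task.
        System.out.println(nextHop("sensor-42"));
    }
}
```

I am asking whether Flink's shuffle follows essentially this shape, i.e., a partition function plus a per-worker table of downstream endpoints.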
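P.S. For question 4, this is the heartbeat-timeout model I have in mind (toy plain Java, all names and the timeout value are made up): the master records the last heartbeat time per worker and declares a worker dead once it misses the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatSketch {
    static final long TIMEOUT_MS = 10_000;          // hypothetical liveness timeout
    final Map<String, Long> lastSeen = new HashMap<>();

    // Called whenever a heartbeat message arrives from a worker.
    void heartbeat(String worker, long nowMs) { lastSeen.put(worker, nowMs); }

    // Workers whose last heartbeat is older than the timeout are considered dead.
    List<String> deadWorkers(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet())
            if (nowMs - e.getValue() > TIMEOUT_MS) dead.add(e.getKey());
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatSketch master = new HeartbeatSketch();
        master.heartbeat("tm-1", 0);
        master.heartbeat("tm-2", 9_000);
        System.out.println(master.deadWorkers(12_000)); // tm-1 has timed out, tm-2 is alive
    }
}
```

My question is what the JobManager does after such a detection, compared with Storm's two-level scheme (supervisor restarts locally, Nimbus reschedules across machines).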