Re: Flink AutoScaling EMR

2020-11-11 Thread Robert Metzger
Hey Rex, the second approach (spinning up a standby job and then doing a handover) sounds more promising to implement, without rewriting half of the Flink codebase ;) What you need is a tool that orchestrates creating a savepoint, starting a second job from the savepoint and then communicating wit

Re: batch模式broadcast hash join为什么会有数据丢失

2020-11-11 Thread Robert Metzger
Thanks a lot for posting a question to the user@ mailing list. Note that the language of this list is English. For Chinese language support, reach out to user...@flink.apache.org. On Thu, Nov 12, 2020 at 5:53 AM 键 <1941890...@qq.com> wrote: > batch模式broadcast hash join为什么会有数据丢失 >

batch模式broadcast hash join为什么会有数据丢失

2020-11-11 Thread
batch模式broadcast hash join为什么会有数据丢失

Re: Flink 1.11 not showing logs

2020-11-11 Thread Yang Wang
If you have set the environment FLINK_CONF_DIR, then it will have a higher priority. I think that could be why you changed the log4j.properties in the conf directory but it does not take effect. Yes, if you have changed the log4j.properties, you need to relaunch the Flink application. Although we

Re: FlinkSQL kafka->dedup->kafka

2020-11-11 Thread Jark Wu
Hi Laurent, 1. Deduplicate with keeping the first row will generate an append-only stream. But I guess you are expecting to keep the last row which generates an updating stream. An alternative way is you can use the "changelog-json" format in this repo [1], it will convert the updating stream int

Re: Flink 1.11 not showing logs

2020-11-11 Thread Diwakar Jha
HI Yang, I'm able to see taskmanage and jobmanager logs after I changed the log4j.properties file (/usr/lib/flink/conf). Thank you! I updated the file as shown below. I had to kill the app ( yarn application -kill ) and start flink job again to get the logs. This doesn't seem like an efficient w

Re: Flink AutoScaling EMR

2020-11-11 Thread Rex Fenley
Another thought, would it be possible to * Spin up new core or task nodes. * Run a new copy of the same job on these new nodes from a savepoint. * Have the new job *not* write to the sink until the other job is torn down? This would allow us to be eventually consistent and maintain writes going th

Flink AutoScaling EMR

2020-11-11 Thread Rex Fenley
Hello, I'm trying to find a solution for auto scaling our Flink EMR cluster with 0 downtime using RocksDB as state storage and S3 backing store. My current thoughts are like so: * Scaling an Operator dynamically would require all keyed state to be available to the set of subtasks for that operato

Re: BoundedOutOfOrderness Watermark Generator is NOT making the event time to advance

2020-11-11 Thread fuyao . li
Hi Community, Regarding this problem, could someone give me an explanation? Thanks. Best, Fuyao On 11/10/20 16:56, fuyao...@oracle.com wrote: Hi Kevin, Sorry for the name typo... On 11/10/20 16:48, fuyao...@oracle.com wrote: Hi Kavin, Thanks for your example. I think I have already don

Re: SSL setup for YARN deployment when hostnames are unknown.

2020-11-11 Thread Jiahui Jiang
Since the issue is right now we can't dynamically generate a keystore when the YARN application launches, but before the JobManager process starts. Do you think the best short term solution we will hack around `yarn.container-start-command-template`and have it execute a custom script that can g

Re: Flink 1.8.3 GC issues

2020-11-11 Thread Aljoscha Krettek
Hi, nice work on debugging this! We need the synchronized block in the source because the call to reader.advance() (via the invoker) and reader.getCurrent() (via emitElement()) must be atomic with respect to state. We cannot advance the reader state, not emit that record but still checkpoint

Re: Native kubernetes setup

2020-11-11 Thread Boris Lublinsky
Guys, I just created a simple PR https://github.com/apache/flink/pull/14005 allowing me to mount different K8 resources - PVCs, Secrets, configmaps > On Nov 6, 2020, at 6:37 AM, Yang Wang wrote: > > Actually, in our document, we have provided a com

Re: error in using package SubnetUtils

2020-11-11 Thread Diwakar Jha
Thank you, Arvid. i changed the import to com.facebook.presto.hadoop.$internal.org.apache.commons.net.util.SubnetUtils and it worked. Also, I will add apache-commons to my project as you suggested. Thanks. On Wed, Nov 11, 2020 at 4:46 AM Arvid Heise wrote: > Hi Diwakar, > > we removed shading f

Re: SSL setup for YARN deployment when hostnames are unknown.

2020-11-11 Thread Jiahui Jiang
Hello Matthias, Thank you for the links! I did see the documentations and went through the sourcecode. But unfortunately it looks like only a prebuilt keystore can be supported for YARN right now. In term of dynamic loading security modules, the link you sent seems to mainly for zookeeper's se

Re: PyFlink Table API and UDF Limitations

2020-11-11 Thread Dian Fu
Hi Niklas, You are correct that the input/output length of Pandas UDF must be of the same size and that Flink will split the input data into multiple bundles for Pandas UDF and the bundle size is non-determinstic. Both of the above two limitations are by design and so I guess Pandas UDF could n

[ANNOUNCE] Weekly Community Update 2020/44-45

2020-11-11 Thread Konstantin Knauf
Dear community, two weeks have passed again and I am happy two share another update with news on Flink 1.12, Flink 1.11.3 and the release of Stateful Functions 2.2.1. As everyone has been finishing the last bit and pieces of Flink 1.12, there are only a handful of new initiatives to cover this tim

Re: FlinkSQL kafka->dedup->kafka

2020-11-11 Thread Laurent Exsteens
Hi Jark, thanks for your quick reply. I was indeed expecting it. But that triggers the following questions: 1. Is there another way to do this deduplication and generate an append-only stream? Match Recognize? UDF? ...? 2. If I would put Postgres as a sink, what would happen? Will the e

PyFlink Table API and UDF Limitations

2020-11-11 Thread Niklas Wilcke
Hi Flink Community, I'm currently trying to implement a parallel machine learning job with Flink. The goal is to train models in parallel for independent time series in the same data stream. For that purpose I'm using a Python library, which lead me to PyFlink. Let me explain the use case a bit

Re: error in using package SubnetUtils

2020-11-11 Thread Arvid Heise
Hi Diwakar, we removed shading from s3 plugins in Flink 1.11. So the package should be com.facebook.presto.hadoop.$internal.org.apache.commons.net.util.SubnetUtils now. But I strongly discourage you from using internally shaded libs. Rather use add apache-commons to your project as a proper depend

Re: Caching Mechanism in Flink

2020-11-11 Thread Jack Kolokasis
Hi Matthias, Yeap, I am refer to the tasks' off-heap configuration value. Best, Iacovos On 11/11/20 1:37 μ.μ., Matthias Pohl wrote: When talking about the "off-heap" in your most recent message, are you still referring to the task's off-heap configuration value? AFAIK, the HybridMemorySegment

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
When talking about the "off-heap" in your most recent message, are you still referring to the task's off-heap configuration value? AFAIK, the HybridMemorySegment shouldn't be directly related to the off-heap parameter. The HybridMemorySegment can be used as a wrapper around any kind of memory, i.e

Re: checkpoint interval and hdfs file capacity

2020-11-11 Thread Congxian Qiu
Hi Currently, checkpoint discard logic was executed in Executor[1], maybe it will not be deleted so quickly [1] https://github.com/apache/flink/blob/91404f435f20c5cd6714ee18bf4ccf95c81fb73e/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointsCleaner.java#L45 Best, Congx

why not flink delete the checkpoint directory recursively?

2020-11-11 Thread Joshua Fan
Hi When a checkpoint should be deleted, FsCompletedCheckpointStorageLocation. disposeStorageLocation will be called. Inside it, fs.delete(exclusiveCheckpointDir, false) will do the delete action. I wonder why the recursive parameter is set to false? as the exclusiveCheckpointDir is truly a directo

Re: Caching Mechanism in Flink

2020-11-11 Thread Jack Kolokasis
Hi Matthias, Thank you for your reply and useful information. I find that the off-heap is used when Flink uses HybridMemorySegments. Well, how the Flink knows when to use these HybridMemorySegments and in which operations this is happened? Best, Iacovos On 11/11/20 11:41 π.μ., Matthias Pohl

Re: debug statefun

2020-11-11 Thread Igal Shilman
Glad to hear that it worked out! On Wed, Nov 11, 2020 at 9:07 AM Lian Jiang wrote: > Just realized making autoservice class discoverable also solved "There are > no routers defined" mentioned by Puneet. Yes, harness does test statefun > module discovery. Thanks. > > On Tue, Nov 10, 2020 at 9:57

Re: SSL setup for YARN deployment when hostnames are unknown.

2020-11-11 Thread Matthias Pohl
Hi Jiahui, thanks for reaching out to the mailing list. This is not something I have expertise in. But have you checked out the Flink SSL Setup documentation [1]? Maybe, you'd find some help there. Additionally, I did go through the code a bit: A SecurityContext is loaded during ClusterEntrypoint

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
Hi Iacovos, The task's off-heap configuration value is used when spinning up TaskManager containers in a clustered environment. It will contribute to the overall memory reserved for a TaskManager container during deployment. This parameter can be used to influence the amount of memory allocated if

Re: Flink-Kafka exactly once errors even with transaction.timeout.ms set

2020-11-11 Thread Aljoscha Krettek
Hmm, could you please post the full stack trace that leads to the TimeoutException? Best, Aljoscha On 10.11.20 17:54, Tim Josefsson wrote: Hey Aljoscha, I'm setting the transaction.timeout.ms when I create the FlinkKafkaProducer: I create a Properties object and then set the property and fin

[ANNOUNCE] Apache Flink Stateful Functions 2.2.1 released

2020-11-11 Thread Tzu-Li (Gordon) Tai
The Apache Flink community released the first bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.1. This release fixes a critical bug that causes restoring a Stateful Functions cluster from snapshots (checkpoints or savepoints) to fail under certain conditions. *We strong

Re: debug statefun

2020-11-11 Thread Lian Jiang
Just realized making autoservice class discoverable also solved "There are no routers defined" mentioned by Puneet. Yes, harness does test statefun module discovery. Thanks. On Tue, Nov 10, 2020 at 9:57 PM Tzu-Li (Gordon) Tai wrote: > On Wed, Nov 11, 2020 at 1:44 PM Tzu-Li (Gordon) Tai > wrote: