Re: End-to-end lag spikes when closing a large number of panes

2024-03-27 Thread Robert Metzger
Hey Caio, Your analysis of the problem sounds right to me, I don't have a good solution for you :( I’ve validated that CPU profiles show clearAllState using a significant > amount of CPU. Did you use something like async-profiler here? Do you have more info on the breakdown into what used the C

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-12 Thread Robert Metzger
hives.org) and we could generally >> > >>>> customize the experience more towards Apache Flink. If we go for >> > Slack, >> > >>>> let's definitely try to archive it like Airflow did. If we do >> this, we >> > >>>> do

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
e once the invite link expires. It's not a nice solution, but it'll work. (1) https://the-asf.slack.com/archives/CBX4TSBQ8/p1652125017094159 On Mon, May 9, 2022 at 3:59 PM Robert Metzger wrote: > Thanks a lot for your answer. The onboarding experience to the ASF Slack > is

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
p for the ASF instance of Slack, you can > only get there if you're a committer or if you're invited by a committer. > > On Mon, 9 May 2022 at 15:15, Robert Metzger wrote: > > > Sorry for joining this discussion late, and thanks for the summary > Xintong! > > &g

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
Sorry for joining this discussion late, and thanks for the summary Xintong! Why are we considering a separate slack instance instead of using the ASF Slack instance? The ASF instance is paid, so all messages are retained forever, and quite a few people are already on that Slack instance. There is

Re: Flink serialization errors at a batch job

2022-05-09 Thread Robert Metzger
Hi, I suspect that this error is not caused by Flink code (because our serializer stack is fairly stable, there would be more users reporting such issues if it was a bug in Flink). In my experience, these issues are caused by broken serializer implementations (e.g. a serializer being used by multi

Re: Practical guidance with Scala and Flink >= 1.15

2022-05-09 Thread Robert Metzger
Hi Salva, my somewhat wild guess (because I'm not very involved with the Scala development on Flink): I would stick with option 1 for now. It should be easier now for the Flink community to support Scala versions past 2.12 (because we don't need to worry about scala 2.12+ support for Flink's intern

Re: WatermarkStrategy for IngestionTime

2022-04-05 Thread Robert Metzger
Hi, IngestionTime is usually used when the records don't have a proper event time associated with it, but the job has a long topology, and users want to persist the (time)order of events as they arrive in the system. In that sense, you can use the regular event time watermark strategies also for i

Re: Flink metric

2022-04-05 Thread Robert Metzger
Hi, multiple records are in the system at the same time, because Flink is buffering records in various components, for efficiency reasons. That's why you see that an individual record might have a latency of ~100ms, while Flink is processing many more messages. On Tue, Apr 5, 2022 at 12:54 PM lo

Re: BigQuery connector debugging

2022-04-05 Thread Robert Metzger
Hi Matt, At first glance your code looks fine. I guess you'll need to follow the codepaths more with the debugger. Have you made sure that "reachedEnd()" returns false? On Tue, Apr 5, 2022 at 9:42 AM Matthew Brown wrote: > Hi all, > > I'm attempting to build a Table API connector for BigQuery

Re: Example for Jackson JsonNode Kafka serialization schema

2022-01-28 Thread Robert Metzger
Hi Oran, as you've already suggested, you could just use a (flat)map function that takes an ObjectNode and outputs a string. In the mapper, you can do whatever you want in case of an invalid object: logging about it, discarding it, writing an "error json string", writing to a side output stream, .

Re: Upgrade to 1.14.3

2022-01-28 Thread Robert Metzger
Hi Sweta, yes, you can not run a Flink job compiled against Flink 1.13. against a 1.14 cluster. But if you are only using stable APIs of Flink, you should be able to compile your job with the 1.14 dependencies without touching the code. See also: https://nightlies.apache.org/flink/flink-docs-rele

Re: IllegalArgumentException: URI is not hierarchical error when initializating jobmanager in cluster

2022-01-28 Thread Robert Metzger
Hi Javier, I suspect that TwitterServer is using some classloading / dependency injection / service loading "magic" that is causing this. I would try to find out, either by attaching a remote debugger (should be possible when executing in cluster mode locally) or by adding log statements in the co

Re: Duplicate job submission error

2022-01-28 Thread Robert Metzger
Hi Parag, it seems that you are submitting a job with the same job id multiple times. An easy fix would be generating a new job id each time you are submitting the job. To debug this: check out the Flink jobmanager logs, there are log messages for every job submission. On Thu, Jan 27, 2022 at 9

Re: Inaccurate checkpoint trigger time

2022-01-28 Thread Robert Metzger
Hi Paul, where are you storing your checkpoints, and what's their size? IIRC, Flink won't trigger a new checkpoint before the old ones haven't been cleaned up, and if your checkpoints are large and stored on S3, it can take a while to clean them up (especially with the Hadoop S3 plugin, using pre

Re: Determinism of interval joins

2022-01-28 Thread Robert Metzger
Instead of using `reinterpretAsKeyedStream` can you use keyBey and see if the behavior gets deterministic? On Thu, Jan 27, 2022 at 9:49 PM Alexis Sarda-Espinosa < alexis.sarda-espin...@microfocus.com> wrote: > I'm not sure if the issue in [1] is relevant since it mentions the Table > API, but it

Re: Flink native k8s integration vs. operator

2022-01-20 Thread Robert Metzger
Hi Alexis, The usage of Custom Resource Definitions (CRDs). The main reason given to > me was that such resources are global (for a given cluster) and that is not > desired. I know that ultimately a CR based on a CRD can be scoped to a > specific namespace, but customer is king… I don't think th

Re: Flink (DataStream) in Kubernetes

2022-01-18 Thread Robert Metzger
Hi Jessy, Which approach is suitable for a standalone deployment in Kubernetes? Do we > have some best practises for running Flink applications on K8s ? I would deploy Flink in Application Mode using the standalone K8s deployment: https://nightlies.apache.org/flink/flink-docs-master/docs/deploym

Re: Alternatives of KafkaDeserializationSchema.isEndOfStream()

2021-12-07 Thread Robert Metzger
Hi Ayush, I couldn't find the documentation you've mentioned. Can you send me a link to it? On Tue, Dec 7, 2021 at 10:40 AM Ayush Chauhan wrote: > Hi, > > Can you please let me know the alternatives of isEndOfStream() as now > according to docs this method will no longer be used to determine th

Re: Customize Kafka client (module.yaml)

2021-12-07 Thread Robert Metzger
Hi Jérémy, In my understanding of the StateFun docs, you need to pass custom properties using "ingress.spec.properties". For example: ingresses: - ingress: meta: type: io.statefun.kafka/ingress id: project.A/input spec: properties: max.request.size

Re: Which issue or commit should i merge in flink-1.13.3 for buffer debloating?

2021-12-07 Thread Robert Metzger
Hi, I guess all the commits mentioned in all the subtasks of this ticket will give you the feature: https://issues.apache.org/jira/browse/FLINK-23451 Hower, I'm pretty sure that you can't just cherry-pick such a big feature to an older Flink version. I would rather try to fix the connector to up

Re: Weird operator ID check restore from checkpoint fails

2021-12-07 Thread Robert Metzger
ful checkpoint. > > I'm confused why `811d3b279c8b26ed99ff0883b7630242` is used and not the > auto-generated uid. That seems like a bug. > > On Tue, Dec 7, 2021 at 10:40 PM Robert Metzger > wrote: > >> Hi Dan, >> >> When restoring a savepoint/checkpoin

Re: Job Listener not working as expected

2021-12-07 Thread Robert Metzger
Hi Puneet, Are you submitting the Flink jobs using the "/bin/flink" command line tool to a cluster in session mode? Maybe the command line tool is just "fire and forget" submitting the job to the cluster, that's why the listeners are firing immediately. Can you try to use "env.executeAsync()" inst

Re: Weird operator ID check restore from checkpoint fails

2021-12-07 Thread Robert Metzger
Hi Dan, When restoring a savepoint/checkpoint, Flink is matching the state for the operators based on the uuid of the operator. The exception says that there is some state that doesn't match any operator. So from Flink's perspective, the operator is gone. Here is more information: https://nightlie

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-30 Thread Robert Metzger
@Matthias Pohl : I've also been annoyed by this 30 days limit, but I'm not aware of a way to globally change the default. I would ask in #asfinfra in the asf slack. On Thu, Sep 30, 2021 at 12:19 PM Till Rohrmann wrote: > Thanks for the hint with the managed search engines Matthias. I think this

Re: Support ARM architecture

2021-09-22 Thread Robert Metzger
Hi, afaik the only real blocker for ARM support was a rocksdb binary for arm. This has been resolved and is scheduled to be released with 1.14.0: https://issues.apache.org/jira/browse/FLINK-13598 If you have an ARM machine available, you could even help the community in the release verification p

Re: Many S3V4AuthErrorRetryStrategy warn logs while reading/writing from S3

2021-09-22 Thread Robert Metzger
Hey Andreas, This could be related too https://github.com/apache/hadoop/pull/110/files#diff-0a2e55a2f79ea4079eb7b77b0dc3ee562b383076fa0ac168894d50c80a95131dR950 I guess in Flink this would be s3.endpoint: your-endpoint-hostname Where your-endpoint-hostname is a region-specific endpoint, which y

Re: flink rest endpoint creation failure

2021-09-22 Thread Robert Metzger
Hi, Yes, "rest.bind-port" seems to be set to "35485" on the JobManager instance. Can you double check the configuration that is used by Flink? The jobManager is also printing the effective configuration on start up. You'll probably see the value there as well. On Wed, Sep 22, 2021 at 6:48 PM Cur

Re: Unbounded Kafka Source

2021-09-22 Thread Robert Metzger
Hi, What happens if you do not set any boundedness on the KafkaSource? For a DataStream job in streaming mode, the Kafka source should be unbounded. >From reading the code, it seems that setting unbounded(latest) should not trigger the behavior you mention ... but the Flink docs are not clearly w

Re: Flink Performance Issue

2021-09-22 Thread Robert Metzger
Hi Kamaal, I would first suggest understanding the performance bottleneck, before applying any optimizations. Idea 1: Are your CPUs fully utilized? if yes, good, then scaling up will probably help If not, then there's another inefficiency Idea 2: How fast can you get the data into your job, with

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-09 Thread Robert Metzger
d. > > Just FYI - We are using Fixed Delay Restart (5 times, 10s delay) > > On Thu, Sep 9, 2021 at 4:29 PM Robert Metzger wrote: > >> Hi Puneet, >> >> Can you provide us with the JobManager logs of this incident? Jobs should >> not disappear, they should res

Re: Issue while creating Hive table from Kafka topic

2021-09-09 Thread Robert Metzger
ner.java:84) > at > org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70) > at > org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:102) > at > java.util.concurrent.CompletableFu

Re: Job crashing with RowSerializer EOF exception

2021-09-09 Thread Robert Metzger
Hi Yuval, EOF exceptions during serialization are usually an indication that some serializers in the serializer chain is somehow broken. What data type are you serializating? Does it include some type serializer by a custom serializer, or Kryo, ... ? On Thu, Sep 9, 2021 at 4:35 PM Yuval Itzchakov

Re: Issue while creating Hive table from Kafka topic

2021-09-09 Thread Robert Metzger
t; > On Thu, Sep 9, 2021 at 4:26 PM Robert Metzger wrote: > >> Hey, >> >> Why do you have these dependencies in your pom? >> >> >> >> org.apache.kafka >> kafka-clients >> 2.8

Re: Allocation-preserving scheduling and task-local recovery

2021-09-09 Thread Robert Metzger
Hi, from my understanding of the code [1], the task scheduling first considers the state location, and then uses the evenly spread out scheduling strategy as a fall back. So in my understanding of the code, the local recovery should have preference over the evenly spread out strategy. If you can e

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-09 Thread Robert Metzger
Hi Puneet, Can you provide us with the JobManager logs of this incident? Jobs should not disappear, they should restart on other Task Managers. On Wed, Sep 8, 2021 at 3:06 PM Puneet Duggal wrote: > Hi, > > So for past 2-3 days i have been looking for documentation which > elaborates how flink t

Re: Issue while creating Hive table from Kafka topic

2021-09-09 Thread Robert Metzger
Hey, Why do you have these dependencies in your pom? org.apache.kafka kafka-clients 2.8.0 org.apache.kafka kafka_2.12 2.8.0 They are not needed for using the Kafka connector of

Re: Job manager crash

2021-09-09 Thread Robert Metzger
Is the kubernetes server you are using particularly busy? Maybe these issues occur because the server is overloaded? "Triggering checkpoint 2193 (type=CHECKPOINT) @ 1630681482667 for job ." "Completed checkpoint 2193 for job (474 byt

Re: custom flink image error

2021-08-04 Thread Robert Metzger
Hey Joshua, Can you first validate if the docker image you've built is valid by running it locally on your machine? I would recommend putting the s3 filesystem files into the plugins [1] directory to avoid classloading issues. Also, you don't need to build custom images if you want to use build-i

Re: Topic assignment across Flink Kafka Consumer

2021-08-04 Thread Robert Metzger
>> >> The partitions are assigned equally if we are reading from a single topic. >> >> Our Use case is to read from multiple topics [topics r4 regex pattern] we >> use 6 topics and 1 partition per topic for this job. >> >> In this case , few of the kafka consu

Re: Savepoint class refactor in 1.11 causing restore from 1.9 savepoint to fail

2021-08-04 Thread Robert Metzger
untime? > > > > > > > > *From: *Robert Metzger > *Date: *Tuesday, August 3, 2021 at 11:52 AM > *To: *Weston Woods > *Cc: *"user@flink.apache.org" > *Subject: *Re: Savepoint class refactor in 1.11 causing restore from 1.9 > savepoint to fail > > &g

Re: Checkpoints fail when trying to copy from Kafka to S3 since "some tasks have been finished"

2021-08-03 Thread Robert Metzger
Hi Svend, I'm a bit confused by this statement: * In sreaming mode, with checkpoing but removing the `setBounded()` on the > kafka source yields the same result My expectation would be that the source runs forever, if it is not bounded. Are you sure this error message is not coming from another

Re: Cleaning old incremental checkpoint files

2021-08-03 Thread Robert Metzger
Hi Robin, Let's say you have two checkpoints #1 and #2, where #1 has been created by an old version or your job, and #2 has been created by the new version. When can you delete #1? In #1, there's a directory "/shared" that contains data that is also used by #2, because of the incremental nature of

Re: Savepoint class refactor in 1.11 causing restore from 1.9 savepoint to fail

2021-08-03 Thread Robert Metzger
Hi Weston, I haven never looked into the savepoint migration code paths myself, but I know that savepoint migration across multiple versions is not supported (1.9 can only migrate to 1.10, not 1.11). We have test coverage for these migrations, and I would be surprised if this "Savepoint" class migr

Re: Repository for all mailing list

2021-08-03 Thread Robert Metzger
Hey, generally, the mailing lists (and JIRA) are indexed by search engines, in particular Google. As long as you have a specific enough search string (such as an exception message), you should find past problems and solutions. You can also download the entire Flink mailing list archives. For exam

Re: Issue with writing to Azure blob storage using StreamingFileSink and FileSink

2021-08-03 Thread Robert Metzger
Hey Sudhanva, Have you configured IntelliJ to include dependencies with "Provided" Scope when executing your main method? I also noticed that you are using Flink 1.13.1 and 1.13.0 in your pom. its probably not an issue in this case, but it can cause problems. On Fri, Jul 30, 2021 at 10:29 AM Sudh

Re: Support for Microseconds in Avro Deserialization

2021-08-02 Thread Robert Metzger
Hey Joe, thanks a lot for reaching out regarding this. I have no explanation for why this exists, but since there's not ticket about this yet, I filed one: https://issues.apache.org/jira/browse/FLINK-23589 I also pinged some committers who can hopefully provide some additional context. I would pro

Re: OOM Metaspace after multiple jobs

2021-07-20 Thread Robert Metzger
Hi Alexis, I hope I'm not stating the obvious, but have you checked this documentation page: https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/#unloading-of-dynamically-loaded-classes-in-user-code In particular the shutdown hooks we've introduced in F

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-20 Thread Robert Metzger
+1 to this change! When I was working on the reactive mode blog post [1] I also ran into this issue, leading to a poor "out of the box" experience when scaling down. For my experiments, I've chosen a timeout of 8 seconds, and the cluster has been running for 76 days (so far) on Kubernetes. I also

Re: Some question of RocksDB state backend on ARM os

2021-07-20 Thread Robert Metzger
our reply. > > Will this feature be released in version 1.14? > > Best, > > Hui > > *发件人:* Robert Metzger [mailto:rmetz...@apache.org] > *发送时间:* 2021年7月20日 19:45 > *收件人:* Wanghui (HiCampus) > *抄送:* user@flink.apache.org > *主题:* Re: Some question of RocksDB state

Re: Topic assignment across Flink Kafka Consumer

2021-07-20 Thread Robert Metzger
Hi Prasanna, which Flink version and Kafka connector are you using? (the "KafkaSource" or "FlinkKafkaConsumer"?) The partition assignment for the FlinkKafkaConsumer is defined here: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/st

Re: Some question of RocksDB state backend on ARM os

2021-07-20 Thread Robert Metzger
The RocksDB version provided by Flink does not currently run on ARM. However, there are some efforts / hints: - https://stackoverflow.com/a/44573013/568695 - https://issues.apache.org/jira/browse/FLINK-13448 - https://issues.apache.org/jira/browse/FLINK-13598 I would recommend voting and commenti

Re: Flink RocksDB Performance

2021-07-20 Thread Robert Metzger
Your understanding of the problem is correct -- the serialization cost is the reason for the high CPU usage. What you can also try to optimize is the serializers you are using (by using data types that are efficient to serialize). See also this blog post: https://flink.apache.org/news/2020/04/15/f

Re: Subpar performance of temporal joins with RocksDB backend

2021-07-20 Thread Robert Metzger
Are you using remote disks for rocksdb? (I guess that's EBS on AWS) Afaik there are usually limitations wrt to the IOPS you can perform. I would generally recommend measuring where the bottleneck is coming from. It could be that your CPUs are at 100%, then adding more machines / cores will help (m

Re: multiple jobs in same flink app

2021-06-23 Thread Robert Metzger
ot; job finishes. Using > executeAsync(), which is non-blocking, will lead to the "next" job starting > before "this" job finishes.* > > > On Tue, Jun 22, 2021 at 11:13 PM Robert Metzger > wrote: > >> Hi Qihua, >> >> Application Mode

Re: multiple jobs in same flink app

2021-06-22 Thread Robert Metzger
Hi Qihua, Application Mode is meant for executing one job at a time, not multiple jobs on the same JobManager. If you want to do that, you need to use session mode, which allows managing multiple jobs on the same JobManager. On Tue, Jun 22, 2021 at 10:43 PM Qihua Yang wrote: > Hi Arvid, > > Do

Re: Flink parameter configuration does not take effect

2021-06-17 Thread Robert Metzger
for users, I think these instructions can appear > in the Configuration of the official document. > > Best, > Jason > > JasonLee1781 > jasonlee1...@163.com > > <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=JasonLee1781&uid=jas

Re: Resource Planning

2021-06-16 Thread Robert Metzger
> immutable memtables metrics increased not as fast as before the > backpressure. > > I can provide more context if you are interested. We are still debugging > on this issue. > > Rommel > > > > > > On Wed, Jun 16, 2021 at 4:25 AM Robert Metzger > wrote: >

Re: RocksDB CPU resource usage

2021-06-16 Thread Robert Metzger
to test this hypothesis by > making the same comparison with some simpler state inside the aggregation > window. > > On Wed, 16 Jun 2021, 7:58 pm Robert Metzger, wrote: > >> Depending on the datatypes you are using, seeing 3x more CPU usage seems >> realistic. >>

Re: Flink PrometheusReporter support for HTTPS

2021-06-16 Thread Robert Metzger
It seems like the PrometheusReporter doesn't support HTTPS. The Flink reporter seems to be based on the HttpServer prometheus client. I wonder if using the servlet client would allow us to add HTTPS support: https://github.com/prometheus/client_java/blob/master/simpleclient_servlet/src/main/java/i

Re: Confusions and suggestions about Configuration

2021-06-16 Thread Robert Metzger
Note to others on this mailing list. This email has also been sent with the subject "Flink parameter configuration does not take effect" to this list. I replied there, let's also discuss there. On Tue, Jun 15, 2021 at 7:39 AM Jason Lee wrote: > Hi everyone, > > When I was researching and using

Re: Save state on a CoGroupFunction and recover it after a failure

2021-06-16 Thread Robert Metzger
Hi Felipe, Which data source are you using? > Then, in the MyCoGroupFunction there are only events of stream02 Are you storing events in your state? > Is this the case where I have to use RichCoGroupFunction and save the state by implementing the CheckpointedFunction? If you want your state to

Re: TypeInfo issue with Avro SpecificRecord

2021-06-16 Thread Robert Metzger
Thanks a lot for sharing the solution on the mailing list and in the ticket. On Tue, Jun 15, 2021 at 11:52 AM Patrick Lucas wrote: > Alright, I figured it out—it's very similar to FLINK-13703, but instead of > having to do with immutable fields, it's due to use of the Avro Gradle > plugin option

Re: RocksDB CPU resource usage

2021-06-16 Thread Robert Metzger
Depending on the datatypes you are using, seeing 3x more CPU usage seems realistic. Serialization can be quite expensive. See also: https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html Maybe it makes sense to optimize there a bit. On Tue, Jun 15, 2021 at 5:23 PM JING ZHAN

Re: Please advise bootstrapping large state

2021-06-16 Thread Robert Metzger
Hi Marco, The DataSet API will not run out of memory, as it spills to disk if the data doesn't fit anymore. Load is distributed by partitioning data. Giving you advice depends a bit on the use-case. I would explore two major options: a) reading the data from postgres using Flink's SQL JDBC connec

Re: Flink SQL as DSL for flink CEP

2021-06-16 Thread Robert Metzger
Hi Dipanjan, Using Flink SQL's MATCH_RECOGNIZE operator is certainly a good idea if you are looking for a non-programmatic way to do CEP with Flink. On Wed, Jun 16, 2021 at 6:44 AM Dipanjan Mazumder wrote: > Hi, > > Can we say that Flink SQL is kind of a DSL overlay on flink CEP , i > mean

Re: S3 + Parquet credentials issue

2021-06-16 Thread Robert Metzger
Thanks for the logs. The OK job seems to read from "s3a://test-bucket/", while the KO job reads from "s3a://bucket-test/". Could it be that you are just trying to access the wrong bucket? What I also found interesting from the KO Job TaskManager is this log message: Caused by: java.net.NoRouteTo

Re: Resource Planning

2021-06-16 Thread Robert Metzger
Hi Thomas, My gut feeling is that you can use the available resources more efficiently. What's the size of a checkpoint for your job (you can see that from the UI)? Given that your cluster has has an aggregate of 64 * 12 = 768gb of memory available, you might be able to do everything in memory (

Re: Flink parameter configuration does not take effect

2021-06-16 Thread Robert Metzger
Hi Jason, How are you deploying your Flink SQL tasks? (are you using per-job/application clusters, or a session cluster? ) I agree that the configuration management is not optimal in Flink. By default, I would recommend assuming that all configuration parameters are cluster settings, which requir

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Robert Metzger
Hi Yidan, it seems that the attachment did not make it through the mailing list. Can you copy-paste the text of the exception here or upload the log somewhere? On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote: > Attachment is the exception stack from flink's web-ui. Does anyone > have also met

Re: [VOTE] Release 1.13.1, release candidate #1

2021-05-28 Thread Robert Metzger
+1 (binding) - Tried out reactive mode in from the scala 2.11 binary locally (with scale up & stop with savepoint) - reviewed website update - randomly checked a jar file in the staging repo (flink-python jar looks okay (I just checked superifically)) On Fri, May 28, 2021 at 5:16 AM Leonard Xu

Re: Choice of time characteristic and performance

2021-05-22 Thread Robert Metzger
Hi Bob, if you don't need any time characteristics, go with processing time. Ingestion time will call System.currentTimeMillis() on every incoming record, which is an somewhat expensive call. Event time (and ingestion time) will attach a long field to each record, making the records 8 bytes larger

Re: Stop command failure

2021-05-22 Thread Robert Metzger
Hi, can you provide the jobmanager log of that run? it seems that the operation timed out. The JobManager log will help us to give some insights into the root cause. On Tue, May 18, 2021 at 1:42 PM V N, Suchithra (Nokia - IN/Bangalore) < suchithra@nokia.com> wrote: > Hi, > > > > Stop command

Re: Fastest way for decent lookup JOIN?

2021-05-22 Thread Robert Metzger
Hi Theo, Since you are running Flink locally it would be quite easy to attach a profiler to Flink to see where most of the CPU cycles are burned (or: check if you are maybe IO bound?) .. this could provide us with valuable data on deciding for the next steps. On Tue, May 18, 2021 at 5:26 PM Theo

Re: Parallelism in Production: Best Practices

2021-05-22 Thread Robert Metzger
Hi Yaroslav, My recommendation is to go with the 2nd pattern you've described, but I only have limited insights into real world production workloads. Besides the parallelism configuration, I also recommend looking into slot sharing groups, and maybe disabling operator chaining. I'm pretty sure so

Re: Savepoint/checkpoint confusion

2021-05-22 Thread Robert Metzger
? > > On Thu, 20 May 2021 at 05:35, Robert Metzger wrote: > >> Hey Igor, >> >> 1) yes, reactive mode indeed does the same. >> 2) No, HA mode is only storing some metadata in ZK about the leadership >> and latest checkpoints, but the checkpoints itself are the sam

Re: Savepoint/checkpoint confusion

2021-05-20 Thread Robert Metzger
Hey Igor, 1) yes, reactive mode indeed does the same. 2) No, HA mode is only storing some metadata in ZK about the leadership and latest checkpoints, but the checkpoints itself are the same. They should be usable for a changed job graph (if the state matches the operators by setting the UUIDs [1]

Re: taskmanager initialization failed

2021-05-17 Thread Robert Metzger
Hi Suchithra, this is very likely a version mixup: Can you make sure all jars in your classpath are Flink 1.11.1? On Mon, May 17, 2021 at 2:05 PM V N, Suchithra (Nokia - IN/Bangalore) < suchithra@nokia.com> wrote: > Hi, > > > > With flink 1.11.1 version, taskmanager initialization is failing

Re: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,

2021-05-05 Thread Robert Metzger
Hi Ragini, Since this exception is coming from the Hbase client, I assume the issue has nothing to do with Flink directly. I would recommend carefully studying the HBase client configuration parameters, maybe setup a simple Java application that "hammers" data into Hbase at a maximum rate to under

Re: Interacting with flink-jobmanager via CLI in separate pod

2021-05-05 Thread Robert Metzger
t > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > ~[flink-dist_2.12-1.13.0.jar:1.13.0] > at > org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) >

Re: Interacting with flink-jobmanager via CLI in separate pod

2021-05-05 Thread Robert Metzger
Hi, can you check the client log in the "log/" directory? The Flink client will try to access the K8s API server to retrieve the endpoint of the jobmanager. For that, the pod needs to have permissions (through a service account) to make such calls to K8s. My hope is that the logs or previous messag

Re: [ANNOUNCE] Apache Flink 1.13.0 released

2021-05-04 Thread Robert Metzger
Thanks a lot to everybody who has contributed to the release, in particular the release managers for running the show! On Tue, May 4, 2021 at 8:54 AM Konstantin Knauf wrote: > Thank you Dawid and Guowei! Great job everyone :) > > On Mon, May 3, 2021 at 7:11 PM Till Rohrmann wrote: > >> This is

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-26 Thread Robert Metzger
Quick comment on the kryo type registration and the messages you are seeing: The messages are expected: What the message is saying is that we are not serializing the type using Flink's POJO serializer, but we are falling back to Kryo. Since you are registering all the instances of Number that you a

Re: Flink missing Kafka records

2021-04-26 Thread Robert Metzger
Hi Dan, Can you describe under which conditions you are missing records (after a machine failure, after a Kafka failure, after taking and restoring from a savepoint, ...). Are many records missing? Are "the first records" or the "latest records" missing? Any individual records missing, or larger b

Re: The wrong Options of Kafka Connector, will make the cluster can not run any job

2021-04-26 Thread Robert Metzger
Thanks a lot for your message. This could be a bug in Flink. It seems that the archival of the execution graph is failing because some classes are unloaded. What I observe from your stack traces is that some classes are loaded from flink-dist_2.11-1.11.2.jar, while other classes are loaded from te

Re: kafka consumers partition count and parallelism

2021-04-25 Thread Robert Metzger
Hey Prashant, the Kafka Consumer parallelism is constrained by the number of partitions the topic(s) have. If you have configured the Kafka Consumer in Flink with a parallelism of 100, but your topic has only 20 partitions, 80 consumer instances in Flink will be idle. On Mon, Apr 26, 2021 at 2:54

Re: Checkpoint error - "The job has failed"

2021-04-25 Thread Robert Metzger
Hi Dan, can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using) On Mon, Apr 26, 2021 at 7:20 AM Dan Hill wrote: > My Flink job failed to checkpoint with a "The job has failed" error. The > logs contained no other recent e

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-17 Thread Robert Metzger
egards > Klemens > > > Am 15.04.2021 um 21:29 schrieb Robert Metzger : > > Hi, > > a DEBUG log of the client would indeed be nice. > Can you adjust this file: > > conf/log4j-cli.properties > > to the f

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
based on > https://www.alibabacloud.com/blog/deep-insights-into-flink-sql-flink-advanced-tutorials_596628. > If it's doable, then I'll be able to solve our problem with applying > streamfilesink to the transformed dataset. > > Best wishes, > Chen-Che Huang > > On 2021

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-15 Thread Robert Metzger
> test the Flink Pipelines until Silicon is supported. > > Nevertheless, thanks for your answer. If there is anything I can provide > you to narrow down the problem, I am happy to help. > > Regards > Klemens > > Am 15.04.2021 um 20:59 schrieb Robert Metzger : > >

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
Hey Chen-Che Huang, I guess the StreamingFileSink is what you are looking for. It is documented here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html I drafted a short example (that is not production ready), which does roughly what you are asking for: htt

Re: Running Apache Flink 1.12 on Apple Silicon M1 MacBook Pro

2021-04-15 Thread Robert Metzger
Hey Klemens, I'm sorry that you are running into this. Looks like you are the first (of probably many people) who use Flink on a M1 chip. If you are up for it, we would really appreciate a fix for this issue, as a contribution to Flink. Maybe you can distill the problem into an integration test,

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Robert Metzger
Hi, I'm not aware of any known issues with Hadoop and Flink on Docker. I also tried what you are doing locally, and it seems to work: flink-jobmanager| 2021-04-15 18:37:48,300 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Starting StandaloneSessionClusterEntrypoint.

Re: Flink Taskmanager failure recovery and large state

2021-04-06 Thread Robert Metzger
Hey Yaroslav, GCS is a somewhat popular filesystem that should work fine with Flink. It seems that the initial scale of a bucket is 5000 read requests per second (https://cloud.google.com/storage/docs/request-rate), your job should be at roughly the same rate (depending on how fast your job resta

Re: Checkpoint timeouts at times of high load

2021-04-06 Thread Robert Metzger
It could very well be that your job gets stuck in a restart loop for some reason. Can you either post the full TaskManager logs here, or try to figure out yourself why the first checkpoint that timed out, timed out? Backpressure or blocked operators are a common cause for this. In your case, it cou

Re: Question about checkpoints and savepoints

2021-03-29 Thread Robert Metzger
1.953gb (2097152000 bytes) > INFO [] - Off-heap: 128.000mb (134217728 bytes) > INFO [] - JVM Metaspace: 256.000mb (268435456 bytes) > INFO [] - JVM Overhead: 264.889mb (277756136 bytes) > > > On Fri, Mar 26, 2021 at 4:03 AM Robert Metzger &

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-29 Thread Robert Metzger
, 2020 at 10:57 PM > >>>> >> To: dev mailto:d...@flink.apache.org>>, user > < > >>>> >> user@flink.apache.org<mailto:user@flink.apache.org>> > >>>> >> Cc: Lasse Nedergaard >>>> >> lassenedergaardfl...@gmail.c

Re: Hadoop is not in the classpath/dependencies

2021-03-26 Thread Robert Metzger
Hey Matthias, Maybe the classpath contains hadoop libraries, but not the HDFS libraries? The "DistributedFileSystem" class needs to be accessible to the classloader. Can you check if that class is available? Best, Robert On Thu, Mar 25, 2021 at 11:10 AM Matthias Seiler < matthias.sei...@campus.t

Re: reading from jdbc connection

2021-03-26 Thread Robert Metzger
Hey Arran, It seems that the preferred way, even in the Java API is to use a DDL statement: https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/api/bridge/java/StreamTableEnvironment.java#L602-L639 Hope this helps! Best, Rober

Re: FlinkKafkaConsumer - Broadcast - Initial Load

2021-03-26 Thread Robert Metzger
Hey Sandeep, (Maybe this thread is also relevant: https://lists.apache.org/thread.html/7d56267d4c2344ccb5a774896682d0a3efb38c1c215ef3500c3569a2%40%3Cuser.flink.apache.org%3E ) > My question is how do I initialise the pipeline for the first set of records in the database? i.e. those that are not C

  1   2   3   4   5   6   7   8   9   10   >