Re: User Classpath from Plugin

2021-07-13 Thread Chesnay Schepler
You can't access the user classpath from plugins. On 14/07/2021 00:18, Mason Chen wrote: I've read this page (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/plugins/

[no subject]

2021-07-13 Thread guo shiguang

[External] NullPointerException on accumulator after Checkpointing

2021-07-13 Thread Clemens Valiente
Hi, we created a new AggregateFunction with a MapView as the accumulator, as follows: class CountDistinctAggFunction[T] extends AggregateFunction[lang.Integer, MapView[T, lang.Integer]] { override def createAccumulator(): MapView[T, lang.Integer] = { new MapView[T, lang.Integer]() } ... We had

Re: java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-13 Thread Rahul Patwari
Hi Ragini, From the stack trace, the job failed because the Async I/O operator timed out for an event. The timeout is configurable. Please refer to https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api Quoting from the above documentation: > Timeout

Re: Kafka Consumer Retries Failing

2021-07-13 Thread Rahul Patwari
Thanks, David and Piotr, for your replies. I managed to capture the thread dump from the JobManager UI for a few task managers. Here is the thread dump for the Kafka Source tasks in one task manager. I could see the same stack trace in other task managers as well. It seems like Kafka Source tasks are waiting on

java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-13 Thread Ragini Manjaiah
Hi, I am facing the below issue while processing streaming events. In what scenarios is java.lang.Exception: Could not complete the stream element hit? Can you please help me here? The job fails after this exception is hit. 2021-07-13 13:24:58,781 INFO org.apache.flink.runtime.executiongraph.Exec

Re: Stateful Functions PersistentTable duration

2021-07-13 Thread Ammon Diether
Thank you for asking. I meant PersistedTable -> https://github.com/apache/flink-statefun/blob/release-3.0/statefun-sdk-embedded/src/main/java/org/apache/flink/statefun/sdk/state/PersistedTable.java It is related to state backend. I am using rocksdb backend. On Tue, Jul 13, 2021 at 8:53 PM Caiz

Re: Stateful Functions PersistentTable duration

2021-07-13 Thread Caizhi Weng
Hi, By PersistentTable do you mean the state backend? If yes, the answer differs between operators and state backends. For keyed state the duration is per key. However, the exact time at which a key is cleaned up depends on the operator and the state backend. Most operators will register a time
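A minimal sketch of what "per key" means in the reply above, applied to the numbers from the original question (20-minute duration, "a" written 30 minutes ago, "b" written 10 minutes ago). ToyExpiringTable and its methods are hypothetical stand-ins written for illustration; they are not Flink or Stateful Functions classes, and the sketch models only logical expiry, not when a backend physically removes the data.

```java
import java.util.HashMap;
import java.util.Map;

class PerKeyExpirySketch {
    // Toy table, not a Flink or StateFun class: each key keeps its own last-write time.
    static final class ToyExpiringTable {
        private final long ttlMinutes;
        private final Map<String, Long> lastWriteMinutes = new HashMap<>();

        ToyExpiringTable(long ttlMinutes) {
            this.ttlMinutes = ttlMinutes;
        }

        void put(String key, long nowMinutes) {
            lastWriteMinutes.put(key, nowMinutes);
        }

        // A key counts as expired once its own age reaches the TTL,
        // independent of when other keys were written.
        // Keys that were never written also count as expired.
        boolean isExpired(String key, long nowMinutes) {
            Long written = lastWriteMinutes.get(key);
            return written == null || nowMinutes - written >= ttlMinutes;
        }
    }

    public static void main(String[] args) {
        ToyExpiringTable table = new ToyExpiringTable(20);
        table.put("a", 0);   // written 30 minutes before "now"
        table.put("b", 20);  // written 10 minutes before "now"
        long now = 30;
        System.out.println("a expired: " + table.isExpired("a", now)); // a expired: true
        System.out.println("b expired: " + table.isExpired("b", now)); // b expired: false
    }
}
```

Under this per-key reading, "a" is past its duration while "b" is not; when the expired entry is actually cleaned up would still depend on the operator and state backend, as the reply notes.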

Re: Flink UDF Scalar Function called only once for all rows of select SQL in case of no parameter passed

2021-07-13 Thread JING ZHANG
Hi Shamit Jain, In fact, this is an optimization that simplifies expressions. If a UDF has no parameters, the optimizer treats it as an expression that always generates a constant result, so it is calculated once in the optimization phase instead of once per record in the runtime phase. The optimiza

Re: Flink UDF Scalar Function called only once for all rows of select SQL in case of no parameter passed

2021-07-13 Thread Caizhi Weng
Hi! You should override the isDeterministic() method and return false. The default return value of this method is true. From the Javadoc of this method: > Furthermore, return false if the planner should always execute this > function on the cluster side. In other words: the planner should not
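The effect of that flag can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: ToyScalarFunction, CountingFunction and project are hypothetical names invented here, not Flink classes, and the "planner" is reduced to a single method that folds a deterministic zero-argument call into a constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ConstantFoldingSketch {
    // Toy stand-in for a zero-argument scalar function (not a Flink interface).
    interface ToyScalarFunction {
        Long eval();

        // Mirrors the default described in the thread: deterministic unless overridden.
        default boolean isDeterministic() {
            return true;
        }
    }

    // Counts how often eval() runs so the difference is observable.
    static final class CountingFunction implements ToyScalarFunction {
        final AtomicInteger calls = new AtomicInteger();
        private final boolean deterministic;

        CountingFunction(boolean deterministic) {
            this.deterministic = deterministic;
        }

        @Override
        public Long eval() {
            return (long) calls.incrementAndGet();
        }

        @Override
        public boolean isDeterministic() {
            return deterministic;
        }
    }

    // Toy "planner": a deterministic call is evaluated once and reused for every row,
    // a non-deterministic call is evaluated once per row.
    static List<Long> project(ToyScalarFunction fn, int rowCount) {
        List<Long> out = new ArrayList<>();
        if (fn.isDeterministic()) {
            Long folded = fn.eval();
            for (int i = 0; i < rowCount; i++) {
                out.add(folded);
            }
        } else {
            for (int i = 0; i < rowCount; i++) {
                out.add(fn.eval());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(project(new CountingFunction(true), 3));  // [1, 1, 1]
        System.out.println(project(new CountingFunction(false), 3)); // [1, 2, 3]
    }
}
```

In the real UDF from the original question, the corresponding change would be overriding isDeterministic() to return false on the ScalarFunction subclass, as the reply suggests.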

Re: Kafka Consumer stop consuming data

2021-07-13 Thread Jerome Li
Hi Aeden, Thanks for getting back. Do you mean that one of the partitions is idle, so no new watermark is generated from it, which then stalls all the downstream operators and stops them consuming data from Kafka? I didn’t use watermarks in my application, though. I checked that all the Kafka partition has

Re: Running two versions of Flink with testcontainers

2021-07-13 Thread Austin Cawley-Edwards
Great, glad it worked out for you! On Tue, Jul 13, 2021 at 10:32 AM Farouk wrote: > Thanks > > Finally I tried by running docker commands (thanks for the documentation) > and it works fine. > > Thanks > Farouk > > Le mar. 13 juil. 2021 à 15:48, Austin Cawley-Edwards < > austin.caw...@gmail.com>

Re: Kafka Consumer stop consuming data

2021-07-13 Thread Aeden Jameson
This can happen if you have an idle partition. Are all partitions receiving data consistently? On Tue, Jul 13, 2021 at 2:59 PM Jerome Li wrote: > > Hi, > > > > I have a question about the Flink Kafka consumer. I am facing an issue where the > Kafka consumer somehow stops consuming data from Kafka after

Re: User Classpath from Plugin

2021-07-13 Thread Mason Chen
I've read this page ( https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/plugins/), but would like to know more about modifying the whitelist so I can read the class. On Tue, Jul 13, 2021 at 2:54 PM Mason Chen wrote: > Hi all, > > How can I read the user cla

Kafka Consumer stop consuming data

2021-07-13 Thread Jerome Li
Hi, I have a question about the Flink Kafka consumer. I am facing an issue where the Kafka consumer somehow stops consuming data from Kafka after running for a few minutes or a few hours. While it was stopped, I checked the backpressure and the CPU and memory consumption. It all looks like not data consuming ins

User Classpath from Plugin

2021-07-13 Thread Mason Chen
Hi all, How can I read the user classpath from a Flink plugin (e.g. one of the metric reporters)? Best, Mason

Stateful Functions PersistentTable duration

2021-07-13 Thread Ammon Diether
Question If the duration is 20 minutes, 1) is the duration per item? 2) or is the duration for the table as a whole? Suppose the following items ("a", "a-value") 30 minutes ago ("b", "b-value") 10 minutes ago Does "a" get cleaned up? or neither gets cleaned up yet because the most recent item is

Re: savepoint failure

2021-07-13 Thread Dan Hill
Could this be caused by mixing configuration settings when running? Running a job with one parallelism, stop/savepointing and then recovering with a different parallelism? I'd assume that's fine and wouldn't create bad state. On Tue, Jul 13, 2021 at 12:34 PM Dan Hill wrote: > I checked m

Re: savepoint failure

2021-07-13 Thread Dan Hill
I checked my code. Our keys for streams and map state only use either (1) string, (2) long IDs that don't change or (3) Tuple of 1 and 2. I don't know why my current case is breaking. Our job partitions and parallelism settings have not changed. On Tue, Jul 13, 2021 at 12:11 PM Dan Hill wrot

How to examine Flink state in RocksDB with sst_dump

2021-07-13 Thread Eleanore Jin
Hi experts, I am running the Flink application in local execution mode for testing. I have configured RocksDB as the state backend, and I would like to use RocksDB tools such as ldb or sst_dump to examine how exactly the state is stored. However, I encountered the error below; can you please advise me h

Re: savepoint failure

2021-07-13 Thread Dan Hill
Hey. I just hit a similar error in production when trying to savepoint. We also use protobufs. Has anyone found a better fix to this? On Fri, Oct 23, 2020 at 5:21 AM Till Rohrmann wrote: > Glad to hear that you solved your problem. Afaik Flink should not read the > fields of messages and call

Flink UDF Scalar Function called only once for all rows of select SQL in case of no parameter passed

2021-07-13 Thread shamit jain
Hi, I am facing an issue where a scalar UDF is called only once when no parameter is passed, as shown below: public class DateTimeToEpochUDF extends ScalarFunction { public Long eval() { System.out.print("Test"); return Instant.now().toEpochMilli(); } } Now if

Re: Apache Flink - Reading Avro messages from Kafka with schema in schema registry

2021-07-13 Thread Dawid Wysakowicz
Hi, Yes, you are right: the schema in forGeneric is the readerSchema and at the same time the schema that Flink will be working with in the pipeline. It will be the schema used to serialize and deserialize records between different TaskManagers. Between the Flink TaskManagers that schema plays

Re: Process finite stream and notify upon completion

2021-07-13 Thread Piotr Nowojski
Hi, When finishing, sources emit {{org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK}}, so I think the best approach is to register an event time timer for {{Watermark#MAX_WATERMARK}} or maybe {{Watermark#MAX_WATERMARK - 1}}. If your function registers such a timer, it would b
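The idea can be sketched with a toy timer service in plain Java. Everything here is an assumption made for illustration: ToyTimerService is not a Flink class, and MAX_WATERMARK is modelled as Long.MAX_VALUE on the assumption that this matches the timestamp of Flink's Watermark#MAX_WATERMARK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class MaxWatermarkTimerSketch {
    // Assumption: stands in for the timestamp carried by Watermark#MAX_WATERMARK.
    static final long MAX_WATERMARK = Long.MAX_VALUE;

    // Toy event-time timer service (not a Flink class).
    static final class ToyTimerService {
        private final TreeSet<Long> timers = new TreeSet<>();

        void registerEventTimeTimer(long timestamp) {
            timers.add(timestamp);
        }

        // Removes and returns every timer whose timestamp is <= the new watermark.
        List<Long> advanceWatermark(long watermark) {
            List<Long> fired = new ArrayList<>();
            while (!timers.isEmpty() && timers.first() <= watermark) {
                fired.add(timers.pollFirst());
            }
            return fired;
        }
    }

    public static void main(String[] args) {
        ToyTimerService service = new ToyTimerService();
        service.registerEventTimeTimer(MAX_WATERMARK);

        // Ordinary watermarks during processing never reach the timer.
        System.out.println(service.advanceWatermark(1_626_200_691_540L).isEmpty()); // true

        // The final watermark emitted when the source finishes fires it.
        System.out.println(service.advanceWatermark(MAX_WATERMARK).size()); // 1
    }
}
```

In this model the timer callback is the single place where "the finite stream is complete" can be signalled, which is the role the reply assigns to the event time timer.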

Re: Running two versions of Flink with testcontainers

2021-07-13 Thread Austin Cawley-Edwards
You might also be able to put them in separate networks[1] to get around changing all the ports and still ensure that they don't see each other. [1]: https://www.testcontainers.org/features/networking#advanced-networking On Tue, Jul 13, 2021 at 9:07 AM Chesnay Schepler wrote: > It is possible

Re: Kafka Consumer Retries Failing

2021-07-13 Thread Piotr Nowojski
Hi, I'm not sure, maybe someone will be able to help you, but it sounds like it would be better for you to: - google search something like "Kafka Error sending fetch request TimeoutException" (I see there are quite a lot of results, some of them might be related) - ask this question on the Kafka m

Re: How to register custormize serializer for flink kafka format type

2021-07-13 Thread Piotr Nowojski
Hi, It's mentioned in the docs [1], but unfortunately this is not very well documented in 1.10. In short you have to provide a custom implementation of a `DeserializationSchemaFactory`. Please look at the built-in factories for examples of how it can be done. In newer versions it's both easier an

Re: Running two versions of Flink with testcontainers

2021-07-13 Thread Chesnay Schepler
It is possible but you need to make sure that all ports are configured such that the 2 clusters don't see each other. On 13/07/2021 13:21, Farouk wrote: Hi For e2e testing, we run tests with testcontainers. We have several jobs and we want to upgrade them one by one Do you know if it is possi

Re: Savepoints with bootstraping a datastream function

2021-07-13 Thread Rakshit Ramesh
That's great news! Thanks. On Tue, 13 Jul 2021 at 14:17, Arvid Heise wrote: > The only known workaround is to provide your own source(function) that > doesn't finish until all of the source subtasks finish and a final > checkpoint is completed. However, coordinating the sources with the old > So

Running two versions of Flink with testcontainers

2021-07-13 Thread Farouk
Hi, For e2e testing, we run tests with testcontainers. We have several jobs and we want to upgrade them one by one. Do you know if it is possible in Docker to run one JM + one TM for version 1 and version 2 at the same time? It looks like either the taskmanager registration is failing for the seco

Re: Flink 1.13 Native Kubernetes - Custom Pod Templates

2021-07-13 Thread Yang Wang
> > Are all the commands applicable with Kubernetes integration like taking > savepoint, starting from savepoint. I see commands from here [1] for > savepoint and on yarn as well, nothing specific to kubernetes. Yes. Taking savepoint, starting from savepoint are also available for native Kubernet

Re: key_by problem in Pyflink

2021-07-13 Thread Fei Zhao
Hi, Thanks for your explanation! Adding a line `self.data.contains('xxx')` in the `process_element2` and all goes well. I will take this as my temporary solution. Looking forward to the next release. Best Regards, Fei Xingbo Huang 于2021年7月13日周二 下午4:18写道: > Hi, > > I have created the JIRA[1]

[DISCUSS] FLIP-184: Refine ShuffleMaster lifecycle management for pluggable shuffle service framework

2021-07-13 Thread Yingjie Cao
Hi devs and users, This topic was originally discussed and reached a consensus in [1]. Because the change touches the pluggable shuffle interface, though not annotated as public currently, some users may be using it already. To avoid bringing compatibility issues to customized shuffle plugins already

Re: Savepoints with bootstraping a datastream function

2021-07-13 Thread Arvid Heise
The only known workaround is to provide your own source(function) that doesn't finish until all of the source subtasks finish and a final checkpoint is completed. However, coordinating the sources with the old SourceFunction interface requires some external tool. FLIP-147 is targeted for 1.14 in A

Re: key_by problem in Pyflink

2021-07-13 Thread Xingbo Huang
Hi, I have created the JIRA[1] to fix this bug which will be included in release-1.13.2. The root cause is the wrong mapping of the state key to the state. This kind of wrong mapping occurs when the key is switched, but the state is not used. As you wrote in the example, the `data` you declared is

Re: ValueState is null; checkpointing issues

2021-07-13 Thread Chesnay Schepler
When your operator processes a value with the key, then the ValueState is implicitly scoped to that key. So, if all your payloads have a unique ID (and hence unique key) then the ValueState will initially always be null. Only if 2 payloads have the same ID will the ValueState return something non-n
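A minimal plain-Java sketch of this key scoping, assuming a hypothetical ToyKeyedValueState that keeps one slot per key. It is not Flink's ValueState, only a model of the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

class KeyedValueStateSketch {
    // Toy stand-in for keyed value state (not a Flink class): one slot per key.
    static final class ToyKeyedValueState<K, V> {
        private final Map<K, V> slots = new HashMap<>();
        private K currentKey;

        // In this model the caller switches the key before each element is processed.
        void setCurrentKey(K key) {
            currentKey = key;
        }

        // Returns null until update() has been called for the current key.
        V value() {
            return slots.get(currentKey);
        }

        void update(V newValue) {
            slots.put(currentKey, newValue);
        }
    }

    // Counts occurrences per payload ID and returns the count seen before this element.
    static Integer process(ToyKeyedValueState<String, Integer> state, String payloadId) {
        state.setCurrentKey(payloadId);
        Integer previous = state.value();
        state.update(previous == null ? 1 : previous + 1);
        return previous;
    }

    public static void main(String[] args) {
        ToyKeyedValueState<String, Integer> state = new ToyKeyedValueState<>();
        System.out.println("first a: " + process(state, "a"));  // first a: null
        System.out.println("first b: " + process(state, "b"));  // first b: null
        System.out.println("second a: " + process(state, "a")); // second a: 1
    }
}
```

With unique IDs every element lands on a fresh slot, so the read before the first update is always null; a non-null value appears only when an ID repeats.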

Re: Flink 1.13 Native Kubernetes - Custom Pod Templates

2021-07-13 Thread bat man
Hi Yang, One more follow-up question on the custom pod templates for the JobManager and TaskManager. As you mention, the pod template is for advanced features, so is it the case that in a custom template we only need to include the custom feature, for example a volume mount or sidecar? I don't have to in

Re: Flink 1.13 Native Kubernetes - Custom Pod Templates

2021-07-13 Thread bat man
Thanks Yang for the information. Are all the commands applicable with the Kubernetes integration, like taking a savepoint or starting from a savepoint? I see commands from here [1] for savepoint and on yarn as well, nothing specific to kubernetes. [1] - https://ci.apache.org/projects/flink/flink-docs-releas