Re: Problem with overridden hashCode/equals in keys in Flink 1.11.3 when checkpointing with RocksDB

2021-01-21 Thread David Haglund
A colleague of mine found some hint under “Avro type” [2] in the State evolution schema page: “Example: RocksDB state backend relies on binary objects identity, rather than hashCode method implementation. Any changes to the keys object structure could lead to non deterministic behaviour.” I gu

Re: Trying to create a generic aggregate UDF

2021-01-21 Thread Timo Walther
Hi Dylan, thanks for the investigation. I can now also reproduce it my code. Yes, this is a bug. I opened https://issues.apache.org/jira/browse/FLINK-21070 and will try to fix this asap. Thanks, Timo On 20.01.21 17:52, Dylan Forciea wrote: Timo, I converted what I had to Java, and ended u

Re: Flink Jobmanager HA deployment on k8s

2021-01-21 Thread Ufuk Celebi
@Yang: I think this would be valuable to document. I think it's a natural question to ask whether you can have standby JMs with Kubernetes. What do you think? If you agree, we could create a JIRA ticket and work on the "official" docs for this. On Thu, Jan 21, 2021, at 5:05 AM, Yang Wang wrote:

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-21 Thread Leonard Xu
> Before the changes, as I am writing this reply, the local time here is > 2021-01-21 12:03:35 (Beijing time, UTC+8). > And I tried these 5 functions in sql client, and got: > > Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, > CURRENT_TIME; > +-+-

Re: Problem with overridden hashCode/equals in keys in Flink 1.11.3 when checkpointing with RocksDB

2021-01-21 Thread Robert Metzger
Hey David, this is a good catch! I've filed a JIRA ticket to address this in the docs more prominently: https://issues.apache.org/jira/browse/FLINK-21073 Thanks a lot for reporting this issue! On Thu, Jan 21, 2021 at 9:24 AM David Haglund wrote: > A colleague of mine found some hint under “Avr

Re: Trying to create a generic aggregate UDF

2021-01-21 Thread Timo Walther
I opened a PR. Feel free to try it out. https://github.com/apache/flink/pull/14720 Btw: >> env.createTemporarySystemFunction("LatestNonNullLong", >> classOf[LatestNonNull[Long]]) >> >> env.createTemporarySystemFunction("LatestNonNullString", >> classOf[LatestNonNull[String]]) don't make a diff

Re: org.apache.flink.runtime.client.JobSubmissionException: Job has already been submitted

2021-01-21 Thread Robert Metzger
Thanks a lot for your message. Why is there a difference of 5 minutes between the timestamp of the job submission from the client to the timestamp on the JobManager where the submission is received? Is there any service / custom logic involved in the job submission? (e.g. a proxy in between, that

How to fix deprecation on registerTableSink

2021-01-21 Thread Flavio Pompermaier
Hello everybody, I was trying to get rid of the deprecation warnings about using BatchTableEnvironment.registerTableSink() but I don't know how to proceed. My current code does the following: BatchTableEnvironment benv = BatchTableEnvironment.create(env); benv.registerTableSink("outUsers", getFie

Re: Trying to create a generic aggregate UDF

2021-01-21 Thread Dylan Forciea
Timo, Will do! I have been patching in a change locally that I have a PR [1] out for, so if this will end up in the next 1.12 patch release, I may add this in with it once it has been approved and merged. On a side note, that PR has been out since the end of October (looks like I need to do a

Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-21 Thread Robert Metzger
Since our CI system is able to build Flink, I believe it's a local issue. Are you sure that the build is failing when you build Flink from the root directory (not calling maven from within a maven module?) On Tue, Jan 19, 2021 at 11:19 AM Smile@LETTers wrote: > Hi, > I got an error when tried t

[DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-01-21 Thread Leonard Xu
Hello, everyone I want to start the discussion of FLIP-162: Consistent Flink SQL time function behavior[1]. We’ve some initial discussion of several problematic functions in dev mail list[2], and I think it's the right time to resolve them by a FLIP. Currently some time function behaviors

Re: Trying to create a generic aggregate UDF

2021-01-21 Thread Timo Walther
Hi Dylan, I can help with a review for your PR tomorrow. In general, I would recommend to just ping people a couple of times that have been worked on the component before (see git blame) to get a review. We are all busy and need a bit of pushing from time to time ;-) Thanks, Timo On 21.01.2

RE: org.apache.flink.runtime.client.JobSubmissionException: Job has already been submitted

2021-01-21 Thread Hailu, Andreas [Engineering]
Hi Robert, I sent you an email with instructions to create an account to view the logs through our secure repository. I’ve included the JobManager and client application logs there. We have a thread pool that we use to submit multiple jobs in parallel, but in there there’s no retry logic – if

Re: Kubernetes HA Services - artifact for KubernetesHaServicesFactory

2021-01-21 Thread Ashish Nigam
It works now. Job manager is able to start. But now, I have run into another issue. It seems job manager is trying to create configmap in default namespace and namespace/service account where I run job manager does not have access to configmap GET at: https://X.X.X.X/api/v1/namespaces/default/con

JDBC connection pools

2021-01-21 Thread Marco Villalobos
Currently, my jobs that require JDBC initialize a connection in the open method directly via JDBC driver. 1. What are the established best practices for this? 2. Is it better to use a connection pool that can validate the connection and reconnect? 3. Would each operator require its own connection

Comment in source code of CoGroupedStreams

2021-01-21 Thread sudranga
Is this comment in the file flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/CoGroupedStreams.java accurate? " * Note: Right now, the groups are being built in memory so you need to ensure that they don't * get too big. Otherwise the JVM might crash." Looking at the s

Comment in source code of CoGroupedStreams

2021-01-21 Thread Sudharsan R
Is this comment in the file flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/CoGroupedStreams.java accurate? " * Note: Right now, the groups are being built in memory so you need to ensure that they don't * get too big. Otherwise the JVM might crash." Looking at the s

Re: Trying to create a generic aggregate UDF

2021-01-21 Thread Dylan Forciea
I wanted to report that I tried out your PR, and it does solve my issue. I am able to create a generic LatestNonNull and it appears to do what is expected. Thanks, Dylan Forciea On 1/21/21, 8:50 AM, "Timo Walther" wrote: I opened a PR. Feel free to try it out. https://github.com/apac

Question about setNestedFileEnumeration()

2021-01-21 Thread Billy Bain
I have a Streaming process where new directories are added daily in S3. s3://foo/bar/2021-01-18/data.gz s3://foo/bar/2021-01-19/data.gz s3://foo/bar/2021-01-20/data.gz If I start the process, it won't pick up anything other than the directories visible when the process was started. The textInput

Flink 1.11 checkpoint compatibility issue

2021-01-21 Thread Lu Niu
Hi, We recently migrated from 1.9.1 to flink 1.11 and notice the new job cannot consume from savepoint taken in 1.9.1. Here is the list of operator id and max parallelism of savepoints taken in both versions. The only code change is version upgrade. savepoint 1.9.1: ``` Id: 8a74550ce6afad759d5f1d

Re: Question about setNestedFileEnumeration()

2021-01-21 Thread Billy Bain
I sent this a little prematurely. Will the streaming process find new directories under the parent? If the input path is s3://foo.bar/ and directories are added daily, should I expect that the newly added directories+files will get processed? Thanks! Wayne On 2021/01/21 23:20:41, Billy Bain

Re: Comment in source code of CoGroupedStreams

2021-01-21 Thread Guowei Ma
Hi, I think you could try something like this firstStream .coGroup(secondStream) .where(_.id) .equalTo(_.id) .window(TumblingEventTimeWindows.of(Time.seconds(1))) .with(new MyCogroupFunction()) .uid("myCoGroup") Best, Guowei On Fri, Jan 22, 2021 at 4:33 AM Sudh

Re: Flink Jobmanager HA deployment on k8s

2021-01-21 Thread Yang Wang
Yes. I agree with you that it is valuable to document how to start multiple JobManagers in HA mode for K8s deployment. I have a ticket to track this documentation improvement[1]. [1]. https://issues.apache.org/jira/browse/FLINK-21082 Best, Yang Ufuk Celebi 于2021年1月21日周四 下午9:08写道: > @Yang: I th

Re: Kubernetes HA Services - artifact for KubernetesHaServicesFactory

2021-01-21 Thread Yang Wang
You could set config option "kubernetes.namespace" to your flink-conf ConfigMap. And then KubernetesHAService will use it to create/watch the ConfigMap. Please note the default service account has enough permission. Of course, you could also set the config option "kubernetes.service-account" to an

Re: Problem with overridden hashCode/equals in keys in Flink 1.11.3 when checkpointing with RocksDB

2021-01-21 Thread Yun Tang
Hi David, Thanks for your enthusiasm to figure out the root cause. The key difference is that RocksDB holds binary objects which are only defined by the serialized bytes while Fs/MemoryStateBackend holds objects in pojo format which are defined by the hashCode and equals. If you want to achieve

Re: org.apache.flink.runtime.client.JobSubmissionException: Job has already been submitted

2021-01-21 Thread Robert Metzger
Hey Andreas, thanks a lot for providing me with the full logs. The JobManager actually received 2 job submissions. There are 2 relevant log messages. 1. "Received JobGraph submission xxx" 2. "Submitting job" 1 is logged right after the dispatcher receives the JobGraph, before the duplicate submiss