Re: How to install Flink + YARN?

2019-12-06 Thread Pankaj Chand
Is it required to use exactly the same version of Hadoop as the pre-bundled Hadoop version? I'm using a Hadoop 2.7.1 cluster with Flink 1.9.1 and the corresponding pre-bundled Hadoop 2.7.5. When I submit a job using: [vagrant@node1 flink]$ ./bin/flink run -m yarn-cluster ./examples/streaming/Socke

Re: StreamingFileSink doesn't close multipart uploads to s3?

2019-12-06 Thread Li Peng
Ok I seem to have solved the issue by enabling checkpointing. Based on the docs (I'm using 1.9.0), it seemed like only StreamingFileSink.forBulkFormat() should've required checkpointing, but based on this e

StreamingFileSink doesn't close multipart uploads to s3?

2019-12-06 Thread Li Peng
Hey folks, I'm trying to get StreamingFileSink to write to s3 every minute, with flink-s3-fs-hadoop, and based on the default rolling policy, which is configured to "roll" every 60 seconds, I thought that would be automatic (I interpreted rolling to mean actually close a multipart upload to s3). B

Re: What S3 Permissions does StreamingFileSink need?

2019-12-06 Thread Li Peng
Ah, I figured it out after all, turns out it was due to KMS encryption on the bucket; needed to add KMS permissions for the IAM role, otherwise there is an unauthorized error. Thanks for your help! On Fri, Dec 6, 2019 at 2:34 AM Khachatryan Roman < khachatryan.ro...@gmail.com> wrote: > Hey Li, >

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

2019-12-06 Thread Piyush Narang
+1 from our end as well. At Criteo, we are running some Flink jobs on Mesos in production to compute short-term features for machine learning. We’d love to help out and contribute to this initiative. Thanks, -- Piyush From: Till Rohrmann Date: Friday, December 6, 2019 at 8:10 AM To: dev Cc:

Re: Joining multiple temporal tables

2019-12-06 Thread Benoît Paris
Hi all! I believe this is a duplicate of another JIRA: https://issues.apache.org/jira/browse/FLINK-14200; where the query side does not accept a Table, only a TableSource (or has planner rule issues). I think in this case, the Logical Correlate extracted from the Temporal Table join transforms one

Re: Joining multiple temporal tables

2019-12-06 Thread Kurt Young
Hi Chris, If you are only interested in the latest data of the dimension table, maybe you can try the temporal table join: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#operations see "Join with Temporal Table" Best, Kurt On Fri, Dec 6, 2019 at 11:13 PM Fabian Hueske wr

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
Thank you! Please let me know if the workaround works for you. Best, Fabian On Fri, Dec 6, 2019 at 4:11 PM Chris Miller wrote: > Hi Fabian, > > Thanks for confirming the issue and suggesting a workaround - I'll give > that a try. I've created a JIRA issue as you suggested, > https://issues

Re: Joining multiple temporal tables

2019-12-06 Thread Chris Miller
Hi Fabian, Thanks for confirming the issue and suggesting a workaround - I'll give that a try. I've created a JIRA issue as you suggested, https://issues.apache.org/jira/browse/FLINK-15112 Many thanks, Chris -- Original Message -- From: "Fabian Hueske" To: "Chris Miller" Cc: "user

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
Hi Chris, Your query looks OK to me. Moreover, you should get a SQLParseException (or something similar) if it weren't valid SQL. Hence, I assume you are running into a bug in one of the optimizer rules. I tried to reproduce the problem on the SQL training environment and couldn't write a query

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-06 Thread devinbost
I think there might be a bug in `.window(EventTimeSessionWindows.withGap(Time.seconds(5)))` (unless I'm just not using it correctly) because I'm able to get output when I use the simpler window `.timeWindow(Time.seconds(5))`. However, I don't get any output when I use the session-based window.

Re: Flink 1.9.1 allocating more containers than needed

2019-12-06 Thread Chesnay Schepler
I would expect January. With the 1.8.3 release underway, the 1.10 feature freeze approaching and, of course, Christmas, it seems unlikely that we'll manage to pump out another bugfix release in December. On 06/12/2019 15:18, eSKa wrote: Thank you for the quick reply. Will wait for 1.9.2 then. I bel

Re: Flink 1.9.1 allocating more containers than needed

2019-12-06 Thread eSKa
Thank you for the quick reply. Will wait for 1.9.2 then. I believe you don't have any estimates on when it can happen? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink 1.9.1 allocating more containers than needed

2019-12-06 Thread Chesnay Schepler
Note that FLINK-10848 is included in 1.9.X, but it didn't fix the issue completely. On 06/12/2019 15:10, Chesnay Schepler wrote: There are some follow-up issues that are fixed for 1.9.2; release date for that is TBD. https://issues.apache.org/jira/browse/FLINK-12342 https://issues.apache.org/

Re: Flink 1.9.1 allocating more containers than needed

2019-12-06 Thread Chesnay Schepler
There are some follow-up issues that are fixed for 1.9.2; the release date for that is TBD. https://issues.apache.org/jira/browse/FLINK-12342 https://issues.apache.org/jira/browse/FLINK-13184 On 06/12/2019 15:08, eSKa wrote: Hello, recently we have upgraded our environment from 1.6.4 to 1.9.1.

Re: Row arity of from does not match serializers.

2019-12-06 Thread Fabian Hueske
Hi, The inline lambda MapFunction produces a Row with 12 String fields (12 calls to String.join()). You use RowTypeInfo rowTypeDNS to declare the return type of the lambda MapFunction. However, rowTypeDNS is defined with many more String fields. The exception tells you that the number of fields r

Flink 1.9.1 allocating more containers than needed

2019-12-06 Thread eSKa
Hello, recently we have upgraded our environment from 1.6.4 to 1.9.1. We started to notice behaviour similar to what we saw in 1.6.2, which was allocating more containers on YARN than are needed by the job - I think it was fixed by https://issues.apache.org/jira/browse/FLINK-10848, but that one is still exi

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

2019-12-06 Thread Till Rohrmann
Big +1 for adding a fully working e2e test for Flink's Mesos integration. Ideally we would have it ready for the 1.10 release. The lack of such a test has bitten us already multiple times. In general I would prefer to use the official image if possible since it frees us from maintaining our own cu

Re: Basic question about flink programms

2019-12-06 Thread KristoffSC
Hi, I'm having the same problem now. What is your approach now after gaining some experience? Also, do you use Spring DI to set up/initialize your jobs/process functions?

Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-06 Thread Benchao Li
+1 for dropping. On Thu, Dec 5, 2019 at 4:05 PM Zhenghua Gao wrote: > +1 for dropping. > > *Best Regards,* > *Zhenghua Gao* > > > On Thu, Dec 5, 2019 at 11:08 AM Dian Fu wrote: > > > +1 for dropping them. > > > > Just FYI: there was a similar discussion a few months ago [1]. > > > > [1] > > > http://apache-fli

Re: What S3 Permissions does StreamingFileSink need?

2019-12-06 Thread Khachatryan Roman
Hey Li, > my permissions is as listed above As I understand it, what is listed above is a Terraform script. But what are the actual permissions in AWS? It also makes sense to make sure that they are associated with the right role, and the role with the right user. > Maybe I need to add the directory level as a resource?

KeyBy/Rebalance overhead?

2019-12-06 Thread Komal Mariam
Hello everyone, I want to get some insights on the KeyBy (and Rebalance) operations since, according to my understanding, they partition our tasks over the defined parallelism and thus should make our pipeline faster. I am reading a topic which contains 170,000,000 pre-stored records with 11 Kafka par

[DISCUSS] Adding e2e tests for Flink's Mesos integration

2019-12-06 Thread Yangze Guo
Hi, all, Currently, there is no end-to-end test or IT case for Mesos deployment, while common deployment-related development inevitably touches the logic of this component. Thus, some work needs to be done to guarantee a good experience for both Mesos users and contributors. After offline discussion

Re: How to explain the latency at different source injection rate?

2019-12-06 Thread Till Rohrmann
Hi Rui, it is hard to explain the results you are observing without knowing the complete benchmark setup. For example, it would be interesting to know the exact details of your job and the way you are measuring latencies. W/o this information I would only be able to guess things. Cheers, Till On

Re: User program failures cause JobManager to be shutdown

2019-12-06 Thread Khachatryan Roman
Hi Dongwon, This should work but it could also interfere with Flink itself exiting in case of a fatal error. Regards, Roman On Fri, Dec 6, 2019 at 2:54 AM Dongwon Kim wrote: > FYI, we've launched a session cluster where multiple jobs are managed by a > job manager. If that happens, all the ot