Re: Native kubernetes execution and History server

2021-03-25 Thread Guowei Ma
Hi, After some discussion with Wang Yang offline, it seems that there might be a jobmanager failover. So would you like to share full jobmanager log? Best, Guowei On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal wrote: > Hi, > > I would like to use native kubernetes execution [1] for one batch job

Re: flink sql jmh failure

2021-03-25 Thread Guowei Ma
Hi, I am not an expert of JMH but it seems that it is not an error. From the log it looks like that the job is not finished. The data source continues to read data when JMH finishes. Thread[Legacy Source Thread - Source: TableSourceScan(table=[[default_catalog, default_database, CLICKHOUSE_SOURCE_

Re: Native kubernetes execution and History server

2021-03-25 Thread Lukáš Drbal
Hello, sure. Here is log from first run which succeed - https://pastebin.com/tV75ZS5S and here is from second run (it's same for all next) - https://pastebin.com/pwTFyGvE My Docker file is pretty simple, just take wordcount + S3 FROM flink:1.12.2 RUN mkdir -p $FLINK_HOME/usrlib COPY flink-examp

Re: Native kubernetes execution and History server

2021-03-25 Thread Guowei Ma
Hi, Thanks for providing the logs. From the logs this is a known bug.[1] Maybe you could use `$internal.pipeline.job-id` to set your own job-id.(Thanks to Wang Yang) But keep in mind this is only for internal use and may be changed in some release. So you should keep an eye on [1] for the correct s

Hadoop is not in the classpath/dependencies

2021-03-25 Thread Matthias Seiler
Hello everybody, I set up a a Flink (1.12.1) and Hadoop (3.2.1) cluster on two machines. The job should store the checkpoints on HDFS like so: ```java StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.enableCheckpointing(15000, CheckpointingMode.EXACTLY_ONC

Re: Native kubernetes execution and History server

2021-03-25 Thread Lukáš Drbal
Hello Guowei, I just checked it and it works! Thanks a lot! Here is workaround which use UUID as jobId: -D\$internal.pipeline.job-id=$(cat /proc/sys/kernel/random/uuid|tr -d "-") L. On Thu, Mar 25, 2021 at 11:01 AM Guowei Ma wrote: > Hi, > Thanks for providing the logs. From the logs this i

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-25 Thread Matthias Pohl
Hi everyone, considering the upcoming release of Flink 1.13, I wanted to revive the discussion about the Mesos support ones more. Mesos is also already listed as deprecated in Flink's overall roadmap [1]. Maybe, it's time to align the documentation accordingly to make it more explicit? What do you

Glob support on file access

2021-03-25 Thread Etienne Chauchot
Hi all, In case it is useful to some of you: I have a big batch that needs to use globs (*.parquet for example) to read input files. It seems that globs do not work out of the box (see https://issues.apache.org/jira/browse/FLINK-6417) But there is a workaround: final FileInputFormat input

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-25 Thread Konstantin Knauf
Hi Matthias, Thank you for following up on this. +1 to officially deprecate Mesos in the code and documentation, too. It will be confusing for users if this diverges from the roadmap. Cheers, Konstantin On Thu, Mar 25, 2021 at 12:23 PM Matthias Pohl wrote: > Hi everyone, > considering the upc

Re: flink sql jmh failure

2021-03-25 Thread jie mei
HI, Guowei yeah, I think so too. There is no way trigger a checkpoint and wath the checkpoint finished now, so I will do the benchmark with lower level api. Guowei Ma 于2021年3月25日周四 下午4:59写道: > Hi, > I am not an expert of JMH but it seems that it is not an error. From the > log it looks like th

[HEADS UP] Flink Community Survey closes Tue, March 30

2021-03-25 Thread Ana Vasiliuk
Hi all, Thanks to everyone who has already left feedback on the community experience in the Community Survey! The survey is open until *Tuesday, March 30th*, so if you haven't done so yet, please take 2 minutes (maybe less!) to fill it out below. Your opinion is very helpful for us to better unde

reading from jdbc connection

2021-03-25 Thread Arran Duff
Hi, I'm quite new to flink and I'm trying to create an application, which reads ID's from a kinesis stream and then uses these to read from a mysql database. I expect that I would just be doing a join of the id's onto the table I'm struggling to understand from the documentation how to actually

Re: Flink job repeated restart failure

2021-03-25 Thread Arvid Heise
Hi Vinaya, SpillingAdaptiveSpanningRecordDeserializer tries to create a directory in the temp directory, which you can configure by setting io.tmp.dirs. By default, it's set to System.getProperty("java.io.tmpdir"), which seems to be invalid in your case. (Note that the directory has to exist on th

Re: Hadoop is not in the classpath/dependencies

2021-03-25 Thread Maminspapin
I have the same problem ... -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Hadoop is not in the classpath/dependencies

2021-03-25 Thread Maminspapin
I downloaded the lib (last version) from here: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/ and put it in the flink_home/lib directory. It helped. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Question about checkpoints and savepoints

2021-03-25 Thread Robert Cullen
When I run a job on my Kubernetes session cluster only the checkpoint directories are created but not the savepoints. (Filesystem configured to S3 Minio) Any ideas? -- Robert Cullen 240-475-4490

Re: Flink on Minikube

2021-03-25 Thread Sandeep khanzode
Hi Arvid, Thanks, will set the scope to Provided and try. Are there public examples in GitHub that demonstrate a sample app in Minikube? Sandeep > On 23-Mar-2021, at 3:17 PM, Arvid Heise wrote: > > Hi Sandeep, > > please have a look at [1], you should add most Flink dependencies as provide

FlinkKafkaConsumer - Broadcast - Initial Load

2021-03-25 Thread Sandeep khanzode
Hi, I have a master/reference data that needs to come in through a FlinkKafkaConsumer to be broadcast to all nodes and subsequently joined with the actual stream for enriching content. The Kafka consumer gets CDC-type records from database changes. All this works well. My question is how do

General guidance

2021-03-25 Thread Almeida, Julius
Hi Team, My streaming pipeline is based on beam & running using flink runner with rocksdb as state backend. Over time I am seeing memory spike & after giving a look at heap dump, I am seeing records in ‘__StatefulParDoGcTimerId’ which seems to be never cleaned. Found this jira https://issues

Re: General guidance

2021-03-25 Thread Kenneth Knowles
This is a Beam issue indeed, though it is an issue with the FlinkRunner. So I think I will BCC the Flink list. You may be in one of the following situations: - These timers should not be viewed as distinct by the runner, but deduped, per https://issues.apache.org/jira/browse/BEAM-8212#comment-169

Re: Time Temporal Join

2021-03-25 Thread Satyam Shekhar
Hi Timo, Apologies for the late response. I somehow seem to have missed your reply. I do want the join to be "time-based" since I need to perform a tumble grouping operation on top of the join. I tried setting the watermark strategy to `R` - INTERVAL '0.001' SECONDS, that didn't help either. No

Re: The Role of TimerService in ProcessFunction

2021-03-25 Thread Chirag Dewan
Thanks for the clarification Dawid. Resolves my confusion. Sent from Yahoo Mail on Android On Fri, 19 Mar 2021 at 2:41 pm, Dawid Wysakowicz wrote: Hi Chirag, I agree it might be a little bit confusing. Let me try to explain the reasoning. To do that I'll first try to rephrase the rea

Re: [EXTERNAL] Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-25 Thread Shuiqiang Chen
Hi Kevin, Xinbin, Hi Shuiqiang, > > Thanks for the quick response on creating the ticket for Kinesis > Connector. Do you mind giving me the chance to try to implement the > connector over the weekend? > > I am interested in contributing to Flink, and I think this can be a good > starting point to

Re: Native kubernetes execution and History server

2021-03-25 Thread Yang Wang
Thanks Guowei for the comments and Lukáš Drbal for sharing the feedback. I think it is not only for Kubernetes application mode, but also Yarn application and standalone application, the job id will be set to ZERO if not configured explicitly in HA mode. For standalone application, we could use "

Re: Flink job repeated restart failure

2021-03-25 Thread vinaya
Hi Arvid, Thank you for the suggestion. Indeed, the specified setting was commented out in the Flink configuration (flink-conf.yaml). # io.tmp.dirs: /tmp Is there a fallback (e.g. /tmp) if io.tmp.dirs and System.getProperty("java.io.tmpdir") are both not set? Will configure this setting to a

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-25 Thread Jark Wu
IIUC, pipeline.auto-watermak-interval = 0 just disable **periodic** watermark emission, it doesn't mean the watermark will never be emitted. In Table API/SQL, it has the same meaning. If watermark interval = 0, we disable periodic watermark emission, and emit watermark once it advances. So I thin