Re: Flink 1.8.3 GC issues

2020-09-11 Thread Piotr Nowojski
Hi Josson, Have you checked the logs as Nico suggested? At 18:55 there is a dip in non-heap memory, just about when the problems started happening. Maybe you could post the TM logs? Have you tried updating JVM to a newer version? Also it looks like the heap size is the same between 1.4 and 1.8, bu

How to schedule Flink Batch Job periodically or daily

2020-09-11 Thread s_penakalap...@yahoo.com
Hi Team, We have Flink batch jobs which need to be scheduled as listed below: Case 1: daily at 2:00 UTC; Case 2: periodically, once every 2 hours; Case 3: scheduled based on an event. Please help me with how to approach all three use cases. Can we use Oozie workflows or any better

Re: [DISCUSS] Drop Scala 2.11

2020-09-11 Thread Igal Shilman
@Galen FYI: the upcoming StateFun release would use Scala2.12 On Thu, Sep 10, 2020 at 5:14 PM Seth Wiesman wrote: > @glen > > Yes, we would absolutely migrate statefun. StateFun can be compiled with > Scala 2.12 today, I'm not sure why it's not cross released. > > @aljoscha :) > > @mathieu Its

Re: How to schedule Flink Batch Job periodically or daily

2020-09-11 Thread Robert Metzger
Hi Sunitha, (Note: You've emailed both the dev@ and user@ mailing list. Please only use the user@ mailing list for questions on how to use Flink. I'm moving the dev@ list to bcc) Flink does not have facilities for scheduling batch jobs, and there are no plans to add such a feature (this is not in
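
Since Flink itself does not schedule batch jobs, the timing logic has to live in an external scheduler. The following is only a minimal Python sketch of the two time-based cases from the question (daily at 2:00 UTC, and once every 2 hours). The function names `next_daily_run` and `next_periodic_run` are hypothetical, and actually submitting the job is left to whatever scheduler or workflow tool is used.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 2) -> datetime:
    """Return the next occurrence of `hour`:00 strictly after `now` (same timezone as `now`)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_periodic_run(last_run: datetime, every_hours: int = 2) -> datetime:
    """Return the next run time for a job that repeats every `every_hours` hours."""
    return last_run + timedelta(hours=every_hours)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("next daily run:", next_daily_run(now))
    print("next periodic run:", next_periodic_run(now))
```

An external scheduler would sleep until the computed time and then submit the batch job; the event-driven case depends on the event source and is not covered here.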

Re: Idle stream does not advance watermark in connected stream

2020-09-11 Thread Robert Metzger
Hi Pierre, It seems that the community is working on providing a fix with the next 1.11 bugfix release (and for 1.12). You can follow the status of the ticket here: https://issues.apache.org/jira/browse/FLINK-18934 Best, Robert On Thu, Sep 10, 2020 at 11:00 AM Pierre Bedoucha wrote: > Hi and

Re: Speeding up CoGroup in batch job

2020-09-11 Thread Robert Metzger
Hi Ken, Some random ideas that pop up in my head: - make sure you use data types that are efficient to serialize, and cheap to compare (ideally use primitive types in TupleN or POJOs) - Maybe try the TableAPI batch support (if you have time to experiment). - optimize memory usage on the TaskManage

Re: Streaming data to parquet

2020-09-11 Thread Robert Metzger
Hi Marek, what you are describing is a known problem in Flink. There are some thoughts on how to address this in https://issues.apache.org/jira/browse/FLINK-11499 and https://issues.apache.org/jira/browse/FLINK-17505 Maybe some ideas there help you already for your current problem (use long checkp

Re: Measure CPU utilization

2020-09-11 Thread Robert Metzger
Hi Piper, I personally like looking at the system load (if Flink is the only major process on the system). It nicely captures the "stress" Flink puts on the system (this would be the "System.CPU.Load5min" class of metrics; there are a lot of articles about understanding Linux load averages). I do
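
To illustrate how a load average is usually read, here is a small Python sketch that normalizes the operating system's 5-minute load average by the number of logical cores. It reads the value from the OS through `os.getloadavg()` (Unix only), not from Flink's metrics, and the helper name `load_per_core` is an assumption made for this example. As a rough rule of thumb, a per-core value approaching or exceeding 1.0 suggests the machine is saturated.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of logical cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
    _one, five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"5-minute load per core: {load_per_core(five, cores):.2f}")
```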

Re: How to access state in TimestampAssigner in Flink 1.11?

2020-09-11 Thread Aljoscha Krettek
Hi Theo, I think you're right that there is currently no good built-in solution for your use case. What you would ideally like to have is some operation that can buffer records and "hold back" the watermark according to the timestamps of the records that are in the buffer. This has the benefit
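
The idea of buffering records and holding back the watermark can be modeled in a few lines of plain Python. This is only a conceptual sketch, not Flink API: the class `HoldBackBuffer` and its methods are hypothetical. It relies on the usual watermark contract that a watermark `W` promises no further records with timestamp `<= W`, so the outgoing watermark must stay strictly below the smallest buffered timestamp.

```python
import heapq


class HoldBackBuffer:
    """Buffers record timestamps and caps the outgoing watermark accordingly."""

    def __init__(self) -> None:
        self._heap: list[int] = []  # min-heap of buffered record timestamps

    def add(self, timestamp: int) -> None:
        """Buffer a record with the given event timestamp."""
        heapq.heappush(self._heap, timestamp)

    def release_up_to(self, timestamp: int) -> list[int]:
        """Remove and return all buffered timestamps <= `timestamp`, in ascending order."""
        released = []
        while self._heap and self._heap[0] <= timestamp:
            released.append(heapq.heappop(self._heap))
        return released

    def outgoing_watermark(self, incoming_watermark: int) -> int:
        """Hold the watermark back so it stays below every buffered record."""
        if not self._heap:
            return incoming_watermark
        return min(incoming_watermark, self._heap[0] - 1)
```

With records at timestamps 100 and 50 buffered, an incoming watermark of 200 is held back to 49; once the record at 50 is released, it can advance to 99.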

Re: Streaming data to parquet

2020-09-11 Thread Ayush Verma
Hi, Looking at the problem broadly, file size is directly tied up with how often you commit. No matter which system you use, this variable will always be there. If you commit frequently, you will be close to realtime, but you will have numerous small files. If you commit after long intervals, you
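
The trade-off between commit frequency and file size can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that data is spread evenly across writers, that each writer produces exactly one file per commit, and that sizes are measured before any Parquet encoding or compression. The function name `estimate_files` and the example throughput are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate_files(throughput_bytes_per_s: float,
                   commit_interval_s: float,
                   parallel_writers: int) -> tuple[float, float]:
    """Return (approximate bytes per file, approximate files per day)."""
    if commit_interval_s <= 0 or parallel_writers <= 0:
        raise ValueError("interval and writer count must be positive")
    bytes_per_file = throughput_bytes_per_s * commit_interval_s / parallel_writers
    files_per_day = SECONDS_PER_DAY / commit_interval_s * parallel_writers
    return bytes_per_file, files_per_day


if __name__ == "__main__":
    # Compare a 1-minute and a 15-minute commit interval at 10 MB/s with 4 writers.
    for interval in (60, 900):
        size, count = estimate_files(10_000_000, interval, 4)
        print(f"interval={interval}s -> ~{size / 1e6:.0f} MB per file, ~{count:.0f} files/day")
```

Under these assumptions, lengthening the commit interval by some factor grows each file by that factor and shrinks the file count by the same factor, at the cost of higher latency before data becomes visible.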

Re: arbitrary state handling in python api

2020-09-11 Thread Georg Heiler
Many thanks. This is great to hear. Yes! This looks great. Many Thanks! Best, Georg On Thu, Sep 10, 2020 at 23:53, Dian Fu wrote: > Hi Georg, > > It still doesn't support state access in Python API in the latest version > 1.11. > > Could you take a look at if KeyedProcessFunction could

Re: Streaming data to parquet

2020-09-11 Thread Senthil Kumar
Hello Ayush, I am interested in knowing about your “really simple” implementation. So assuming the streaming parquet output goes to S3 bucket: Initial (partitioned by event time) Do you write another Flink batch application (step 2) which partitions the data from Initial in larger “event time

Flink UI not displaying records received/sent metrics

2020-09-11 Thread Prashant Nayak
We are running a Flink 1.11.0 job cluster on Kubernetes. We're not seeing any metrics in the Flink Web UI (for the default metrics like Bytes Received, Records Received, etc.), instead we see a spinner. See image below. However, we have a prometheus metrics exporter configured and see job/task m

Re: Flink UI not displaying records received/sent metrics

2020-09-11 Thread Robert Metzger
Hi Prashant, My initial suspicion is that this is a problem in the UI or with the network connection from the browser to the Flink REST endpoints. Since you can access the metrics with "curl", Flink seems to do everything all right. The first URL you posted is for the watermarks (it ends with "/

Re: Struggling with reading the file from s3 as Source

2020-09-11 Thread Robert Metzger
Hi Vijay, Can you post the error you are referring to? Did you properly set up an s3 plugin ( https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/) ? On Fri, Sep 11, 2020 at 8:42 AM Vijay Balakrishnan wrote: > Hi, > > I want to *get data from S3 and process and send to Kinesi

Re: I hit a bad jobmanager address when trying to use Flink SQL Client

2020-09-11 Thread Robert Metzger
Hi Dan, the notation of "flink-jobmanager/10.98.253.58:8081" is not a problem. It is how java.net.InetAddress stringifies a resolved address (with both hostname and IP). How did you configure the SQL client to work with a Kubernetes Session? Afaik this is not a documented, tested and officially s

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-09-11 Thread Robert Metzger
Hi Averell, as far as I know these tmp files should be removed when the Flink job is recovering. So you should have these files around only for the latest incomplete checkpoint while recovery has not completed yet. On Tue, Sep 1, 2020 at 2:56 AM Averell wrote: > Hello Robert, Arvid, > > As I am

RestClusterClient locks file after calling `submitJob(JobGraph)` method on Windows OS

2020-09-11 Thread Vladislav Keda
Hi Flink Community, I was trying to submit a Flink job on a standalone cluster using RestClusterClient. After waiting for job submission, I got the JobID correctly and tried to delete the source jar file. But then I got the exception: java.nio.file.FileSystemException: /path/to/jar: The process cannot

Re: I hit a bad jobmanager address when trying to use Flink SQL Client

2020-09-11 Thread Dan Hill
Hi Robert! I have Flink running locally on minikube. I'm running SQL client using exec on the jobmanager. kubectl exec pod/flink-jobmanager-0 -i -t -- /opt/flink/bin/sql-client.sh embedded -e /opt/flink/sql-client-defaults.yaml Here's the sql-client-defaults.yaml. I didn't specify a session. e

Flink Stateful Functions API

2020-09-11 Thread Timothy Bess
The Flink Stateful Functions Python API looks cool, but is there a documented spec for how it communicates with Flink? I'd like to implement an SDK in Haskell if I can.