Re: Back pressure with multiple joins

2020-09-22 Thread Dan Hill
When I use DataStream and implement the join myself, I can get 50x the throughput. I assume I'm doing something wrong with Flink's Table API and SQL interface. On Tue, Sep 22, 2020 at 11:21 PM Dan Hill wrote: > Hi! > > My goal is to better understand how my code impacts streaming throughput. >

Back pressure with multiple joins

2020-09-22 Thread Dan Hill
Hi! My goal is to better understand how my code impacts streaming throughput. I have a streaming job where I join multiple tables (A, B, C, D) using interval joins. Case 1) If I have 3 joins in the same query, I don't hit back pressure. SELECT ... FROM A LEFT JOIN B ON... LEFT JOIN C ON... LEFT

Re: Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-22 Thread Xintong Song
Hi Shubham, Concerning FLINK-18712, thanks for the pointer. I was not aware of this issue before. Running on Kubernetes or Yarn should not affect this issue. I cannot tell whether this issue is the cause of your problem. The simplest way to confirm this is probably just try the solution to see if

Flink Statefun Byte Ingress

2020-09-22 Thread Timothy Bess
Hi, So most of the examples of "module.yaml" files I've seen focus on protobuf ingress, but is there a way to just get bytes from Kafka? I want to integrate this with the rest of my codebase which uses JSON, but don't want to migrate to protobuf just yet. I'm not totally sure how it would work sin

Re: Flink Table SQL and writing nested Avro files

2020-09-22 Thread Dan Hill
Nice! I'll try that. Thanks, Dawid! On Mon, Sep 21, 2020 at 2:37 AM Dawid Wysakowicz wrote: > Hi Dan, > > I think the best what I can suggest is this: > > SELECT > > ROW(left.field0, left.field1, left.field2, ...), > > ROW(right.field0, right.field1, right.field2, ...) > > FROM ... > >

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-22 Thread Dan Hill
Hi Yang! The multiple "INSERT INTO" jobs all go to the same Flink cluster. I'm using this Helm chart (which looks like the standalone option). I deploy the job using a simple k8 Job. Sounds like I should do this myself. Thanks

Stateful Functions + ML model prediction

2020-09-22 Thread John Morrow
Hi Flink Users, I'm using Flink to process a stream of records containing a text field. The records are sourced from a message queue, enriched as they flow through the pipeline based on business rules and finally written to a database. We're using the Ververica platform so it's running on Kuber

Re: Better way to share large data across task managers

2020-09-22 Thread Kostas Kloudas
Hi Dongwon, If you know the data in advance, you can always use the Yarn options in [1] (e.g. the "yarn.ship-directories") to ship the directories with the data you want only once to each Yarn container (i.e. TM) and then write a udf which reads them in the open() method. This will allow the data

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-22 Thread Claude M
Thanks for your responses. 1. There were no job re-starts prior to the metaspace OEM. 2. I tried increasing the CPU request and still encountered the problem. Any configuration change I make to the job manager, whether it's in the flink-conf.yaml or increasing the pod's CPU/memory request, result

RichFunctions in Flink's Table / SQL API

2020-09-22 Thread Piyush Narang
Hi folks, We were looking to cache some data using Flink’s MapState in one of our UDFs that are called by Flink SQL queries. I was trying to see if there’s a way to set up these state objects via the basic FunctionContext [1] we’re provided in the Table / SQL UserDefinedFunction class [2] but f

Adaptive load balancing

2020-09-22 Thread Navneeth Krishnan
Hi All, We are currently using flink in production and use keyBy for performing a CPU intensive computation. There is a cache lookup for a set of keys and since keyBy cannot guarantee the data is sent to a single node we are basically replicating the cache on all nodes. This is causing more memory

Re: Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-22 Thread Shubham Kumar
Hi Xintong, Thanks for your insights, they are really helpful. I understand now that it most certainly is a native memory issue rather than a heap memory issue and about not trusting Flink's Non-Heap metrics. I do believe that our structure of job is so simple that I couldn't find any use of mma

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-22 Thread Yang Wang
Hi Dan, First, I want to get more information about your submission so that we could make the question clear. Are you using TableEnvironment to execute multiple "INSERT INTO" sentences and find that each one will be executed in a separated Flink cluster? It is really strange, and I want to know h

Re: App gets stuck in Created State

2020-09-22 Thread Arpith P
All the job manager logs have been deleted from the cluster. I'll have to work with the infra team to get it back, once I have it i'll post it here. Arpith On Mon, Sep 21, 2020 at 5:50 PM Zhu Zhu wrote: > Hi Arpith, > > All tasks in CREATED state indicates no task is scheduled yet. It is > stra

Re: Zookeeper connection loss causing checkpoint corruption

2020-09-22 Thread Arpith P
I created a ticket with all my findings. https://issues.apache.org/jira/browse/FLINK-19359. Thanks, Arpith On Tue, Sep 22, 2020 at 12:16 PM Timo Walther wrote: > Hi Arpith, > > is there a JIRA ticket for this issue already? If not, it would be great > if you can report it. This sounds like a cr

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Rui Li
Hi Timo, I believe the blocker for this feature is that we don't support dynamically adding user jars/resources at the moment. We're able to read the path to the function jar from Hive metastore, but we cannot load the jar after the user session is started. On Tue, Sep 22, 2020 at 3:43 PM Timo Wa

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Husky Zeng
Hi Timo, Thanks for your attention,As what I say in this comment, this feature can surely solve our problem, but it seems that the workload is much larger than the solution in my scenario. Our project urgently needs to solve the problem of reusing hive UDF in hive metastore, so we are more incline

Re: Does Flink support such a feature currently?

2020-09-22 Thread Marta Paes Moreira
Hi, Roc. *Note:* in the future, please send this type of questions to the user mailing list instead (user@flink.apache.org)! If I understand your question correctly, this is possible using the LIKE clause and a registered catalog. There is currently no implementation for the MySQL JDBC catalog, b

Re: Flink DynamoDB stream connector losing records

2020-09-22 Thread Jiawei Wu
Hi Ying and Danny, Sorry for the late reply, I just got back from vacation. Yes I'm running Flink in Kinesis Data Analytics with Flink 1.8, and checkpoint is enabled. This fully managed solution limits my access to Flink logs, so far I didn't get any logs related to throttle or fail over. The rea

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Timo Walther
Hi Husky, I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is needed to make this feature possible. @Rui: Do you know more about this issue and current limitations. Regards, Timo On 18.09.20 09:11, Husky Zeng wrote: When we submit a job which use udf of hive , the job will

[ANNOUNCE] Weekly Community Update 2020/38

2020-09-22 Thread Konstantin Knauf
Dear community, happy to share a brief community update for the past week. A lot of FLIP votes are currently ongoing on the dev@ mailing list. I've covered this FLIP previously, so skipping those this time. Besides that, a couple of release related updates and again multiple new Committers. Flink

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-22 Thread Dawid Wysakowicz
Hi Lian, Thank you for sending the full code for the pojo. It clarified a lot! I learnt that Avro introduced yet another mechanism for retrieving conversions for logical types in Avro 1.9.x. I was not aware they create a static SpecificData field with registered logical conversions if a logical t

Re: hourly counter

2020-09-22 Thread Timo Walther
Hi Lian, you are right that timers are not available in a ProcessWindowFunction but the state store can be accessed. So given that your window width is 1 min, you could maintain an additional state value for counting the minutes and updating your counter once this value reached 60. Otherwise