Re: PyFlink performance and deployment issues

2021-07-07 Thread Dian Fu
Hi Wouter, 1) Regarding the performance difference between Beam and PyFlink, I guess it’s because you are using an in-memory runner when running it locally in Beam. In that case, the code path is totally differently compared to running in a remote cluster. 2) Regarding to `flink run`, I’m surpr

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Yes! I was only worried about the jobid changing and the checkpoint being un-referenceable. But since I can pass a path to the checkpoint that will not be an issue. Thanks a lot for your suggestions! On Thu, 8 Jul 2021 at 11:26, Arvid Heise wrote: > Hi Rakshit, > > It sounds to me as if you do

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
Hi Rakshit, It sounds to me as if you don't need the Savepoint API at all. You can (re)start all applications with the previous state (be it retained checkpoint or savepoint). You just need to provide the path to that in your application invocation [1] (every entry point has such a parameter, you

Re: PyFlink performance and deployment issues

2021-07-07 Thread Xingbo Huang
Hi Wouter, Sorry for the late reply. I will try to answer your questions in detail. 1. >>> Perforce problem. When running udf job locally, beam will use a loopback way to connect back to the python process used by the compilation job, so the time of starting up the job will come faster than pyflin

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Sorry for being a little vague there. I want to create a Savepoint from a DataStream right before the job is finished or cancelled. What you have shown in the IT case is how a datastream can be bootstrapped with state that is formed formed by means of DataSet. My jobs are triggered by a scheduler p

More detail information in sql validate exception

2021-07-07 Thread 纳兰清风
Hi, Currently, When I was using a lot of the same udf in a sql, I can't locate where the semantic occor if some udf being used in a wrong way. So I try to change some code in flink-table-common and flink-table-runtime-blink to extract more detail information such as position and sql context in t

Re: Apache Flink - How to use/invoke LookupTableSource/Function

2021-07-07 Thread JING ZHANG
Hi Mans, *Before coming to the next part, we may need some backgrounds about lookup join and temporal join.* 1. Lookup join is typically used to enrich a table with data that is queried from an external system. It requires right table to be backed by a lookup source connector. Its syntax is same wi

Re: State Processor API and existing state

2021-07-07 Thread JING ZHANG
Hi min, Is the POJO state in an existed operator or a new added operator? BTW, that would be great if you would like to give the code to reproduce the exception. I need more debug to find out the reason based on the code. Tan, Min 于2021年7月8日周四 上午2:56写道: > Hi, > > > > I have followed the steps b

Re: OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
I now see there have been problems with this in the past: https://issues.apache.org/jira/browse/FLINK-16142 https://issues.apache.org/jira/browse/FLINK-19005 I actually use both JDBC and gRPC, so it seems this could indeed be an issue for me. Does anyone know if I can ensure my classes get clean

OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
Hello, I am currently testing a scenario where I would run the same job multiple times in a loop with different inputs each time. I am testing with a local Flink cluster v1.12.4. I initially got an OOM - Metaspace error, so I increased the corresponding memory in the TM's JVM (to 512m), but it

Flink Metric Reporting from Job Manager

2021-07-07 Thread Mason Chen
Hi all, Does Flink support reporting metrics from the main method that is ran on the Job Manager? In this case, we want to report a failure to add an operator to the Job Graph. Best, Mason

Question about POJO rules - why fields should be public or have public setter/getter?

2021-07-07 Thread Naehee Kim
According to the Flink doc, Flink recognizes a data type as a POJO type (and allows “by-name” field referencing) if the following conditions are fulfilled: - The class is public and standalone (no non-static inner class) - The class has a public no-argument constructor - All non-static,

Re: The best way to read historical data in a stream

2021-07-07 Thread Lasse Nedergaard
Hi Arvid Thanks for your input. My first implemention was iterations but I had a challenges to match up the returned rows with the original input so the current implementation use Async IO where I attach the found rows with the input. It makes it easier downstream. I just have to test if I can

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
I don't quite understand your question. You use Savepoint API to create a savepoint with a batch job (that's why it's DataSet Transform currently). That savepoint can only be restored through a datastream application. Dataset applications cannot start from a savepoint. So I don't understand why yo

Re: Understanding recovering from savepoint / checkpoint with additional operators when chaining

2021-07-07 Thread Arvid Heise
Hi Jiahui, changing the job graph is what I meant with application upgrade. There is no difference between checkpoint and savepoint afaik. New operators would be initialized with empty state - so correct for stateless operators. So it should work for all sketched cases with both savepoints and ret

RE: State Processor API and existing state

2021-07-07 Thread Tan, Min
Hi, I have followed the steps below in restarting a Flink job with newly modified savepoints. I can re start a job with new savepoints as long as the Flink states are expressed in Java primitives. When the flink states are expressed in a POJO, my job does not get restarted. I have the followin

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Yes I could understand restoring a savepoint to a datastream. What I couldn't figure out is to create a NewSavepoint for a datastream. What I understand is that NewSavepoints only take in Bootstrap transformation for Dataset Transform functions. About the checkpoints, does CheckpointConfig.Exter

Re: Flink CEP checkpoint size

2021-07-07 Thread Aeden Jameson
We did look into fixing it ourselves, but decided that migrating to the datastream api, not using CEP, was more fruitful overall for us. Unfortunately, I don't have a good answer for you. The bug from a non-contributors stand point appears pretty deep in the codebase, but the authors are best ones

Re: How Do I Specify the Encryption Algorithm Suite of the Flink REST Service?

2021-07-07 Thread Nicolaus Weidner
Hi Wanghui, quoting your reply here since it went only to me instead of the mailing list as well: Hi Nico: > > Thank you for your reply. > > I configured security.ssl.algorithms in flink-conf.yaml, but it > seems to work only for SSL connections to internal services. > >

Re: Issue while using parallelism.default in flink-conf.yaml file

2021-07-07 Thread Nicolaus Weidner
Hi Mahima, looks like you found the relevant parts of the code already: In JarHandlerUtils.JarHandlerContext#fromRequest, the parallelism value is extracted from the request body or query parameter (the latter is deprecated, though). If none is found, it defaults to 1 and overwrites the configured

Re: Understanding recovering from savepoint / checkpoint with additional operators when chaining

2021-07-07 Thread Jiahui Jiang
Hello Arvid, how about no upgrade, just changing the job graph by having different stateless operators? Will checkpoint be sufficient? The specific example use case is - we have some infrastructure that orchestrates and runs user SQL queries. Sometimes in between runs users might have changed t

RE: Flink Issues facing while connecting to Azure Blob Storage

2021-07-07 Thread Singh Simarpreet (SX/EAA2)
Hi Arvid, Yes, I did the same way. But it’s not working. Can you suggest anything, which I can try? Mit freundlichen Grüßen / Best regards Simarpreet Singh AA ESIRTS, Leo, OES ESI-Updates (RBEI/EAA2) Robert Bosch GmbH | Postfach 10 60 50 | 70049 Stuttgart | GERMANY | www.bosch.com simarpreet.

Re:

2021-07-07 Thread Maciej Bryński
Hi Nicolaus, I'm sending records as an attachment. Regards, Maciek śr., 7 lip 2021 o 11:47 Nicolaus Weidner napisał(a): > > Hi Maciek, > > is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears > twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there at >

Re:

2021-07-07 Thread Nicolaus Weidner
Hi Maciek, is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there at all. I find it hard to correlate the logs with the input... Best regards, Nico On Wed, Jul 7, 2021 at 11:16 AM Arvid Heise wrote: >

Re: Flink 1.13 fails to load log4j2 yaml configuration file via jackson-dataformat-yaml

2021-07-07 Thread Yuval Itzchakov
Interesting, I don't have bind explicitly on the classpath, will give it a try. Although locally it is working properly. On Wed, Jul 7, 2021, 12:19 Chesnay Schepler wrote: > According to the log4j documentation you need both jackson-databind and > jackson-dataformat-yaml to be on the classpath.

Re: Apache Flink - How to use/invoke LookupTableSource/Function

2021-07-07 Thread M Singh
Hi Jing: Thanks for your explanation and references.  I looked at your reference (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/joins/#lookup-join) and have a few question regarding the example code: CREATE TEMPORARY TABLE Customers ( id INT, name

Re: Understanding recovering from savepoint / checkpoint with additional operators when chaining

2021-07-07 Thread Arvid Heise
Hi Jiahui, Savepoint format is more universal and should be used when upgrading Flink versions. If you just upgrade the applications, there shouldn't be a big difference afaik. On Tue, Jul 6, 2021 at 9:41 PM Jiahui Jiang wrote: > Hello Roman, > > Sorry I did some more testing and the original f

Re: Flink 1.13 fails to load log4j2 yaml configuration file via jackson-dataformat-yaml

2021-07-07 Thread Chesnay Schepler
According to the log4j documentation you need both jackson-databind and jackson-dataformat-yaml to be on the classpath. Did you make sure that that this is the case in 1.13.1? It could very well be that in 1.9 databind was on the classpath by chance. On 07/07/2021 10:22, Arvid Heise wrote: Hi

Re: Bug in MATCH_RECOGNIZE ?

2021-07-07 Thread Arvid Heise
Note to other ML users: this seems to be a duplicate, let's respond only to the earlier message. On Tue, Jul 6, 2021 at 3:28 PM Maciej Bryński wrote: > Hi, > I have a very strange bug when using MATCH_RECOGNIZE. > > I'm using some joins and unions to create an event stream. Sample > event stream

Re:

2021-07-07 Thread Arvid Heise
Hi Maciek, could you bypass the MATCH_RECOGNIZE (=comment out) and check if the records appear in a shortcutted output? I suspect that they may be filtered out before (for example because of number conversion issues with 0E-18) On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński wrote: > Hi, > I hav

Re: Flink + Kafka Dynamic Table

2021-07-07 Thread Arvid Heise
Hi, you want to use Table API with dynamic tables [1] and upsert Kafka [2]. This will create an update message in your log-compacted kafka topic for each changed result, such that this can be used as a key-value store. In Kafka, the updated record would be appended and the old record would eventua

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
Hi Rakshit, The example is valid. The state processor API is kinda working like a DataSet application but the state is meant to be read in DataStream. Please check out the SavepointWriterITCase [1] for a full example. There is no checkpoint/savepoint in DataSet applications. Checkpoints can be st

Re: Flink 1.13 fails to load log4j2 yaml configuration file via jackson-dataformat-yaml

2021-07-07 Thread Arvid Heise
This seems to be a duplicate of an earlier thread. Please only respond there. On Mon, Jul 5, 2021 at 4:05 PM Yuval Itzchakov wrote: > Hi, > > I am attempting to upgrade Flink from 1.9 to 1.13.1 > I am using a YAML based log4j file. In 1.9, it worked perfectly fine by > adding the following depen

Re: The best way to read historical data in a stream

2021-07-07 Thread Arvid Heise
Hi Lasse, That's a tough question. The real Kappa way would be to load the full database as a 2. input into the job and use joins. But I'm assuming that you can't or don't want to do that. 1. Can work if you use a windowing operator before and only trigger one or few async IO calls per window bat

Re: Unsubscribe

2021-07-07 Thread Arvid Heise
To unsubscribe, please use the link from [1]. [1] https://flink.apache.org/community.html On Mon, Jul 5, 2021 at 1:31 PM Gg Kudelska wrote: > Unsubscibe > > pon., 5 lip 2021, 08:19 użytkownik Dan Pettersson < > dan.pettersso...@gmail.com> napisał: > >> Unsubscribe >> >

Re: Flink 1.13 fails to load log4j2 yaml configuration file via jackson-dataformat-yaml

2021-07-07 Thread Arvid Heise
Hi Yuval, For some reason the YamlConfigurationFactory is not correctly loaded and the fallback XmlConfigurationFactory is used unsuccessfully. You could try to force DEBUG on that YamlConfigurationFactory and check for some output like "Missing dependencies for Yaml support, ConfigurationFactory

RE: Using Flink's Kubernetes API inside Java

2021-07-07 Thread Alexis Sarda-Espinosa
Thanks Roman and Yang, I understand. I’ll have a look and ask on the developer list depending on what I find. Regards, Alexis. From: Yang Wang Sent: Mittwoch, 7. Juli 2021 05:14 To: ro...@apache.org Cc: Alexis Sarda-Espinosa ; user@flink.apache.org Subject: Re: Using Flink's Kubernetes API ins

Re: Flink Issues facing while connecting to Azure Blob Storage

2021-07-07 Thread Arvid Heise
Hi Simar, just to be sure, did you create a subfolder in the plugins folder where you put the jar as described here [1]? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/plugins/ On Sun, Jul 4, 2021 at 8:10 PM Singh Simarpreet (SX/EAA2) < simarpreet.si

Re: OutOfMemory Failure on Savepoint

2021-07-07 Thread Arvid Heise
Hi Abhishek, Does your job use checkpointing? It seems like it's the first time the respective checkpoint/savepoint thread pool is touched and at this point, there are not enough handles. Do you have a way to inspect the ulimits on the task managers? If you don't have a way to change the limits,