KeyedStream and keyedProcessFunction

2020-06-09 Thread Jaswin Shah
Hi All, I have a keyed data stream and calling a keyedProcessFunction after keyBy operation on datastream. Till now my understanding was, "For all different n- elements in keyed stream if their keys are same, same instance of keyedProcessFunction is called and for another elements with differen

Re: Flink on yarn : yarn-session understanding

2020-06-09 Thread Andrey Zagrebin
Hi Anuj, Afaik, the REST API should work for both modes. What is the issue? Maybe, some network problem to connect to YARN application master? Best, Andrey On Mon, Jun 8, 2020 at 4:39 PM aj wrote: > I am running some stream jobs that are long-running always. I am currently > submitting each jo

Re: [External Sender] Re: Avro Arrat type validation error

2020-06-09 Thread Dawid Wysakowicz
To make sure we are on the same page. The end goal is to have the CatalogTable#getTableSchema/TableSource#getTableSchema return a schema that is compatible with TableSource#getProducedDataType. If you want to use the new types, you should not implement the TableSource#getReturnType. Moreover you

Re: [External Sender] Re: Flink sql nested elements

2020-06-09 Thread Dawid Wysakowicz
Hi Ramana, Could you help us with a way to reproduce the behaviour? I could not reproduce it locally. The code below works for me just fine: |StreamExecutionEnvironment exec = StreamExecutionEnvironment.getExecutionEnvironment();|| ||StreamTableEnvironment tEnv = StreamTableEnvironment.create(||

Re: Age old stop vs cancel debate

2020-06-09 Thread Kostas Kloudas
Hi Senthil, >From a quick look at the code, it seems that the cancel() of your source function should be called, and the thread that it is running on should be interrupted. In order to pin down the problem and help us see if this is an actual bug, could you please: 1) enable debug logging and see

Re: Internal state and external stores conditional usage advice sought (dynamodb asyncIO)

2020-06-09 Thread Andrey Zagrebin
Hi Orionemail, There is no simple state access in asyncIO operator. I think this would require a custom caching solution. Maybe, other community users solved this problem in some other way. Best, Andrey On Mon, Jun 8, 2020 at 5:33 PM orionemail wrote: > Hi, > > Following on from an earlier em

Fwd: Understading Flink statefun deployment

2020-06-09 Thread Francesco Guardiani
Hi everybody, I'm quite new to Flink and Flink Statefun and I'm trying to understand the deployment techniques on k8s. I wish to understand if it's feasible to deploy a statefun project separating the different functions on separate deployments (in order to have some functions as remote and some as

Re: Run command after Batch is finished

2020-06-09 Thread Mark Davis
Hi Chesnay, That is an interesting proposal, thank you! I was doing something similar with the OutputFormat#close() method respecting the Format's parallelism. Using FinalizeOnMaster will make things easier. But the problem is that several OutputFormats must be synchronized externally - every ou

Re: Failed to deserialize Avro record

2020-06-09 Thread Dawid Wysakowicz
It's rather hard to help if we don't know the format in which the records are serialized. There is a significant difference if you use a schema registry or not. All schema registries known to me prepend the actual data with some kind of magic byte and an identifier of the schema. Therefore if we do

pyflink数据查询

2020-06-09 Thread jack
问题请教: 描述: pyflink 从source通过sql对数据进行查询聚合等操作 不输出到sink中,而是可以直接作为结果,我这边可以通过开发web接口直接查询这个结果,不必去sink中进行查询。 flink能否实现这样的方式? 感谢

Re: Best way to "emulate" a rich Partitioner with open() and close() methods ?

2020-06-09 Thread Aljoscha Krettek
Hi, I agree with Robert that adding open/close support for partitioners would mean additional complexity in the code base. We're currently not thinking of supporting that. Best, Aljoscha On 05.06.20 20:19, Arvid Heise wrote: Hi Arnaud, just to add up. The overhead of this additional map is

Re: pyflink数据查询

2020-06-09 Thread Jeff Zhang
可以用zeppelin的z.show 来查询job结果。这边有pyflink在zeppelin上的入门教程 https://www.bilibili.com/video/BV1Te411W73b?p=20 可以加入钉钉群讨论:30022475 jack 于2020年6月9日周二 下午5:28写道: > 问题请教: > 描述: pyflink 从source通过sql对数据进行查询聚合等操作 > 不输出到sink中,而是可以直接作为结果,我这边可以通过开发web接口直接查询这个结果,不必去sink中进行查询。 > > flink能否实现这样的方式? > 感谢 > -- Best

sanity checking in ProcessWindowFunction.process shows that event timestamps are out of tumbling window time range

2020-06-09 Thread Yu Yang
Hi all, We are writing an application that set TimeCharacteristic.EventTime as time characteristic. When we implement the ProcessWindowFunction for a TumblingWindow, we added code as below to check if the timestamp of events is in the tumbling window time range. To our surprise, we found that the

[ANNOUNCE] Apache Flink Stateful Functions 2.1.0 released

2020-06-09 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.1.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency gu

Blocked requesting MemorySegment when Segments are available.

2020-06-09 Thread David Maddison
Hi, I keep seeing the following situation where a task is blocked getting a MemorySegment from the pool but the TaskManager is reporting that it has lots of MemorySegments available. I'm completely stumped as to how to debug or what to look at next so any hints/help/advice would be greatly apprec

Re: KeyedStream and keyedProcessFunction

2020-06-09 Thread Tzu-Li (Gordon) Tai
Hi, Records with the same key will be processed by the same partition. Note there isn't an instance of a keyed process function for each key. There is a single instance per partition, and all keys that are distributed to the same partition will get processed by the same keyed process function inst

Re: KeyedStream and keyedProcessFunction

2020-06-09 Thread 1048262223
Hi +1. Because there is no need to generate an instance for each key, flink just maintain the key collection in one instance. Imagine what would happen if the number of keys were unlimited. Best, Yichao Yang -- Original -- From: "Tzu-Li (Gordon) Tai"http:/

Re: Failed to deserialize Avro record

2020-06-09 Thread Arvid Heise
If data is coming from Kafka, the write schema is most likely stored in a Schema Registry. If so, you absolutely need to use ConfluentRegistryAvroSerializationSchema of the *flink-avro-confluent-registry* package. If you didn't opt for that most common architecture pattern, then you often run into

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.1.0 released

2020-06-09 Thread Oytun Tez
Thank you, Gordon and everyone. On Tue, Jun 9, 2020 at 5:56 AM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions 2.1.0. > > Stateful Functions is an API that simplifies building distributed stateful > applications.

Re: Stopping flink application with /jobs/:jobid/savepoints or /jobs/:jobid/stop

2020-06-09 Thread Kostas Kloudas
Hi Omkar, For the first part of the question where you set the "drain" to false and the state gets drained, this can be an issue on our side. Just to clarify, no matter what is the value of the "drain", Flink always takes a savepoint. Drain simply means that we also send MAX_WATERMARK before takin

Re: Stopping a job

2020-06-09 Thread Kostas Kloudas
Hi all, Just for future reference, there is an ongoing discussion on the topic at another thread found in [1]. So please post any relevant comments there :) Cheers, Kostas [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Age-old-stop-vs-cancel-debate-td35514.html#a35615 O

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.1.0 released

2020-06-09 Thread Congxian Qiu
@Gordon thanks a lot for the release and for being the release manager. Also thanks to everyone who made this release possible! Best, Congxian Oytun Tez 于2020年6月9日周二 下午7:08写道: > Thank you, Gordon and everyone. > > On Tue, Jun 9, 2020 at 5:56 AM Tzu-Li (Gordon) Tai > wrote: > >> The Apache Fli

Re: [External Sender] Flink sql nested elements

2020-06-09 Thread Leonard Xu
Hi, Ramna Happy to hear you’ve resolved your problem, if you could post your SQL maybe this question can get quicker response. Flink SQL is case sensitive default and there had an issue to track[1], I think it makes sense to add some specification in SQL section of docs. Best, Leonard Xu [1] h

Re: Failed to deserialize Avro record

2020-06-09 Thread Dawid Wysakowicz
Good to hear. There is no schema that would support all ways. I would also rather discourage such approach, as it makes it really hard to make changes to the records schema. I would strongly recommend using schema registry for all records. If you still want to have a schema that would work for bo

Re: Data Quality Library in Flink

2020-06-09 Thread aj
Thanks, Andrey, I will check it out. On Mon, Jun 8, 2020 at 8:10 PM Andrey Zagrebin wrote: > Hi Anuj, > > I am not familiar with data quality measurement methods and deequ > in depth. > What you describe looks like monitoring some data metrics. > Maybe, there a

Troubles with Avro migration from 1.7 to 1.10

2020-06-09 Thread Alan Żur
Hi, I was assigned to migrate out Flink 1.7 to 1.10 so far it's going good, however I've encountered problem with Avro writing to hdfs. Currently we're using Bucketing sink - which is deprecated. I've managed to replace few Bucketing sinks with StreamingFileSink with row format. However I don

Incremental state

2020-06-09 Thread Annemarie Burger
Hi, What I'm trying to do is the following: I want to incrementally add and delete elements to a state. If the element expires/goes out of the window, it needs to be removed from the state. I basically want the functionality of TTL, without using it, since I'm also using Queryable State and these

Re: Understading Flink statefun deployment

2020-06-09 Thread Igal Shilman
Hi Francesco, It is absolutely possible to deploy some functions as embedded and some as remote, and scale them independently, while technically being part of the same stateful function application instance (I think that what you meant by "sharing the same master"). One possible way to do it in k

Re: [External Sender] Re: Flink sql nested elements

2020-06-09 Thread Ramana Uppala
Hi Dawid, This issue has been resolved. >From our debugging we found out that Calcite parser was able to resolve the >nested elements as expected. But, expecting case to match with the schema. Our >SQL select field case and schema field case was not matching in this scenario. >After fixing sql

Re: Flink Stream job to parquet sink

2020-06-09 Thread aj
please help with this. Any suggestions. On Sat, Jun 6, 2020 at 12:20 PM aj wrote: > Hello All, > > I am receiving a set of events in Avro format on different topics. I want > to consume these and write to s3 in parquet format. > I have written a below job that creates a different stream for each

Re: Failed to deserialize Avro record

2020-06-09 Thread Ramana Uppala
Hi Arvid / Dawid, Yes we did small POC with custom Avro Row Deserializer which uses ConfluentRegistryAvroDeSerializationSchema and we are able to parse the message. We have Schema registry and users are given choice to produce with different serialization mechanisms. Some messages we are able t

Re: Age old stop vs cancel debate

2020-06-09 Thread Senthil Kumar
OK, will do and report back. We are on 1.9.1, 1.10 will take some time __ On 6/9/20, 2:06 AM, "Kostas Kloudas" wrote: Hi Senthil, From a quick look at the code, it seems that the cancel() of your source function should be called, and the thread that it is running on shoul

Dynamic rescaling in Flink

2020-06-09 Thread Prasanna kumar
Hi all, Does flink support dynamic scaling. Say try to add/reduce nodes based upon incoming load. Because our use case is such that we get peak loads for 4 hours and then medium loads for 8 hours and then light to no load for rest 2 hours. Or peak load would be atleast 5 times the medium load.

Re: Age old stop vs cancel debate

2020-06-09 Thread Kostas Kloudas
I understand. Thanks for looking into it Senthil! Kostas On Tue, Jun 9, 2020 at 7:32 PM Senthil Kumar wrote: > > OK, will do and report back. > > We are on 1.9.1, > > 1.10 will take some time __ > > On 6/9/20, 2:06 AM, "Kostas Kloudas" wrote: > > Hi Senthil, > > From a quick look at th

Re: Troubles with Avro migration from 1.7 to 1.10

2020-06-09 Thread Kostas Kloudas
Hi Alan, In the upcoming Flink 1.11 release, there will be support for Avro using the AvroWriterFactory as seen in [1]. Do you think that this can solve your problem? You can also download the current release-1.11 branch and also test it out to see if it fits your needs. Cheers, Kostas [1] http

NoSuchMethodError: org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.(Ljava/lang/String;Lorg/apache/flink/fs/s3presto/common/HadoopConfigLoader

2020-06-09 Thread Claude Murad
Hello, I'm trying to upgrade Flink from 1.7 to 1.10 retaining our Hadoop integration. I copied the jar file flink-shaded-hadoop-2-uber-2.7.5-10.0.jar into /opt/flink/lib. I also copied the files flink-s3-fs-hadoop-1.10.0.jar and flink-s3-fs-presto-1.10.0.jar into /opt/flink/plugins/s3 folder. T

Re: Stopping flink application with /jobs/:jobid/savepoints or /jobs/:jobid/stop

2020-06-09 Thread Deshpande, Omkar
I have observed that state gets drained irrespective of the value of the "drain". I am using - org.apache.beam beam-runners-flink-1.9 2.19.0 And I am running a kafka wordcount app with fixed window of 1 hour and when I stop the app with the stop endpoint

Tumbling window with timestamp out-of-range events

2020-06-09 Thread Yu Yang
Hi, We implement a flink application that uses TumblingWindow, and uses even time as time characteristics. In the TumblingWindow's process function, we has the implementation below that checks whether the event's timestamp is in the tumbling window's timestamp range. We expected that all events

Re: Tumbling window with timestamp out-of-range events

2020-06-09 Thread Yu Yang
Please ignore this message. The issue was that a different timestamp extractor was used when the kafka source was setup. That caused the issue. On Tue, Jun 9, 2020 at 2:58 PM Yu Yang wrote: > Hi, > > > We implement a flink application that uses TumblingWindow, and uses even > time as time charac

Re: NoSuchMethodError: org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.(Ljava/lang/String;Lorg/apache/flink/fs/s3presto/common/HadoopConfigLoader

2020-06-09 Thread Guowei Ma
Hi, In 1.10 there is no 'Lorg/apache/flink/fs/s3presto/common/HadoopConfigLoader' . So I think there might be a legacy S3FileSystemFactory in your jar. You could check whether there is a 'org.apache.flink.fs.s3presto.common.HadoopConfigLoader' in your jar or not. If there is one you could remove th

Re: Dynamic rescaling in Flink

2020-06-09 Thread Xintong Song
Hi Prasanna, Flink does not support dynamic rescaling at the moment. AFAIK, there are some companies in China already have solutions for dynamic scaling Flink jobs (Alibaba, 360, etc.), but none of them are yet available to the community version. These solutions rely on an external system to moni

Re: How To subscribe a Kinesis Stream using enhance fanout?

2020-06-09 Thread Tzu-Li (Gordon) Tai
Hi all, Just to give a quick update: there will be contributors from the AWS Kinesis team working on contributing enhanced fan out support to the connector. You can follow the progress here: https://issues.apache.org/jira/browse/FLINK-17688 Cheers, Gordon On Fri, May 15, 2020 at 5:55 PM orionema