Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-26 Thread Fabian Hueske
I'd like to thank, I'm learning Flink with the new book "Stream > Processing with Apache Flink". :) Thanks for your amazing efforts on > publishing nice book! > > Thanks, > Jungtaek Lim (HeartSaVioR) > > > On Mon, Aug 5, 2019 at 10:21 PM Fabian Hueske wr

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
ler. The latest timestamp > will be handled first ? > > > > > > BTW I tried to use a ContinuousEventTimeTrigger to make sure the window > is calculated ? and got the processing to trigger multiple times so I’m > not sure exactly how this type of trigger works.. > > > > Tha

Re: OVER operator filtering out records

2019-08-26 Thread Fabian Hueske
Hi Vinod, This sounds like a watermark issue to me. The commonly used watermark strategies (like bounded out-of-order) are only advancing when there is a new record. Moreover, the current watermark is the minimum of the current watermarks of all input partitions. So, the watermark only moves forwa

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
Hi, Can you share a few more details about the data source? Are you continuously ingesting files from a folder? You are correct, that the parallelism should not affect the results, but there are a few things that can affect that: 1) non-determnistic keys 2) out-of-order data with inappropriate wa

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
gt; My key fields is array of multiple type, in this case is string and long. > The result that i'm posting is just represents sampling of output dataset. > > Thank you in advance ! > > Anissa > > Le jeu. 22 août 2019 à 11:24, Fabian Hueske a écrit : > >> H

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
Hi Anissa, This looks strange. If I understand your code correctly, your GroupReduce function is summing up a field. Looking at the results that you posted, it seems as if there is some data missing (the total sum does not seem to match). For groupReduce it is important that the grouping keys are

Re: Maximal watermark when two streams are connected

2019-08-22 Thread Fabian Hueske
Hi Sung, There is no switch to configure the WM to be the max of both streams and it would also in fact violate the core principles of the mechanism. Watermarks are used to track the progress of event time in streams. The implementations of operators rely on the fact that (almost) all records tha

Re: How to shorten MATCH_RECOGNIZE's DEFINE clause

2019-08-22 Thread Fabian Hueske
Hi Dongwon, I'm not super familiar with Flink's MATCH_RECOGNIZE support, but Dawid (in CC) might have some ideas about it. Best, Fabian Am Mi., 21. Aug. 2019 um 07:23 Uhr schrieb Dongwon Kim < eastcirc...@gmail.com>: > Hi, > > Flink relational apis with MATCH_RECOGNITION looks very attractive a

Re: Error while sinking results to Cassandra using Flink Cassandra Connector

2019-08-22 Thread Fabian Hueske
Hi Manvi, A NoSuchMethodError typically indicates a version mismatch. I would check if the Flink versions of your program, the client, and the cluster are the same. Best, Fabian Am Di., 20. Aug. 2019 um 21:09 Uhr schrieb manvmali : > Hi, I am facing the issue of writing the data stream result i

Re: combineGroup get false results

2019-08-22 Thread Fabian Hueske
Hi Anissa, Are you using combineGroup or reduceGroup? Your question refers to combineGroup, but the code only shows reduceGroup. combineGroup is non-deterministic by design to enable efficient partial results without network and disk IO. reduceGroup is deterministic given a deterministic key extr

Re: How to implement Multi-tenancy in Flink

2019-08-19 Thread Fabian Hueske
Great! Thanks for the feedback. Cheers, Fabian Am Mo., 19. Aug. 2019 um 17:12 Uhr schrieb Ahmad Hassan < ahmad.has...@gmail.com>: > > Thank you Fabian. This works really well. > > Best Regards, > > On Fri, 16 Aug 2019 at 09:22, Fabian Hueske wrote: > >> Hi

Re: How to implement Multi-tenancy in Flink

2019-08-16 Thread Fabian Hueske
dows. > > Thanks. > > Best, > > On 2 Aug 2019, at 12:49, Fabian Hueske wrote: > > Ok, I won't go into the implementation detail. > > The idea is to track all products that were observed in the last five > minutes (i.e., unique product ids) in a five minute tumb

Re: Kafka producer failed with InvalidTxnStateException when performing commit transaction

2019-08-16 Thread Fabian Hueske
Hi Tony, I'm sorry I cannot help you with this issue, but Becket (in CC) might have an idea what went wrong here. Best, Fabian Am Mi., 14. Aug. 2019 um 07:00 Uhr schrieb Tony Wei : > Hi, > > Currently, I was trying to update our kafka cluster with larger ` > transaction.max.timeout.ms`. The > o

Re: I'm not able to make a stream-stream Time windows JOIN in Flink SQL

2019-08-16 Thread Fabian Hueske
Hi Theo, The main problem is that the semantics of your join (Join all events that happened on the same day) are not well-supported by Flink yet. In terms of true streaming joins, Flink supports the time-windowed join (with the BETWEEN predicate) and the time-versioned table join (which does not

Re: End of Window Marker

2019-08-16 Thread Fabian Hueske
Hi Padarn, What you describe is essentially publishing Flink's watermarks to an outside system. Flink processes time windows, by waiting for a watermark that's past the window end time. When it receives such a WM it processes and emits all ended windows and forwards the watermark. When a sink rece

Re: [External] Re: From Kafka Stream to Flink

2019-08-16 Thread Fabian Hueske
sult of that queries taking into account only the last > values of each row. The result is inserted/updated in a in-memory K-V > database for fast access. > > > > Thanks in advance! > > > > Best > > > > *De: *Fabian Hueske > *Fecha: *miércoles, 7 de agost

Re: Implementing a low level join

2019-08-15 Thread Fabian Hueske
ess a > checkpoint I can change the join strategy. > > and if you do, do you have any toy example of this? > > Thanks, > Felipe > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipe

Re: Implementing a low level join

2019-08-15 Thread Fabian Hueske
Hi, Just to clarify. You cannot dynamically switch the join strategy while a job is running. What Hequn suggested was to have a util method Util.joinDynamically(ds1, ds2) that chooses the join strategy when the program is generated (before it is submitted for execution). The problem is that distr

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-15 Thread Fabian Hueske
Congrats Andrey! Am Do., 15. Aug. 2019 um 07:58 Uhr schrieb Gary Yao : > Congratulations Andrey, well deserved! > > Best, > Gary > > On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote: > > > Congratulations Andrey! > > > > On Wed, Aug 14, 2019 at 10:18 PM Rong Rong wrote: > > > >> Congratulations A

Re: Apache flink 1.7.2 security issues

2019-08-13 Thread Fabian Hueske
Thanks for reporting this issue. It is already discussed on Flink's dev mailing list in this thread: -> https://lists.apache.org/thread.html/10f0f3aefd51444d1198c65f44ffdf2d78ca3359423dbc1c168c9731@%3Cdev.flink.apache.org%3E Please continue the discussion there. Thanks, Fabian Am Di., 13. Aug.

Re: Flink Table API schema doesn't support nested object in ObjectArray

2019-08-09 Thread Fabian Hueske
x this issue could be > pretty simple . > > Thanks > Jacky Du > > Fabian Hueske 于2019年8月2日周五 下午12:07写道: > >> Thanks for the bug report Jacky! >> >> Would you mind opening a Jira issue, preferably with a code snippet that >> reproduces the bug? >

Re: how to get the code produced by Flink Code Generator

2019-08-07 Thread Fabian Hueske
Hi Vincent, I don't think there is such a flag in Flink. However, this sounds like a really good idea. Would you mind creating a Jira ticket for this? Thank you, Fabian Am Di., 6. Aug. 2019 um 17:53 Uhr schrieb Vincent Cai < caidezhi...@foxmail.com>: > Hi Users, > In Spark, we can invoke Data

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Fabian Hueske
Congratulations Hequn! Am Mi., 7. Aug. 2019 um 14:50 Uhr schrieb Robert Metzger < rmetz...@apache.org>: > Congratulations! > > On Wed, Aug 7, 2019 at 1:09 PM highfei2...@126.com > wrote: > > > Congrats Hequn! > > > > Best, > > Jeff Yang > > > > > > Original Message > > Subject:

Re: From Kafka Stream to Flink

2019-08-07 Thread Fabian Hueske
>> how much state the query will need to maintain. >> >> >> I am not sure to understand the problem. If i have to append-only table >> and perform some join on it, what's the issue ? >> >> >> On Tue, Aug 6, 2019 at 8:03 PM Maatary Okouya &

Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-05 Thread Fabian Hueske
Hi Jungtaek, I would recommend to implement the logic in a ProcessFunction and avoid Flink's windowing API. IMO, the windowing API is difficult to use, because there are many pieces like WindowAssigner, Window, Trigger, Evictor, WindowFunction that are orchestrated by Flink. This makes it very har

Re: Flink Table API schema doesn't support nested object in ObjectArray

2019-08-02 Thread Fabian Hueske
Thanks for the bug report Jacky! Would you mind opening a Jira issue, preferably with a code snippet that reproduces the bug? Thank you, Fabian Am Fr., 2. Aug. 2019 um 16:01 Uhr schrieb Jacky Du : > Hi, All > > Just find that Flink Table API have some issue if define nested object in > an objec

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
> > > However, with your proposed solution, how would we be able to achieve this > sliding window mechanism of emitting 24 hour window every 5 minute using > processfunction ? > > > Best, > > > On Fri, 2 Aug 2019 at 09:48, Fabian Hueske wrote: > >> Hi Ahmad, >&g

Re: Apache Flink and additional fileformats (Excel, blockchains)

2019-08-02 Thread Fabian Hueske
Hi Joern, Thanks for sharing your connectors! The Flink community is currently working on a website that collects and lists externally maintained connectors and libraries for Flink. We are still figuring out some details, but hope that it can go live soon. Would be great to have your repositories

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Fabian Hueske
Hi, Regarding step 3, it is sufficient to check that you got on message from each parallel task of the previous operator. That's because a task processes the timers of all keys before moving forward. Timers are always processed per key, but you could deduplicate on the parallel task id and check t

Re: From Kafka Stream to Flink

2019-08-02 Thread Fabian Hueske
Hi, Flink does not distinguish between streams and tables. For the Table API / SQL, there are only tables that are changing over time, i.e., dynamic tables. A Stream in the Kafka Streams or KSQL sense, is in Flink a Table with append-only changes, i.e., records are only inserted and never deleted

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
. 2019 um 20:00 Uhr schrieb Ahmad Hassan < ahmad.has...@gmail.com>: > > Hi Fabian, > > > On 4 Jul 2018, at 11:39, Fabian Hueske wrote: > > > > - Pre-aggregate records in a 5 minute Tumbling window. However, > pre-aggregation does not work for FoldFunctions. &g

Re: Support priority of the Flink YARN application in Flink 1.9

2019-08-02 Thread Fabian Hueske
Hi Boxiu, This sounds like a good feature. Please have a look at our contribution guidelines [1]. To propose a feature, you should open a Jira issue [2] and start a discussion there. Please note that the feature freeze for the Flink 1.9 release happened a few weeks ago. The community is currentl

Re: Is Queryable State not working in standalone cluster with 1 task manager?

2019-07-30 Thread Fabian Hueske
Hi Oytun, Is QS enabled in your Docker image or did you enable QS by copying/moving flink-queryable-state-runtime_2.11-1.8.0.jar from ./opt to ./lib [1]? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html#activating-queryable-state Am M

Re: Extending REST API with new endpoints

2019-07-29 Thread Fabian Hueske
Hi Oytun, Thanks for your input and feature request! The right way to propose a feature and contribute it is described here [1]. Basically, you should open a Jira issue and start a discussion about the feature there. If it is a bigger features, you should also bring it to the dev@f.a.o mailing li

Re: LEFT JOIN issue SQL API

2019-07-29 Thread Fabian Hueske
If you need an outer join, the only solution is to convert the table into a retraction stream and correctly handle the retraction messages. Btw. even then this might not perform as you would like it to be. The query will store all input tables completely in state. So you might run out of space soon

[ANNOUNCE] The Program of Flink Forward EU 2019 is live

2019-07-24 Thread Fabian Hueske
Hi everyone, I'm happy to announce the program of the Flink Forward EU 2019 conference. The conference takes place in the Berlin Congress Center (bcc) from October 7th to 9th. On the first day, we'll have four training sessions [1]: * Apache Flink Developer Training * Apache Flink Operations Trai

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Sure: /--> AsyncIO --\ STREAM --> ProcessFunc -- -- Union -- WindowFunc \--/ ProcessFunc keeps track of the unique keys per window duration and emits each

Re: CEP Pattern limit

2019-07-23 Thread Fabian Hueske
Hi Pedro, each pattern gets translated into one or more Flink operators. Hence, your Flink program becomes *very* large and requires much more time to be deployed. Hence, the timeout. I'd try to limit the size your job by grouping your patterns and creating an own job for each group. You can also

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
AsyncDataStream should provide a first-class support to keyed streams > (and thus perform a single call per key and window..). What do you think? > > On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske wrote: > >> Hi Flavio, >> >> Not sure I understood the requirements c

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
our production clustered crippled like this. > > Richard > > On Tue, Jul 23, 2019 at 12:47 PM Fabian Hueske wrote: > >> Hi Richard, >> >> I hope you could resolve the problem in the meantime. >> >> Nonetheless, maybe Till (in CC) has an idea what could h

Re: Use batch and stream environment in a single pipeline

2019-07-23 Thread Fabian Hueske
Hi, Right now it is not possible to mix batch and streaming environments in a job. You would need to implement the batch logic via the streaming API which is not always straightforward. However, the Flink community is spending a lot of effort on unifying batch and stream processing. So this will

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Fabian Hueske
Hi Dongwon, regarding the question about the conversion: If you keep using the Row type and not adding/removing fields, the conversion is pretty much for free right now. It will be a MapFunction (sometimes even not function at all) that should be chained with the other operators. Hence, it should

Re: Flink SinkFunction for WebSockets

2019-07-23 Thread Fabian Hueske
Hi Tim, One thing that might be interesting is that Flink might emit results more than once when a job recovers from a failure. It is up to the receiver to deal with that. Depending on the type of results this might be easy (idempotent updates) or impossible. Best, Fabian Am Fr., 19. Juli 2019

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Hi Flavio, Not sure I understood the requirements correctly. Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator? Best, Fabian Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier < pomperma...

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
Hi Richard, I hope you could resolve the problem in the meantime. Nonetheless, maybe Till (in CC) has an idea what could have gone wrong. Best, Fabian Am Mi., 17. Juli 2019 um 19:50 Uhr schrieb Richard Deurwaarder < rich...@xeli.eu>: > Hello, > > I've got a problem with our flink cluster where

Re: Does Flink support raw generic types in a merged stream?

2019-07-23 Thread Fabian Hueske
Hi John, You could implement your own n-ary Either type. It's a bit of work because you'd need also a custom TypeInfo & Serializer but rather straightforward if you follow the implementation of Either. Best, Fabian Am Mi., 17. Juli 2019 um 16:28 Uhr schrieb John Tipper < john_tip...@hotmail.com>

Re: Union of streams performance issue (10x)

2019-07-23 Thread Fabian Hueske
Hi Peter, The performance drops probably be due to de/serialization. When tasks are chained, records are simply forwarded as Java objects via method calls. When a task chain in broken into multiple operators, the records (Java objects) are serialized by the sending task, possibly shipped over the

Re: timeout exception when consuming from kafka

2019-07-23 Thread Fabian Hueske
Hi Yitzchak, Thanks for reaching out. I'm not an expert on the Kafka consumer, but I think the number of partitions and the number of source tasks might be interesting to know. Maybe Gordon (in CC) has an idea of what's going wrong here. Best, Fabian Am Di., 23. Juli 2019 um 08:50 Uhr schrieb Y

Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Fabian Hueske
Hi Juan, Which Flink version do you use? Best, Fabian Am Di., 23. Juli 2019 um 06:49 Uhr schrieb Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com>: > Hi, > > I'm trying to use AbstractTestBase in a test in order to use the mini > cluster. I'm using specs2 with Scala, so I cannot extend

Re: GroupBy result delay

2019-07-23 Thread Fabian Hueske
Hi Fanbin, The delay is most likely caused by the watermark delay. A window is computed when the watermark passes the end of the window. If you configured the watermark to be 10 minutes before the current max timestamp (probably to account for out of order data), then the window will be computed w

Re: Apache Flink - Relation between stream time characteristic and timer triggers

2019-07-11 Thread Fabian Hueske
ics, if a > processor or window trigger registers with a ProcessingTime and EventTime > timers - they will all fire when the appropriate watermarks arrive. > > Thanks again. > > On Thursday, July 11, 2019, 05:41:54 AM EDT, Fabian Hueske < > fhue...@gmail.com> wrote: > >

[ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Fabian Hueske
Hi everyone, I'm very happy to announce that Rong Rong accepted the offer of the Flink PMC to become a committer of the Flink project. Rong has been contributing to Flink for many years, mainly working on SQL and Yarn security features. He's also frequently helping out on the user@f.a.o mailing l

Re: Apache Flink - Relation between stream time characteristic and timer triggers

2019-07-11 Thread Fabian Hueske
Hi Mans, IngestionTime is uses the same internal mechanisms as EventTime (record timestamps and watermarks). The difference is that instead of extracting a timestamp from the record (using a custom timestamp extractor & wm assigner), Flink will assign timestamps based on the machine clock of the

Re: Cannot catch exception throws by kafka consumer with JSONKeyValueDeserializationSchema

2019-07-11 Thread Fabian Hueske
Hi, I'd suggest to implement your own custom deserialization schema for example by extending JSONKeyValueDeserializationSchema. Then you can implement whatever logic you need to handle incorrectly formatted messages. Best, Fabian Am Mi., 10. Juli 2019 um 04:29 Uhr schrieb Zhechao Ma < mazhechaom

Re: How are kafka consumer offsets handled if sink fails?

2019-07-11 Thread Fabian Hueske
Hi John, let's say Flink performed a checkpoint after the 2nd record (by injecting a checkpoint marker into the data flow) and the sink fails on the 5th record. When Flink restarts the application, it resets the offset after the 2nd record (it will read the 3rd record first). Hence, the 3rd and 4t

Re: Disk full problem faced due to the Flink tmp directory contents

2019-07-10 Thread Fabian Hueske
Hi, AFAIK Flink should remove temporary files automatically when they are not needed anymore. However, I'm not 100% sure that there are not corner cases when a TM crashes. In general it is a good idea to properly configure the directories that Flink uses for spilling, logging, blob storage, etc.

Re: Error checkpointing to S3 like FS (EMC ECS)

2019-07-08 Thread Fabian Hueske
Hi Vishwas, Sorry for the late response. Are you still facing the issue? I have no experience with EMC ECS, but the exception suggests an issue with the host name: 1378 Caused by: java.net.UnknownHostException: aip-featuretoolkit.SU73ECSG1P1d.***.COM 1379 at java.net.InetAddress.getAl

Re: How does Flink recovers uncommited Kafka offset in AsyncIO?

2019-07-08 Thread Fabian Hueske
Hi, Kafka offsets are only managed by the Flink Kafka Consumer. All following operators do not care whether the events were read from Kafka, files, Kinesis or whichever source. It is the responsibility of the source to include its reading position (in case of Kafka the partition offsets) in a chec

Re: Flink Kafka consumer with low latency requirement

2019-06-24 Thread Fabian Hueske
Hi, What kind of function do you use to implement the operator that has the blocking call? Did you have a look at the AsyncIO operator? It was designed for exactly such use cases. It issues multiple asynchronous requests to an external service and waits for the response. Best, Fabian Am Mo., 24.

Re: Unexpected behavior from interval join in Flink

2019-06-24 Thread Fabian Hueske
d up using xStream as a 'base' while > I needed to use yStream as a 'base'. I.e. yStream.element - 60 min <= > xStream.element <= yStream.element + 30 min. Interchanging both datastreams > fixed this issue. > > Thanks anyways. > > Cheers, Wouter >

Re: Unexpected behavior from interval join in Flink

2019-06-24 Thread Fabian Hueske
Hi Wouter, Not sure what is going wrong there, but something that you could try is to use a custom watemark assigner and always return a watermark of 0. When the source finished serving the watermarks, it emits a final Long.MAX_VALUE watermark. Hence the join should consume all events and store th

Re: Flink Kafka consumer with low latency requirement

2019-06-24 Thread Fabian Hueske
Hi Ben, Flink's Kafka consumers track their progress independent of any worker. They keep track of the reading offset for themselves (committing progress to Kafka is optional and only necessary to have progress monitoring in Kafka's metrics). As soon as a consumer reads and forwards an event, it i

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-14 Thread Fabian Hueske
Hi Visha, If I remember correctly, the behavior of the Kafka consumer was changed in Flink 1.8 to account for such situations. Please check the release notes [1] and the corresponding Jira issue [2]. If this is not the behavior you need, please feel free to create a new Jira issue and start a dis

Re: Building flink from source---error of resolving dependencies

2019-06-13 Thread Fabian Hueske
Hi Syed, The build fails because Maven could not download the required dependency com.mapr.hadoop:maprfs:jar:5.2.1-mapr. The dependency is hosted on MapR's Maven repository. Maybe the service was not available for some time. I checked it right now and it seems to be working. I'd suggest to try it

Re: About SerializationSchema/DeserializationSchema's concurrency safety

2019-06-11 Thread Fabian Hueske
Hi, Yes, multiple instances of the same De/SerializationSchema can be executed in the same JVM. Regarding 2. I'm not 100%, but would suspect that one De/SerializationSchema instance handles multiple partitions. Gordon (in CC) should know this for sure. Best, Fabian Am Mo., 10. Juni 2019 um 05:25

Re: Avro serde classes in Flink

2019-06-11 Thread Fabian Hueske
Hi Debasish, No, I don't think there's a particular reason. There a few Jira issues that propose adding an Avro Serialization Schema for Confluent Schema Registry [1] [2]. Please check them out and add a new one if they don't describe what you are looking for. Cheers, Fabian [1] https://issues.a

Re: count(DISTINCT) in flink SQL

2019-06-11 Thread Fabian Hueske
t;> ) >> >> GROUP BY user_id, event_month, event_year >> >> >> >> We are also using idle state retention time to clean up unused state, but >> that is much longer (a week or month depending on the usecase). We will >> switch to count(DISTINCT) as soo

Re: NotSerializableException: DropwizardHistogramWrapper inside AggregateFunction

2019-06-07 Thread Fabian Hueske
Hi, There are two ways: 1. make the non-serializable member variable transient (meaning that it won't be serialized) and check in the aggregate call if it has been initialized or not. 2. implement your own serialization logic by overriding readObject() and writeObject() [1]. Best, Fabian [1] ht

Re: Flink 1.7.1 flink-s3-fs-hadoop-1.7.1 doesn't delete older chk- directories

2019-06-07 Thread Fabian Hueske
Hi, I found a few issues in Jira that are related to not deleted checkpoint directories, but only FLINK-10855 [1] seems to be a possible reason in your case. Is it possible that the checkpoints of the remaining directories failed? If that's not the case, would you mind creating a Jira issue and d

Re: Ipv6 supported?

2019-06-07 Thread Fabian Hueske
Hi, The networking libraries that Flink uses (Netty & Akka) support seem to support IPv6. So, it might work. However, I'm not aware of anybody running Flink on IPv6. Maybe somebody with more info could help out here? Best, Fabian Am Do., 6. Juni 2019 um 16:25 Uhr schrieb Siew Wai Yow : > Hi gu

Re: Change sink topology

2019-06-06 Thread Fabian Hueske
Hi Sergey, I would not consider this to be a topology change (the sink operator would still be a Kafka producer). It seems that dynamic topic selection is possible with a KeyedSerializationSchema (Check "Advanced Serialization Schema: [1]). Best, Fabian [1] https://ci.apache.org/projects/flink/f

Re: Does Flink Kafka connector has max_pending_offsets concept?

2019-06-06 Thread Fabian Hueske
Hi Ben, Flink correctly maintains the offsets of all partitions that are read by a Kafka consumer. A checkpoint is only complete when all functions successful checkpoint their state. For a Kafka consumer, this state is the current reading offset. In case of a failure the offsets and the state of a

Re: Weird behavior with CoFlatMapFunction

2019-06-06 Thread Fabian Hueske
Hi, There are a few things to point out about your example: 1. The the CoFlatMapFunction is probably executed in parallel. The configuration is only applied to one of the parallel function instances. You probably want to broadcast the configuration changes to all function instances. Have a look a

Re: can flink sql handle udf-generated timestamp field

2019-06-06 Thread Fabian Hueske
Hi Yu, When you register a DataStream as a Table, you can create a new attribute that contains the event timestamp of the DataStream records. For that, you would need to assign timestamps and generate watermarks before registering the stream: FlinkKafkaConsumer kafkaConsumer = new FlinkKa

Re: count(DISTINCT) in flink SQL

2019-06-03 Thread Fabian Hueske
Hi Vinod, IIRC, support for DISTINCT aggregates was added in Flink 1.6.0 (released August, 9th 2018) [1]. Also note that by default, this query will accumulate more and more state, i.e., for each grouping key it will hold all unique event_ids. You could configure an idle state retention time to cl

[ANNOUNCE] Munich meetup: "Let's talk about "Stream Processing with Apache Flink"

2019-05-28 Thread Fabian Hueske
Hi folks, Next Tuesday (June, 4th), Vasia and I will be speaking at a meetup in Munich about Flink and how we wrote our book "Stream Processing with Apache Flink". We will also raffle a few copies of the book. Please RSVP if you'd like to attend: -> https://www.meetup.com/inovex-munich/events/26

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread Fabian Hueske
I see, that's unfortunate. Both classes are also tagged with @Public, making them unchangeable until Flink 2.0. Nonetheless, feel free to open a Jira issue to improve the situation for a future release. Best, Fabian Am Mo., 27. Mai 2019 um 16:55 Uhr schrieb spoganshev : > I've tried that, but t

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread Fabian Hueske
Configuring the split assigner wasn't a common requirement so far. You can just implement your own format extending from FileInputFormat (or any of its subclasses) and override the getInputSplitAssigner() method. Best, Fabian Am Mo., 27. Mai 2019 um 15:30 Uhr schrieb spoganshev : > Why is FileIn

Re: Question regarding date conditions/row expirations on Dynamic Tables

2019-05-27 Thread Fabian Hueske
Hi Wayne, Long story short, this is not possible with Flink yet. I posted a more detailed answer to your question on SO. Best, Fabian Am Di., 21. Mai 2019 um 19:24 Uhr schrieb Wayne Heaney < wayne.hea...@gmail.com>: > I'm trying to build a Dynamic table that will be updated when records > haven

Re: Flink ML Use cases

2019-05-25 Thread Fabian Hueske
Hi Abhishek, Your observation is correct. Right now, the Flink ML module is in a half-baked state and is only supported in batch mode. It is not integrated with the DataStream API. FLIP-23 proposes a feature that allows to evaluated an externally trained model (stored as PMML) on a stream of data.

Re: FlinkSQL fails when rowtime meets dirty data

2019-05-16 Thread Fabian Hueske
Hi, I'm afraid I don't see another solution than touching the Flink code for this and adding a try catch block around the timestamp conversion. It would be great if you could create a Jira issue reporting this problem. IMO, we should have a configuration switch (either per Table or query) to eith

Re: Flink and Prometheus setup in K8s

2019-05-16 Thread Fabian Hueske
Thanks for sharing your solution Wouter! Best, Fabian Am Mi., 15. Mai 2019 um 15:28 Uhr schrieb Wouter Zorgdrager < w.d.zorgdra...@tudelft.nl>: > Hi all, > > To answer my own questions I worked on the following solution: > > 1) Custom Docker image which pulls the Flink image and moves Prometheus

Re: RichAsyncFunction for Scala?

2019-05-16 Thread Fabian Hueske
Hi Shannon, That's a good observation. To be honest, I know why the Scala AsyncFunction does not implement RichFunction. Maybe this was not intentional and just overlooked when porting the functionality to Scala. Would you mind creating a Jira ticket for this? Thank you, Fabian Am Di., 14. Mai

Re: Reconstruct object through partial select query

2019-05-14 Thread Fabian Hueske
cParameter for some reason. Also, how would you create the serializer >>>> for the type info? can i reuse some builtin Kryo functionality? >>>> >>>> Thanks >>>> >>>> >>>> [1] >>>> https://ci.apache.org/projects/flink/flink-d

Re:

2019-05-13 Thread Fabian Hueske
marks, it creates watermarks from it's received data. Since it doesn't > receive any data, it doesn't create any watermarks. D couldn't make > progress because one of its inputs, C2, doesn't make progress. Is this > understand correct? > > Yes, I think that&#

Re: Checkpointing and save pointing

2019-05-10 Thread Fabian Hueske
g checkpointing location. > > > Boris Lublinsky > FDP Architect > boris.lublin...@lightbend.com > https://www.lightbend.com/ > > On May 10, 2019, at 2:47 AM, Fabian Hueske wrote: > > Hi Boris, > > Is your question is in the context of replacing Zookeeper by a different &

Re: How to export all not-null keyed ValueState

2019-05-10 Thread Fabian Hueske
method? > > Thanks a lot for your help. > > Regards, > Averell > > > On Fri, May 10, 2019 at 8:52 PM Fabian Hueske wrote: > >> Hi Averell, >> >> Ah, sorry. I had assumed the toggle events where broadcasted anyway. >> Since you had both streams keyed, your c

Re: How to export all not-null keyed ValueState

2019-05-10 Thread Fabian Hueske
context.applyToKeyedState(toggleStateDescriptor, (k: Key, s: > ValueState[Boolean]) => > if (s != null) context.output(outputTag, (k, s.value( >} > } > > Thanks for your help. > Regards, > Averell > > On Thu, May 9, 2019 at 7:31 PM Fabian Hueske wrote: &g

Re:

2019-05-10 Thread Fabian Hueske
any point in time. Also Flink does does not give any guarantees about how keys (or rather key groups) are assigned to tasks. If you rescale the application to a parallelism of 3, the active key group might be scheduled to C.2 or C.3. Long story short, D makes progress in event time because watermarks

Re: Reduce key state

2019-05-10 Thread Fabian Hueske
Hi Frank, By default, Flink does not remove any state. It is the responsibility of the developer to ensure that an application does not leak state. Typically, you would use timers [1] to discard state that expired and is not useful anymore. In the last release 1.8, we added lazy cleanup strategie

Re: I want to use MapState on an unkeyed stream

2019-05-10 Thread Fabian Hueske
reatment of operator state documented anywhere? > > On 2019/05/09 07:39:34, Fabian Hueske wrote: > > Hi, > > > > Yes, IMO it is more clear. > > However, you should be aware that operator state is maintained on heap > only > > (not in RocksDB). > > &g

Re: Checkpointing and save pointing

2019-05-10 Thread Fabian Hueske
Hi Boris, Is your question is in the context of replacing Zookeeper by a different service for highly-available setups or are you setting up a regular Flink cluster? Best, Fabian Am Mi., 8. Mai 2019 um 06:20 Uhr schrieb Congxian Qiu < qcx978132...@gmail.com>: > Hi, Boris > > TM will also need

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-09 Thread Fabian Hueske
guishing receiving (different) watermarks and emitting (the same) watermarks. Best, Fabian > On 2019/05/03 07:32:07, Fabian Hueske wrote: > > Hi, > > > > this should be covered here: > > > https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time

Re: How to export all not-null keyed ValueState

2019-05-09 Thread Fabian Hueske
Hi, Passing a Context through a DataStream definitely does not work. You'd need to have the keyed state that you want to scan over in the KeyedBroadcastProcessFunction. For the toggle filter use case, you would need to have a unioned stream with Toggle and StateReport events. For the output, you

Re: How to export all not-null keyed ValueState

2019-05-09 Thread Fabian Hueske
Hi, The KeyedBroadcastProcessFunction has a method to iterate over all keys of a keyed state. This function is available via the Context object of the processBroadcast() method. Hence you need a broadcasted message to trigger the operation. Best, Fabian Am Do., 9. Mai 2019 um 08:46 Uhr schrieb C

Re: Reconstruct object through partial select query

2019-05-09 Thread Fabian Hueske
Hi, you can use the value construction function ROW to create a nested row (or object). However, you have to explicitly reference all attributes that you will add. If you have a table Cars with (year, modelName) a query could look like this: SELECT ROW(year, modelName) AS car, enrich(year, m

Re: I want to use MapState on an unkeyed stream

2019-05-09 Thread Fabian Hueske
Hi, Yes, IMO it is more clear. However, you should be aware that operator state is maintained on heap only (not in RocksDB). Best, Fabian Am Mi., 8. Mai 2019 um 20:44 Uhr schrieb an0 : > I switched to using operator list state. It is more clear. It is also > supported by RocksDBKeyedStateBacke

Re: taskmanager.tmp.dirs vs io.tmp.dirs

2019-05-09 Thread Fabian Hueske
Hi, I created FLINK-12460 to update the documentation. Cheers, Fabian Am Mi., 8. Mai 2019 um 17:48 Uhr schrieb Flavio Pompermaier < pomperma...@okkam.it>: > Great, thanks Till! > > On Wed, May 8, 2019 at 4:20 PM Till Rohrmann wrote: > >> Hi Flavio, >> >> taskmanager.tmp.dirs is the deprecated

Re: Timestamp and key preservation over operators

2019-05-03 Thread Fabian Hueske
The window operator cannot configured to use the max timestamp of the events in the window as the timestamp of the output record. The reason is that such a behavior can produce late records. If you want to do that, you have to track the max timestamp and assign it yourself with a timestamp assigne

Re: Filter push-down not working for a custom BatchTableSource

2019-05-03 Thread Fabian Hueske
@Override > public DataSet getDataSet(ExecutionEnvironment execEnv) { > return execEnv.fromCollection(resourceIterator, modelClass); > } > > @Override > public TypeInformation getReturnType() { > return TypeInformation.of(modelClass); > } > > @Overri

<    1   2   3   4   5   6   7   8   9   10   >