Re: Fwd: Complex graph-based sessionization (potential use for stateful functions)

2020-04-07 Thread m@xi
Hello Robert Thanks to your reply I discovered the Stateful Functions which I believe is a quite powerful tool. I have some questions: 1) As you said, "the community is currently in the process of releasing the first Apache release of StateFun and it should hopefully be out by the end of this wee

Re: Pulsar as a state backend

2020-04-07 Thread Markos Sfikas
Hi Michael, Thanks for the question regarding the Pulsar Connector. The team is currently focusing on further developing the Flink-Pulsar connector with the following items: - Addition of a columnar off-loader to improve the performance for batch queries.The columnar segment off-loader is

fink sql client not able to read parquet format table

2020-04-07 Thread wangl...@geekplus.com.cn
Hive table stored as parquet. Under hive client: hive> select robotid from robotparquet limit 2; OK 1291097 1291044 But under flink sql-client the result is 0 Flink SQL> select robotid from robotparquet limit 2; robotid 0 0

Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-07 Thread Shachar Carmeli
We are using Flink 1.6.3 and keeping the checkpoint in CEPH ,retaining only one checkpoint at a time , using incremental and using rocksdb. We run windows with lateness of 3 days , which means that we expect that no data in the checkpoint share folder will be kept after 3-4 days ,Still We see t

[ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.0.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency gu

Re: State Processor API with Beam

2020-04-07 Thread Stephen Patel
Thanks Seth, I'll look into rolling my own KeyedStateInputFormat. On Mon, Apr 6, 2020 at 2:50 PM Seth Wiesman wrote: > Hi Stephen, > > You will need to implement a custom operator and user the `transform` > method. It's not just that you need to specify the namespace type but you > will also nee

Re: Pulsar as a state backend

2020-04-07 Thread Michael Colson
Hello Markos, Thanks for the quick answer and the good news. I'll follow the progress and the FLIP-72. Thanks, Le mar. 7 avr. 2020 à 03:50, Markos Sfikas a écrit : > Hi Michael, > > Thanks for the question regarding the Pulsar Connector. > > The team is currently focusing on further developing

Re: Creating singleton objects per task manager

2020-04-07 Thread Salva Alcántara
Thanks a lot Seth! -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Creating singleton objects per task manager

2020-04-07 Thread KristoffSC
Hi Seth, I would like to piggyback on this question :) You wrote: "I would strongly encourage you to create one instance of your object per ProcessFunction, inside of open. That would be one instance per slot which is not equal to the parallelism of your operator." Especially the second part "Tha

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Hequn Cheng
Thanks a lot for the release and your great job, Gordon! Also thanks to everyone who made this release possible! Best, Hequn On Tue, Apr 7, 2020 at 8:58 PM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions 2.0.0. >

Re: Dynamic Flink SQL

2020-04-07 Thread Krzysztof Zarzycki
Hi Maciej, thanks for joining. I answer your comments below. > > the idea is quite interesting - although maintaining some coordination to > be able to handle checkpoints would probably pretty tricky. Did you figure > out how to handle proper distribution of tasks between TMs? As far as I > unders

TCP streams to multiple clients

2020-04-07 Thread Simec, Nick
I'd like to put flink on a proxy server to read in a stream from an external source and then distribute that stream to multiple servers. Is that possible with Flink? Would I have to replicate the data or anything? I want source -> proxy-server -> (FLINK) -> -> -> -> -> 5 servers Flink reads th

Re: flink 1.9 conflict jackson version

2020-04-07 Thread Fanbin Bu
Hi Aj, I got a work around to put my app jar inside /usr/lib/flink/lib directory. On Mon, Apr 6, 2020 at 11:27 PM aj wrote: > Hi Fanbin, > > I am facing a similar kind of issue. Let me know if you are able to > resolve this issue then please help me also > > > https://stackoverflow.com/question

Re: Creating singleton objects per task manager

2020-04-07 Thread Seth Wiesman
Hi Kristoff, You are correct that, that was a typo :) At most one instance per slot. Seth > On Apr 7, 2020, at 9:41 AM, KristoffSC wrote: > > Hi Seth, > I would like to piggyback on this question :) > > You wrote: > "I would strongly encourage you to create one instance of your object per

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Marta Paes Moreira
Thank you for managing the release, Gordon — you did a tremendous job! And to everyone else who worked on pushing it through. Really excited about the new use cases that StateFun 2.0 unlocks for Flink users and beyond! Marta On Tue, Apr 7, 2020 at 4:47 PM Hequn Cheng wrote: > Thanks a lot for

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Oytun Tez
Great news! Thank you all. On Tue, Apr 7, 2020 at 12:23 PM Marta Paes Moreira wrote: > Thank you for managing the release, Gordon — you did a tremendous job! And > to everyone else who worked on pushing it through. > > Really excited about the new use cases that StateFun 2.0 unlocks for Flink >

RE: Anomaly detection Apache Flink

2020-04-07 Thread Nienhuis, Ryan
Vigo, I mean that the algorithm is a standalone piece of code. There are no examples that I am aware of for running it using Flink. Ryan From: Salvador Vigo Sent: Saturday, April 4, 2020 12:26 AM To: Marta Paes Moreira Cc: Nienhuis, Ryan ; user Subject: RE: [EXTERNAL] Anomaly detection Apach

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-07 Thread Yun Tang
Hi Shachar Why do we see data that is older from lateness configuration There might existed three reasons: 1. RocksDB really still need that file in current checkpoint. If we upload one file named as 42.sst at 2/4 at some old checkpoint, current checkpoint could still include that 42.sst fil

State size Vs keys number perfromance

2020-04-07 Thread KristoffSC
Hi, I would to ask about what has more memory footprint and what could be more efficient regarding less keys with bigger keyState vs many keys with smaller keyState For this use case I'm using RocksDB StateBackend and state TTL is, well.. infinitive. So I'm keeping the state forever in Flink. Th

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Oytun Tez
I should also add, I couldn't agree more with this sentence in the release article: "state access/updates and messaging need to be integrated." This is something we strictly enforce in our Flink case, where we do not refer to anything external for storage, use Flink as our DB. -- [image: Mota

Re: Anomaly detection Apache Flink

2020-04-07 Thread Salvador Vigo
Ok, thanks for the clarification. On Tue, Apr 7, 2020, 7:00 PM Nienhuis, Ryan wrote: > Vigo, > > > > I mean that the algorithm is a standalone piece of code. There are no > examples that I am aware of for running it using Flink. > > > > Ryan > > > > *From:* Salvador Vigo > *Sent:* Saturday, Ap

Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-07 Thread Lu Niu
Hi, flink users Did anyone encounter such error? The error comes from S3AFileSystem. But there is no capacity issue on any disk. we are using hadoop 2.7.1. ``` Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not open output stream for state backend at java.u

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Congxian Qiu
Thanks a lot for the release and your great job, Gordon! Also thanks to everyone who made this release possible! Best, Congxian Oytun Tez 于2020年4月8日周三 上午2:55写道: > I should also add, I couldn't agree more with this sentence in the release > article: "state access/updates and messaging need to b

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-07 Thread Congxian Qiu
Hi >From the stack, seems the problem is that "org.apache.flink.fs.shaded. hadoop3.org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for s3ablock-0001-", and I googled the exception, found there is some relative page[1], could you please make sure there

Re: State size Vs keys number perfromance

2020-04-07 Thread Congxian Qiu
Hi I'll give some information from my side: 1. The performance for RocksDB is mainly related to the (de)serialization and disk read/write. 2. MapState just need to (de)serialize the single mapkey/mapvalue when read/write state, ValueState need to (de)serialize the whole state when read/write the st

Re: New kafka producer on each checkpoint

2020-04-07 Thread Yun Tang
Hi Maxim If you use the EXACTLY_ONCE semantic (instead of AT_LEAST_ONCE or NONE) for flink kafka producer. It will create new producer when every new checkpoint comes [1]. This is by design and from my point of view, the checkpoint interval of 10 seconds might be a bit too often. In general I t

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-07 Thread Lu Niu
Hi, Congxiao Thanks for replying. yeah, I also found those references. However, as I mentioned in original post, there is enough capacity in all disk. Also, when I switch to presto file system, the problem goes away. Wondering whether others encounter similar issue. Best Lu On Tue, Apr 7, 2020 a

Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not?

2020-04-07 Thread Yun Tang
HI Salva Sorry for missing your recent reply. If you just want to make the models could be recoverable, you should choose operator state to store the "models". If you stick to the keyed state, I cannot see why these models are related to current processing key. As you can see, the "models" is j

Re: Making job fail on Checkpoint Expired?

2020-04-07 Thread Congxian Qiu
Hi Robin Thanks for the detailed reply, and sorry for my late reply. I think that your request to fail the whole job when continues checkpoint expired is valid, I've created an issue to track this[1] For now, maybe the following steps can help you find out the reason of time out 1. You can find o

Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not?

2020-04-07 Thread Salva Alcántara
Hi Yun, thanks for your reply. I agree with you, I will switch to operator state. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Till Rohrmann
Great news! Thanks a lot for being our release manager Gordon and to everyone who helped with the release. Cheers, Till On Wed, Apr 8, 2020 at 3:57 AM Congxian Qiu wrote: > Thanks a lot for the release and your great job, Gordon! > Also thanks to everyone who made this release possible! > > Bes