Re: Broadcast state

2019-09-30 Thread Navneeth Krishnan
Hi, I can use redis but I’m still having hard time figuring out how I can eliminate duplicate data. Today without broadcast state in 1.4 I’m using cache to lazy load the data. I thought the broadcast state will be similar to that of kafka streams where I have read access to the state across the pi

Re: Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-09-30 Thread Congxian Qiu
Hi Oliwer, >From the description, Seems the state didn't be cleared, maybe you could check how many {{windowState.clear()}} was triggered in {{WindowOperator#processElement}}, and try to figure it out why the state did not be cleared. Best, Congxian Oliwer Kostera 于2019年9月27日周五 下午4:14写道: > Hi

Re: Broadcast state

2019-09-30 Thread Congxian Qiu
Hi, Could you use some cache system such as HBase or Reids to storage this data, and query from the cache if needed? Best, Congxian Navneeth Krishnan 于2019年10月1日周二 上午10:15写道: > Thanks Oytun. The problem with doing that is the same data will be have to > be stored multiple times wasting memory

Re: Flink Join Time Window

2019-09-30 Thread Rong Rong
Hi Nishant, On a brief look. I think this is a problem with your 2nd query: > > *Table2*... > Table latestBadIps = tableEnv.sqlQuery("SELECT MAX(b_proctime) AS > mb_proctime, bad_ip FROM BadIP ***GROUP BY bad_ip***HAVING > MIN(b_proctime) > CURRENT_TIMESTAMP - INTERVAL '2' DAY "); > tableEnv.regi

Fencing token exceptions from Job Manager High Availability mode

2019-09-30 Thread Hanson, Bruce
Hi all, We are running some of our Flink jobs with Job Manager High Availability. Occasionally we get a cluster that comes up improperly and doesn’t respond. Attempts to submit the job seem to hang and when we hit the /overview REST endpoint in the Job Manager we get a 500 error and a fencing t

[SURVEY] What is the most subtle/hard to catch bug that people have seen?

2019-09-30 Thread Konstantinos Kallas
Hi everyone. I wanted to ask Flink users what are the most subtle Flink bugs that people have witnessed. The cause of the bugs could be anything (e.g. wrong assumptions on data, parallelism of non-parallel operator, simple mistakes). We are developing a testing framework for Flink and it would

Re: Broadcast state

2019-09-30 Thread Navneeth Krishnan
Thanks Oytun. The problem with doing that is the same data will be have to be stored multiple times wasting memory. In my case there will around million entries which needs to be used by at least two operators for now. Thanks On Mon, Sep 30, 2019 at 5:42 PM Oytun Tez wrote: > This is how we cur

Re: Broadcast state

2019-09-30 Thread Oytun Tez
This is how we currently use broadcast state. Our states are re-usable (code-wise), every operator that wants to consume basically keeps the same descriptor state locally by processBroadcastElement'ing into a local state. I am open to suggestions. I see this as a hard drawback of dataflow programm

Re: Broadcast state

2019-09-30 Thread Oytun Tez
You can re-use the broadcasted state (along with its descriptor) that comes into your KeyedBroadcastProcessFunction, in another operator downstream. that's basically duplicating the broadcasted state whichever operator you want to use, every time. --- Oytun Tez *M O T A W O R D* The World's Fas

Broadcast state

2019-09-30 Thread Navneeth Krishnan
Hi All, Is it possible to access a broadcast state across the pipeline? For example, say I have a KeyedBroadcastProcessFunction which adds the incoming data to state and I have downstream operator where I need the same state as well, would I be able to just read the broadcast state with a readonly

Re: Best way to link static data to event data?

2019-09-30 Thread John Smith
Hi so here is what I have done... 1- I load my CSV using CSV table source. 2- 1 setup Kafka stream to read my incoming events. 3- Map my events to a POJO 4- Join the 2 tables 5- Push the joined result to Elastic search. This works absolutely fine. So whats the difference between this and the prop

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-30 Thread Sean Hester
Vijay, That is my understanding as well: the HA solution only solves the problem up to the point all job managers fail/restart at the same time. That's where my original concern was. But to Aleksandar and Yun's point, running in HA with 2 or 3 Job Managers per cluster--as long as they are all dep

Re: Best way to link static data to event data?

2019-09-30 Thread John Smith
Ok thanks. It's basically telephone area codes, they barely ever change. On Mon, 30 Sep 2019 at 06:03, Gaël Renoux wrote: > Hi John, > > I've had a similar requirement, and I've resorted to simply use a static > cache (I'm coding in Scala, so that's a lazy value on a singleton object - > in Java

Re: Problems with java.utils

2019-09-30 Thread Zili Chen
Thanks for your information Dian. It is not an urgent issue though. Maybe revisit later :-) Best, tison. Dian Fu 于2019年9月30日周一 下午7:34写道: > Hi tison, > > Actually there may be compatibility issues as the > BatchTableEnvironment/StreamTableEnvironment under "api.java" are public > interfaces. >

Re: Problems with java.utils

2019-09-30 Thread Dian Fu
Hi tison, Actually there may be compatibility issues as the BatchTableEnvironment/StreamTableEnvironment under "api.java" are public interfaces. Regards, Dian > 在 2019年9月30日,下午4:49,Zili Chen 写道: > > Hi Dian, > > What about rename api.java to japi if there is no unexpected compatibility > i

Re: Best way to link static data to event data?

2019-09-30 Thread Gaël Renoux
Hi John, I've had a similar requirement, and I've resorted to simply use a static cache (I'm coding in Scala, so that's a lazy value on a singleton object - in Java that would be a static value on some utility class, with a synchronized lazy-loading getter). The value is reloaded after some durati

Flink Join Time Window

2019-09-30 Thread Nishant Gupta
Hi Team, I am trying to Join [kafka stream] and [badip stream grouped with badip] Can someone please help me out with verifying what is wrong in highlighted query. Am I writing the time window join query wrong with this use case.? Or it is a bug and i should report this what is the work around, i

Re: Problems with java.utils

2019-09-30 Thread Zili Chen
Hi Dian, What about rename api.java to japi if there is no unexpected compatibility issue? I think we can always avoid use a `.java.` in package names. Best, tison. Dian Fu 于2019年9月26日周四 下午10:54写道: > Hi Nick, > > There is a package named "org.apache.flink.table.api.java" in flink and > so the