can we do Flink CEP on event stream or batch or both?

2019-04-29 Thread kant kodali
Hi All, I have the following questions. 1) Can we do Flink CEP on an event stream or a batch? 2) If we can do streaming, I wonder how long we can keep the stream stateful? I also wonder if anyone has successfully done any stateful streaming for days or months (with or without CEP)? or is stateful
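Flink CEP itself is applied to DataStreams; purely as a toy illustration of the kind of "A followed by B" sequence pattern CEP evaluates, here is a plain-Java sketch with no Flink dependency (event names are hypothetical):

```java
import java.util.List;

public class FollowedBySketch {
    // Returns the indices i < j of the earliest occurrence of `first`
    // followed (not necessarily immediately) by `second`, or null if no
    // match -- a toy stand-in for Pattern.begin("a").followedBy("b").
    static int[] firstFollowedBy(List<String> events, String first, String second) {
        int i = events.indexOf(first);
        if (i < 0) return null;
        for (int j = i + 1; j < events.size(); j++) {
            if (events.get(j).equals(second)) return new int[]{i, j};
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> events = List.of("x", "login", "x", "purchase");
        int[] match = firstFollowedBy(events, "login", "purchase");
        System.out.println(match[0] + "," + match[1]); // 1,3
    }
}
```

This only mimics the matching idea; real CEP patterns add time windows, contiguity modes, and state handling.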

Re: How to verify what maxParallelism is set to?

2019-04-29 Thread Sean Bollin
Thanks! Do you know if it's possible somehow to verify the global maxParallelism other than calling .getMaxParallelism? Either through an API call or the UI? On Mon, Apr 29, 2019 at 8:12 PM Guowei Ma wrote: > > Hi, > StreamExecutionEnvironment is used to set a default maxParallelism for >

Re: How to verify what maxParallelism is set to?

2019-04-29 Thread Guowei Ma
Hi, StreamExecutionEnvironment is used to set a global default maxParallelism. If an operator's maxParallelism is -1, the operator is given the maxParallelism set on the StreamExecutionEnvironment. >>>Any API or way I can verify? I can't find any easy way to do that. But you could use
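The fallback rule described here can be sketched in plain Java (a simplified model of the resolution, not Flink's actual code):

```java
public class MaxParallelismResolution {
    // An operator whose maxParallelism is unset (-1) falls back to the
    // environment-wide default set on the StreamExecutionEnvironment;
    // an explicitly set value wins over the default.
    static int effectiveMaxParallelism(int operatorMax, int environmentMax) {
        return operatorMax == -1 ? environmentMax : operatorMax;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxParallelism(-1, 128)); // falls back: 128
        System.out.println(effectiveMaxParallelism(32, 128)); // explicit wins: 32
    }
}
```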

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
Sorry. I sent an empty reply. I tried again with getter/setter. And it works. Thanks. — import lombok.Getter; import lombok.Setter; import java.io.Serializable; @Getter @Setter public class P implements Serializable { private String name; private Integer value; } — > On 29

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
> On 29 Apr 2019, at 11:12 PM, Timo Walther wrote: > > Hi Sung, > > private fields are only supported if you specify getters and setters > accordingly. Otherwise you need to use `Row.class` and perform the mapping in > a subsequent map() function manually via reflection. > > Regards, >

Timestamp and key preservation over operators

2019-04-29 Thread Averell
Hello, I extracted timestamps using BoundedOutOfOrdernessTimestampExtractor from my sources, have a WindowFunction, and found that my timestamps have been lost. To do another Window operation, I need to extract the timestamp again. I tried to find a document for that but haven't found one. Could you

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread M Singh
Hi An0: Here is my understanding - each operator has a watermark which is the lowest of all its input streams' watermarks. When the watermark for an operator is updated, the lowest one becomes the new watermark for that operator and is forwarded to the output streams of that operator.  So, if one of
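The min-of-inputs rule described here can be sketched in plain Java (a simplified model of operator watermark handling, not Flink internals):

```java
import java.util.Arrays;

public class OperatorWatermark {
    // An operator's current watermark is the minimum of the latest
    // watermarks received on each of its input channels, so a single
    // lagging input holds the whole operator back.
    static long combinedWatermark(long[] inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Inputs at 100 and 250 are ready, but the input at 90 dominates.
        System.out.println(combinedWatermark(new long[]{100L, 250L, 90L})); // 90
    }
}
```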

Re: Flink session window not progressing

2019-04-29 Thread Konstantin Knauf
Hi Henrik, yes, the output count of a sink (and the input count of sources) is always zero, because only Flink internal traffic is reflected in these metrics. There is a Jira issue to change this [1]. Cheers, Konstantin [1] https://issues.apache.org/jira/browse/FLINK-7286 On Mon, Apr 29,

Flink heap memory

2019-04-29 Thread Rad Rad
Hi, I would like to know the amount of heap memory currently used (in bytes) by a specific job which runs on a Flink cluster. Regards. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
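A minimal sketch of reading the heap-used figure from inside a single JVM is below; for a per-job number across a cluster you would normally aggregate the heap metrics of the TaskManagers running that job via Flink's REST/metrics API (the exact metric names, e.g. Status.JVM.Memory.Heap.Used, are an assumption to check against your version's documentation):

```java
import java.lang.management.ManagementFactory;

public class HeapUsed {
    // Heap currently used by this JVM, in bytes -- the same figure each
    // TaskManager process would report for itself.
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
    }
}
```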

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread an0
Thanks very much. It definitely explains the problem I'm seeing. However, something I need to confirm: You say "Watermarks are broadcasted/forwarded anyway." Do you mean, in assignTimestampsAndWatermarks.keyBy.window, it doesn't matter what data flows through a specific key's stream, all key

Re: Flink session window not progressing

2019-04-29 Thread Henrik Feldt
Thinking more about this; it might just be me who is reacting to the sink having a zero rate of output. In fact, I have about two gigs of messages left in the queue until it's up to date, so I may just be running a slow calculation (because I've run a batch job to backfill to after stream).

Flink session window not progressing

2019-04-29 Thread Henrik Feldt
Hi guys, I'm doing a PoC with Flink and I was wondering if you could help me. I've asked a question here https://stackoverflow.com/questions/55907954/flink-session-window-sink-timestamp-not-progressing with some images. In summary, my question is this: why doesn't my session window

Flink Load multiple file

2019-04-29 Thread Soheil Pourbafrani
Hi, I want to load multiple files and apply the processing logic to them. After some searching, using the following code I can load all the files in the directory named "input" into Flink: TextInputFormat tif = new TextInputFormat(new Path("input")); DataSet raw = env.readFile(tif, "input//"); If
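As a plain-Java sketch (no Flink) of what pointing an input format at a directory amounts to - enumerate the files, then read each one - under the assumption of a flat directory of text files:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryLines {
    // Reads every regular file under `dir` (in path order) and concatenates
    // their lines -- roughly the effect env.readFile(...) has when given a
    // directory instead of a single file.
    static List<String> readAllLines(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .sorted()
                        .flatMap(p -> {
                            try { return Files.lines(p); }
                            catch (IOException e) { throw new UncheckedIOException(e); }
                        })
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("input");
        Files.write(dir.resolve("a.txt"), List.of("line1"));
        Files.write(dir.resolve("b.txt"), List.of("line2"));
        System.out.println(readAllLines(dir)); // [line1, line2]
    }
}
```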

Re: Working around lack of SQL triggers

2019-04-29 Thread deklanw
Hi, Thanks for the reply. I had already almost completely lost hope in using Flink SQL. You have confirmed that. But, like I said, I don't know how to reduce the large amount of boilerplate I foresee this requiring with the DataStream API. Can you help me with that? You mention "parameterizable

Re: Write simple text into hdfs

2019-04-29 Thread Ken Krugler
DataSet.writeAsText(hdfs://) should work. — Ken > On Apr 29, 2019, at 8:00 AM, Hai wrote: > > Hi, > > Could anyone give a simple way to write a DataSet into hdfs using a > simple way? > > I look up the official document, and didn’t find that, am I missing some > thing ? > > Many thanks.

Write simple text into hdfs

2019-04-29 Thread Hai
Hi, Could anyone give a simple way to write a DataSet<String> into hdfs? I looked up the official document and didn't find one - am I missing something? Many thanks.

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

2019-04-29 Thread Gary Yao
Since there were no objections so far, I will proceed with removing the code [1]. [1] https://issues.apache.org/jira/browse/FLINK-12312 On Wed, Apr 24, 2019 at 1:38 PM Gary Yao wrote: > The idea is to also remove the rescaling code in the JobMaster. This will > make > it easier to remove the

Re: Emitting current state to a sink

2019-04-29 Thread M Singh
Hi Avi: Can you please elaborate (or include an example/code snippet) of how you were able to achieve collecting the keyed states from the processBroadcastElement method using the applyToKeyedState ?  I am trying to understand which collector you used to emit the state since the broadcasted

Re: Read mongo datasource in Flink

2019-04-29 Thread Kenny Gorman
Just a thought, A robust and high performance way to potentially achieve your goals is: Debezium->Kafka->Flink https://debezium.io/docs/connectors/mongodb/ Good robust handling of various topologies, reasonably good scaling properties, good

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Timo Walther
Hi Sung, private fields are only supported if you specify getters and setters accordingly. Otherwise you need to use `Row.class` and perform the mapping in a subsequent map() function manually via reflection. Regards, Timo Am 29.04.19 um 15:44 schrieb Sung Gon Yi: In
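Timo's fallback - Row plus a reflective map() function - might look roughly like the sketch below, using Object[] as a stand-in for Flink's Row type and the field names from Sung Gon's POJO (the helper names are hypothetical):

```java
import java.lang.reflect.Field;

public class ReflectiveRowMapper {
    // A POJO with private fields and no getters/setters, as in the thread.
    static class P {
        private String name;
        private Integer value;
    }

    // Copies positional row values into the private fields via reflection.
    static P fromRow(Object[] row) throws ReflectiveOperationException {
        P p = new P();
        Field name = P.class.getDeclaredField("name");
        name.setAccessible(true);
        name.set(p, row[0]);
        Field value = P.class.getDeclaredField("value");
        value.setAccessible(true);
        value.set(p, row[1]);
        return p;
    }

    // Helper for demonstration: renders the mapped POJO.
    static String roundTrip(Object[] row) throws ReflectiveOperationException {
        P p = fromRow(row);
        return p.name + "=" + p.value;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(roundTrip(new Object[]{"answer", 42})); // answer=42
    }
}
```

Adding getters/setters (as Sung Gon eventually did) is simpler and lets Flink treat the class as a POJO directly.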

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
#Generated by Git-Commit-Id-Plugin #Wed Apr 03 22:57:42 PDT 2019 git.commit.id.abbrev=4caec0d git.commit.user.email=aljoscha.kret...@gmail.com git.commit.message.full=Commit for release 1.8.0\n git.commit.id=4caec0d4bab497d7f9a8d9fec4680089117593df git.commit.message.short=Commit for release

POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
In https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset , POJO data type is available to

Re: Data Locality in Flink

2019-04-29 Thread Flavio Pompermaier
Thanks Fabian, that's clearer... Many times you don't know whether or not to rebalance a dataset because it depends on the specific use case and dataset distribution. An automatic way of choosing whether a Dataset could benefit from a rebalance could be VERY nice (at least for batch) but I

RE: kafka partitions, data locality

2019-04-29 Thread Smirnov Sergey Vladimirovich (39833)
Hi Stefan, Thanks for clarifying! But it still remains an open question for me, because we use the keyBy method and I did not find any public interface for key reassignment (something like partitionCustom for DataStream). As I heard, there is some internal mechanism with key groups and mapping keys to groups.

Re: Data Locality in Flink

2019-04-29 Thread Fabian Hueske
Hi Flavio, These types of race conditions are not failure cases, so no exception is thrown. It only means that a single source task reads all (or most of) the splits and no splits are left for the other tasks. This can be a problem if a record represents a large amount of IO or an intensive

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working. [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan Op ma 29 apr. 2019 om

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-29 Thread M Singh
Sounds great Fabian.  I was just trying to see if I can use higher level datastream apis.  I appreciate your advice and help.  Mans On Monday, April 29, 2019, 5:41:36 AM EDT, Fabian Hueske wrote: Hi Mans, I don't know if that would work or not. Would need to dig into the source

Re: Data Locality in Flink

2019-04-29 Thread Flavio Pompermaier
Hi Fabian, I wasn't aware that "race-conditions may happen if your splits are very small as the first data source task might rapidly request and process all splits before the other source tasks do their first request". What happens exactly when a race condition arises? Is this exception internally

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
But what about parallelism with this implementation? From what I see there's only a single thread querying Mongo and fetching all the data..am I wrong? On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager wrote: > For a framework I'm working on, we actually implemented a (basic) Mongo > source

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Thanks for your sharing ~ That's great! Original Message - Sender: Wouter Zorgdrager w.d.zorgdra...@tudelft.nl Recipient: hai...@magicsoho.com Cc: useru...@flink.apache.org Date: Monday, Apr 29, 2019 20:05 Subject: Re: Read mongo datasource in Flink For a framework I'm working on, we actually

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit it into a Flink context. Cheers, Wouter [1]:

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Hi, Flavio: That's good, thank you. I will try it later ~ Regards Original Message - Sender: Flavio Pompermaier pomperma...@okkam.it Recipient: hai...@magicsoho.com Cc: useru...@flink.apache.org Date: Monday, Apr 29, 2019 19:56 Subject: Re: Read mongo datasource in Flink I'm not aware of an

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
Ok, I will check. On Fri, Apr 12, 2019, 4:47 AM Chesnay Schepler wrote: > have you compiled Flink yourself? > > Could you check whether the flink-dist jar contains a > ".version.properties" file in the root directory? > > On 12/04/2019 03:42, Vishal Santoshi wrote: > > Hello ZILI, > I run

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
I'm not aware of an official source/sink... if you want you could try to exploit the Mongo HadoopInputFormat as in [1]. The provided link uses a pretty old version of Flink but it should not be a big problem to update the maven dependencies and the code to a newer version. Best, Flavio [1]

Re: Emitting current state to a sink

2019-04-29 Thread Fabian Hueske
Nice! Thanks for the confirmation :-) Am Mo., 29. Apr. 2019 um 13:21 Uhr schrieb Avi Levi : > Thanks! Works like a charm :) > > On Mon, Apr 29, 2019 at 12:11 PM Fabian Hueske wrote: > >> *This Message originated outside your organization.* >> -- >> Hi Avi, >> >> I'm

Re: Emitting current state to a sink

2019-04-29 Thread Avi Levi
Thanks! Works like a charm :) On Mon, Apr 29, 2019 at 12:11 PM Fabian Hueske wrote: > *This Message originated outside your organization.* > -- > Hi Avi, > > I'm not sure if you cannot emit data from the keyed state when you > receive a broadcasted message. > The

Re: Question about session window close time

2019-04-29 Thread 邵志鹏
Hello, here is my personal understanding: First, this is not a session-window-specific problem; tumbling and sliding windows behave the same way. When a time window's results are emitted is determined by the time characteristic. Currently: 1. Only processing-time (with no watermarks to handle out-of-order data) emits window results promptly. 2. Setting the event-time watermark timestamp to System.currentTimeMillis() also emits window results promptly, but then only new messages received after the Flink program starts are consumed; earlier messages are not processed, even with a new consumer group.id and earliest (that is, fault tolerance and replay stop working - this can of course be verified further).

Why do event time programs sometimes use processing time operations?

2019-04-29 Thread ??????
Looking at the Flink Streaming "Event Time" overview, which covers processing time / event time / ingestion time, under event time there is this note: "Note that sometimes when event time programs are processing live data in real-time, they will use some processing time operations in order to

Question about session window close time

2019-04-29 Thread by1507118
Hi all: A problem has been bothering me lately: I set up a session window that closes after 10s of inactivity, but I find it only emits the window's results when the next window is created, while I want the data emitted as soon as this window closes. How should I handle this? Many thanks. // Kafka is configured here and the data stream is ingested env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); FlinkKafkaConsumer010 kafkaSource = new

Re: Data Locality in Flink

2019-04-29 Thread Fabian Hueske
Hi, The method that I described in the SO answer is still implemented in Flink. Flink tries to assign splits to tasks that run on local TMs. However, files are not split per line (this would be horribly inefficient) but in larger chunks depending on the number of subtasks (and in case of HDFS the

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-29 Thread Fabian Hueske
Hi Mans, I don't know if that would work or not. Would need to dig into the source code for that. TBH, I would recommend to check if you can implement the logic using a (Keyed-)ProcessFunction. IMO, process functions are a lot easier to reason about than Flink's windowing framework. You can

Re: FileInputFormat that processes files in chronological order

2019-04-29 Thread Fabian Hueske
Hi Sergei, It depends whether you want to process the file with the DataSet (batch) or DataStream (stream) API. Averell's answer was addressing the DataStream API part. The DataSet API does not have any built-in support to distinguish files (or file splits) by folders and process them in order.

Re: Emitting current state to a sink

2019-04-29 Thread Fabian Hueske
Hi Avi, I'm not sure if you cannot emit data from the keyed state when you receive a broadcasted message. The Context parameter of the processBroadcastElement() method in the KeyedBroadcastProcessFunction has the applyToKeyedState() method. The method takes a KeyedStateFunction that is applied

Re: Working around lack of SQL triggers

2019-04-29 Thread Fabian Hueske
Hi, I don't think that (the current state of) Flink SQL is a good fit for your requirements. Each query will be executed as an independent job. So there won't be any sharing of intermediate results. You can do some of this manually if you use the Table API, but even then it won't allow for early

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-29 Thread Fabian Hueske
Hi Juan, count() and collect() trigger the execution of a job. Since Flink does not cache intermediate results (yet), all operations from the sink (count()/collect()) to the sources are executed. So in a sense a DataSet is immutable (given that the input of the sources does not change) but
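Fabian's point - every action re-runs the whole plan because intermediate results are not cached - can be modeled in plain Java (a conceptual sketch, not Flink code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyPipeline {
    // Counts how many times a lazily-defined "dataset" is materialized when
    // `actions` terminal operations are called on it. Like Flink without
    // intermediate-result caching, each action re-runs the plan from the
    // sources.
    static int executionsFor(int actions) {
        AtomicInteger executions = new AtomicInteger();
        Supplier<int[]> pipeline = () -> {
            executions.incrementAndGet(); // the "sources to sink" run happens here
            return new int[]{1, 2, 3};
        };
        for (int i = 0; i < actions; i++) {
            pipeline.get(); // each count()/collect()-style action triggers a full run
        }
        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println(executionsFor(2)); // 2 -- two actions, two full runs
    }
}
```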

Re:Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment

2019-04-29 Thread 胡逸才
Thanks Tang: Following your prompt, I deleted the useless parameters from the command line and added your parameters to flink-config.xml, which has been successfully implemented on YARN in the JAVA 7 environment. At 2019-04-28 11:54:18, "Yun Tang" wrote: Hi Zhangjun Thanks for your

Re: RocksDB backend with deferred writes?

2019-04-29 Thread Congxian Qiu
Hi, David When you flush data to db, you can reference the serialize logic[1], and store the serialized bytes to RocksDB. [1] 

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-29 Thread Ufuk Celebi
Actually, I couldn't even find a mention of this flag in the docs here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/queryable_state.html – Ufuk On Mon, Apr 29, 2019 at 8:45 AM Ufuk Celebi wrote: > I didn't find this as part of the >

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-29 Thread Ufuk Celebi
I didn't find this as part of the https://flink.apache.org/news/2019/04/09/release-1.8.0.html notes. I think an update to the Important Changes section would be valuable for users upgrading to 1.8 from earlier releases. Also, logging that the library is on the classpath but the feature flag is

Re: Containers are not released after job failed

2019-04-29 Thread liujiangang
Thank you, it is fixed in the new version.