Re: POJO ERROR

2019-12-19 Thread Alexandru Vasiu
Hi, We fixed it by converting the case class to a class. Thank you, Alex On Thu, Dec 19, 2019 at 5:43 PM Timo Walther wrote: > Sorry, you are right. Maybe you can also share the full stack trace > because I don't know where this guava library should be used. > > Regards, > Timo > > > On 19.12.

Re: java.lang.NoClassDefFoundError due to Native library issue?

2019-12-19 Thread Ruidong Li
I've come across a similar issue before. The reason is that for a dynamic link library (so/dll), there can only be one classloader that loads it. When a restart/failover happens in Flink, the JVM does not exit but only creates a new classloader, which leads to loading the same so/dll multiple times, here

java.lang.NoClassDefFoundError due to Native library issue?

2019-12-19 Thread Hegde, Mahendra
Hello Team, I am trying to use a timezone finding service (https://github.com/dustin-johnson/timezonemap) in my Flink java job. It worked fine on my local machine and it worked fine initially on the Flink server, but after 2-3 restarts of the job it started giving NoClassDefFoundError- java.lang.N

Re: Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread tison
Forward to user list. Best, tison. Abdul Qadeer 于2019年12月20日周五 下午12:57写道: > Around submission time, logs from jobmanager: > > {"timeMillis":1576764854245,"thread":"flink-akka.actor.default-dispatcher-1016","level":"INFO","loggerName":"org.apache.flink.runtime.dispatcher.StandaloneDispatcher","

Re: Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread tison
IIRC this issue is possibly caused by limited resources or some occasional reasons. I once heard that someone upgraded their Java version and the issue vanished. As for "akka.ask.timeout", it is used as the timeout for all Akka ask requests. And I second Yang that the timeout is irrelevant to the client-server connec

Re: Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread Yang Wang
It seems that it is not because of the timeout of the REST client; it is a server-side Akka timeout exception. Could you share the jobmanager logs? Best, Yang Abdul Qadeer 于2019年12月20日周五 上午10:59写道: > The relevant config here is "akka.ask.timeout". > > On Thu, Dec 19, 2019 at 6:51 PM tison wrote: > >> In pr

Re: Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread Abdul Qadeer
The relevant config here is "akka.ask.timeout". On Thu, Dec 19, 2019 at 6:51 PM tison wrote: > In previous version there is an "akka.client.timeout" option but it is > only used for timeout the future in client side so I don't think it change > akka scope timeout. > > Best, > tison. > > > Abdul

Re: Flink Prometheus metric doubt

2019-12-19 Thread vino yang
Hi Jesus, IMHO, maybe @Chesnay Schepler can provide more information. Best, Vino Jesús Vásquez 于2019年12月19日周四 下午6:57写道: > Hi all, i'm monitoring Flink jobs using prometheus. > I have been trying to use the metrics flink_jobmanager_job_uptime/downtime > in order to create an alert, that fires

Re: Unit testing filter function in flink

2019-12-19 Thread vino yang
Hi Vishwas, Apache Flink provides some test harnesses to test your application code on multiple levels of the testing pyramid. You can use them to test your UDF. Please see more examples offered by the official documentation[1]. Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stab

Re: Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread tison
In previous versions there was an "akka.client.timeout" option, but it is only used to time out the future on the client side, so I don't think it changes the Akka-scope timeout. Best, tison. Abdul Qadeer 于2019年12月20日周五 上午10:44写道: > Hi! > > I am using Flink 1.8.3 and facing an issue where job submission th

Re: Need guidance on a use case

2019-12-19 Thread Jark Wu
Hi Eva, If I understand correctly, 1) the user stream is a changelog stream where every record is an upsert with a primary key, and you only want to join the latest one 2) if the user record is updated, you want to re-trigger the join (retract & update the previous joined result) If this is your require

Change Akka Ask Timeout for Job Submission Only

2019-12-19 Thread Abdul Qadeer
Hi! I am using Flink 1.8.3 and facing an issue where job submission through RestClusterClient times out on Akka (default value 10s). In previous Flink versions there was an option to set a different timeout value just for the submission client (ClusterClient config), but it looks like it is not honor
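As the replies in this thread note, there is no submission-only knob here; "akka.ask.timeout" governs all Akka ask requests. A minimal flink-conf.yaml sketch (the values are illustrative, not recommendations):

```yaml
# flink-conf.yaml -- raises the timeout for ALL Akka ask requests,
# not only job submission.
akka.ask.timeout: 60 s

# Older Flink versions also had a client-side-only option, which per
# the thread does not change server-side Akka timeouts:
# akka.client.timeout: 120 s
```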

Re: Need guidance on a use case

2019-12-19 Thread Kurt Young
Hi Eva, Correct me if I'm wrong. You have an unbounded Task stream and you want to enrich the task events with the User info. Meanwhile, the User table is also changing over time, so you basically want that when a task event comes, it joins the latest data of the User table and emits the results. Even if the

Re: Querying DataStream for events before a given time

2019-12-19 Thread Cindy McMullen
Never mind. Flink docs state that the query is an append, not an update, so the query is working as expected. https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/dynamic_tables.html#continuous-queries

Re: Triggering temporal queries

2019-12-19 Thread Cindy McMullen
Never mind. The code is correct; the input test data was not. All is well. FWIW, it’s useful while debugging to select the results of the time function itself: String query = "SELECT lastTry, LOCALTIMESTAMP, TIMESTAMPDIFF(MINUTE, lastTry, LOCALTIMESTAMP) from " + rawTable + " WHERE (TIMES

Unit testing filter function in flink

2019-12-19 Thread Vishwas Siravara
Hi guys, I want to test a function like : private[flink] def filterStream(dataStream: DataStream[GenericRecord]): DataStream[GenericRecord] = { dataStream.filter(new FilterFunction[GenericRecord] { override def filter(value: GenericRecord): Boolean = { if (value == null || value.get(St
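One lightweight option, besides the test harnesses suggested in the reply, is to unit-test the predicate logic in isolation, without a Flink cluster. A minimal sketch in plain Java; the Map-based record and the field name "schemaField" are assumptions for illustration (the original snippet, which filters Avro GenericRecord values, is truncated above):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class FilterSketch {
    // Stand-in for the filter logic in the thread: keep a record only if
    // it is non-null and carries the expected field. "schemaField" is a
    // hypothetical field name; the real code inspects a GenericRecord.
    static final Predicate<Map<String, Object>> keepRecord =
            r -> r != null && r.get("schemaField") != null;

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<>();
        good.put("schemaField", "value");

        System.out.println(keepRecord.test(good));            // true
        System.out.println(keepRecord.test(new HashMap<>())); // false
        System.out.println(keepRecord.test(null));            // false
    }
}
```

The same idea applies to the Scala FilterFunction in the thread: extract the boolean condition into a plain function and assert on it directly.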

Triggering temporal queries

2019-12-19 Thread Cindy McMullen
This code runs and returns the correct result on the initial query, but fails to trigger as data continues to stream in on Kafka. Is there anything obvious I’m missing? env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime); tableEnv = StreamTableEnvironment.create(env); // Consume

Re: Deprecated SplitStream class - what should be used instead.

2019-12-19 Thread KristoffSC
Kostas, thank you for your response. Well, although the Side Outputs would do the job, I was just surprised that those are the replacement for stream splitting. The thing is, and this might be only a subjective opinion, that I would assume that Side Outputs should be used only to produce so

Re: Querying DataStream for events before a given time

2019-12-19 Thread Cindy McMullen
This is close: String query = "SELECT pid, status, lastTry " + " FROM " + rawTable + " WHERE status='RECOVERABLE'" + " GROUP BY HOP(UserActionTime, INTERVAL '30' SECOND, INTERVAL '5' HOUR), pid, status, lastTry"; But I need to have a stream/table that will dynamically update every 30
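For readability, a reflowed copy of the query above (table and column names are the poster's own; `rawTable` is interpolated in the original Java string):

```sql
SELECT pid, status, lastTry
FROM rawTable
WHERE status = 'RECOVERABLE'
GROUP BY
  HOP(UserActionTime, INTERVAL '30' SECOND, INTERVAL '5' HOUR),
  pid, status, lastTry
```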

Re: Deprecated SplitStream class - what should be used instead.

2019-12-19 Thread Kostas Kloudas
Hi Kristoff, The recommended alternative is to use SideOutputs as described in [1]. Could you elaborate why you think side outputs are not a good choice for your usecase? Cheers, Kostas [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html On Thu, Dec 19, 2019
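A sketch of the side-output pattern from [1], splitting one stream into two as the original question asks. This assumes Flink on the classpath, an existing `DataStream<MyEvent> events`, and a hypothetical `MyEvent` type with an `isPriority()` flag; it is a sketch, not a compiled example:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Anonymous subclass so the element type is captured for the tag.
OutputTag<MyEvent> sideTag = new OutputTag<MyEvent>("secondary") {};

SingleOutputStreamOperator<MyEvent> main = events
        .process(new ProcessFunction<MyEvent, MyEvent>() {
            @Override
            public void processElement(MyEvent value, Context ctx, Collector<MyEvent> out) {
                if (value.isPriority()) {
                    out.collect(value);         // main output
                } else {
                    ctx.output(sideTag, value); // side output
                }
            }
        });

DataStream<MyEvent> secondary = main.getSideOutput(sideTag);
```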

RE: Keyed stream, parallelism, load balancing and ensuring that the same key goes to the same Task Manager and task slot

2019-12-19 Thread Theo Diefenthal
Hi Krzysztof, You can just key your stream by transaction id. If you have lots of different transaction ids, you can expect the load to be evenly distributed. All events with the same key (==transaction id) will be processed by the same task slot. If you only have a few kafka partitions, you coul
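The key-to-subtask routing Theo describes can be illustrated with a simplified model. Real Flink additionally murmur-hashes the key's hashCode into key groups (see KeyGroupRangeAssignment), so the exact distribution differs from this sketch, but the invariant is the same: equal keys always land on the same subtask.

```java
public class KeyGroupSketch {
    // Simplified model: real Flink murmur-hashes key.hashCode() before
    // taking it modulo maxParallelism; plain hashCode is used here only
    // to keep the sketch self-contained.
    static int keyGroup(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // A key group maps to an operator subtask index in [0, parallelism).
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        return keyGroup(key, maxParallelism) * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        // The same (hypothetical) transaction id always routes to the
        // same subtask, regardless of how often it appears.
        String txId = "tx-42";
        System.out.println(subtaskFor(txId, 128, 4) == subtaskFor(txId, 128, 4)); // true
    }
}
```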

Re: [DISCUSS] What parts of the Python API should we focus on next ?

2019-12-19 Thread Bowen Li
- integrate PyFlink with Jupyter notebook - Description: users should be able to run PyFlink seamlessly in Jupyter - Benefits: Jupyter is the industry-standard notebook for data scientists. I’ve talked to a few companies in North America; they think Jupyter is the #1 way to empower internal

Re: Need guidance on a use case

2019-12-19 Thread Timo Walther
Hi Eva, I'm not 100% sure if your use case can be solved with SQL. JOIN in SQL always joins an incoming record with all previous arrived records. Maybe Jark in CC has some idea? It might make sense to use the DataStream API instead with a connect() and CoProcessFunction where you can simply

Querying DataStream for events before a given time

2019-12-19 Thread Cindy McMullen
Hi - I’m streaming events from Kafka, processing in EventTime. I’d like to process only events that are older (before) some given time (say, 2 days ago) at an interval of 5 minutes. I’ve experimented with Flink DynamicTables: String query = "SELECT pid, status, lastTry, TUMBLE_END(UserActionT

Deprecated SplitStream class - what should be used instead.

2019-12-19 Thread KristoffSC
Hi, I've noticed that the SplitStream class is marked as deprecated, although the split method of DataStream is not. Also, there is no alternative proposed for it in the SplitStream doc. In my use case I will have a stream of events that I have to split into two separate streams based on some function. Events

Re: POJO ERROR

2019-12-19 Thread Timo Walther
Sorry, you are right. Maybe you can also share the full stack trace because I don't know where this guava library should be used. Regards, Timo On 19.12.19 14:50, Alexandru Vasiu wrote: Nope, because scalaBuildVersion is the scala version including minor version so in this case: 2.12.10 and w

Re: How to convert retract stream to dynamic table?

2019-12-19 Thread David Anderson
The Elasticsearch, HBase, and JDBC[1] table sinks all support streaming UPSERT mode[2]. While not exactly the same as RETRACT mode, it seems like this might get the job done (unless I'm missing something, which is entirely possible). David [1] https://ci.apache.org/projects/flink/flink-docs-relea

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-19 Thread Robert Metzger
Hey Sidney, for the .filter() function, you are passing a function without an open() method. The function that getFilter() returns, has no open() method. What could work is creating a Handler extends AbstractRichFunction implements MapFunction, FilterFunction and passing those instances into the

Re: POJO ERROR

2019-12-19 Thread Alexandru Vasiu
Nope, because scalaBuildVersion is the Scala version including the minor version, so in this case: 2.12.10, and we used it just where we need it. We used scalaVersion to specify for each library what Scala is used, so the Flink used will be flink-streaming-scala_2.12 Alex On Thu, Dec 19, 2019 at 3:40 PM Timo

Re: POJO ERROR

2019-12-19 Thread Timo Walther
I see a mismatch between scalaBuildVersion and scalaVersion could this be the issue? Regards, Timo On 19.12.19 14:33, Alexandru Vasiu wrote: This is a part of my Gradle config: ext {     scalaVersion = '2.12'     flinkVersion = '1.9.1'     scalaBuildVersion = "${scalaVersion}.10"     sca

Re: POJO ERROR

2019-12-19 Thread Alexandru Vasiu
This is a part of my Gradle config: ext { scalaVersion = '2.12' flinkVersion = '1.9.1' scalaBuildVersion = "${scalaVersion}.10" scalaMockVersion = '4.4.0' circeGenericVersion = '0.12.3' circeExtrasVersion = '0.12.2' pardiseVersion = '2.1.1' slf4jVersion = '1.7.7'

Re: Processing post to sink?

2019-12-19 Thread Robert Metzger
Hey Theo, your solution of turning the sink into a process function should work. I'm just not sure how easy it is to re-use the StreamingFileSink inside it. Have you considered sending all the records to a parallelism=1 process function sitting "next" to the StreamingFileSink? You could track the

Re: POJO ERROR

2019-12-19 Thread Timo Walther
That sounds like a classloading or, most likely, a dependency issue. Do all dependencies, including Flink, use the same Scala version? Could you maybe share some reproducible code with us? Regards, Timo On 19.12.19 13:53, Alexandru Vasiu wrote: I'm sorry for my last message, it might be incomp

Re: POJO ERROR

2019-12-19 Thread Alexandru Vasiu
I'm sorry for my last message, it might be incomplete. So I used case classes for my objects, but it doesn't work, reaching this error: "Exception in thread "main" org.apache.flink.shaded.guava18.com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: scala/math/Ordering$

Re: DataStream API min max aggregation on other fields

2019-12-19 Thread Lu Weizheng
Yes, the unpredictable non-key and non-aggregated fields made me confused. As Biao said, it is because of the purged window state. So when I want to use max or min, I should only use the aggregated field. The other fields are not defined, and I should take care not to use them. Thank you guys for your replies! __

Re: POJO ERROR

2019-12-19 Thread Alexandru Vasiu
I used `case class` for example case class A(a: Map[String, String]) so it should work Alex On Thu, Dec 19, 2019 at 2:18 PM Timo Walther wrote: > Hi Alex, > > the problem is that `case class` classes are analyzed by Scala specific > code whereas `class` classes are analyzed with Java specific c

Re: Kafka table descriptor missing startFromTimestamp()

2019-12-19 Thread Timo Walther
This issue is work in progress: https://issues.apache.org/jira/browse/FLINK-15220 On 19.12.19 09:07, Dawid Wysakowicz wrote: Hi, The only reason why it was not exposed at the beginning is that not all versions of the consumers support starting from a specific timestamp. I think we could expos

Re: POJO ERROR

2019-12-19 Thread Timo Walther
Hi Alex, the problem is that `case class` classes are analyzed by Scala-specific code whereas `class` classes are analyzed with Java-specific code. So I would recommend using a case class to make sure you stay in the "Scala world"; otherwise the fallback is the Java-based TypeExtractor. For

Re: Can trigger fire early before specific element gets into ProcessingTimeSessionWindow

2019-12-19 Thread vino yang
Hi Utopia, Flink provides a highly scalable window mechanism.[1] For your scenario, you can customize your window assigner and trigger. [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html Best, Vino Utopia 于2019年12月19日周四 下午5:56写道: > Hi, > > I want to f

POJO ERROR

2019-12-19 Thread Alexandru Vasiu
Hi, I use flink-scala version 1.9.1 and Scala 2.12.10, and I defined a data type which is a bit more complex: it has a list in it and even a dictionary. When I try to use a custom map, I get this error: INFO org.apache.flink.api.java.typeutils.TypeExtractor - class A does not contain a setter fo

Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-19 Thread M Singh
Thanks Vino and Biao for your help.  Mans On Thursday, December 19, 2019, 02:25:40 AM EST, Biao Liu wrote: Hi Mans, That's indeed a problem. We have a plan to fix it. I think it could be included in 1.11. You could follow this issue [1] to check the progress. [1] https://issues.apache

Flink Prometheus metric doubt

2019-12-19 Thread Jesús Vásquez
Hi all, I'm monitoring Flink jobs using Prometheus. I have been trying to use the metrics flink_jobmanager_job_uptime/downtime in order to create an alert that fires when one of these values emits -1, since the doc says this is the behavior of the metric when the job gets to a completed state. The t
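The alert described above could be sketched as a Prometheus alerting rule. The group, alert, and label names here are assumptions; the expression relies on the documented behavior that the uptime metric reports -1 once the job completes:

```yaml
# Hypothetical alerting rule; adapt names and thresholds to your setup.
groups:
  - name: flink-jobs
    rules:
      - alert: FlinkJobCompleted
        expr: flink_jobmanager_job_uptime == -1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} appears to have completed"
```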

Re: [DISCUSS] Drop vendor specific repositories from pom.xml

2019-12-19 Thread Till Rohrmann
The profiles make bumping ZooKeeper's version a bit more cumbersome. I would be interested for this reason to get rid of them, too. Cheers, Till On Wed, Dec 18, 2019 at 5:35 PM Robert Metzger wrote: > I guess we are talking about this profile [1] in the pom.xml? > > +1 to remove. > > I'm not su

Re: Keyed stream, parallelism, load balancing and ensuring that the same key goes to the same Task Manager and task slot

2019-12-19 Thread KristoffSC
Hi :) Any thoughts about this? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: DataStream API min max aggregation on other fields

2019-12-19 Thread Biao Liu
Hi Lu, @vino yang I think what he means is that the "max" semantics between window and non-window are different: it changes non-aggregated fields unpredictably. That's really an interesting question. I took a look at the relevant implementation. From the perspective of the code, "max" always keeps
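The behavior described here can be mimicked in plain Java, with no Flink involved: the aggregated field is folded with max while the non-aggregated fields keep the first record's values. This is an illustration of the semantics discussed in the thread, not Flink's actual code:

```java
import java.util.Arrays;
import java.util.List;

public class MaxSemantics {
    // Mimics DataStream "max" on a 3-field tuple, aggregating field 2:
    // only the aggregated field is updated; fields 0 and 1 retain the
    // values of the first record seen.
    static int[] maxOnField2(List<int[]> events) {
        int[] acc = events.get(0).clone();
        for (int[] e : events.subList(1, events.size())) {
            acc[2] = Math.max(acc[2], e[2]); // only the aggregated field moves
        }
        return acc;
    }

    public static void main(String[] args) {
        List<int[]> events = Arrays.asList(
                new int[]{0, 0, 0}, new int[]{0, 1, 1}, new int[]{0, 2, 2});
        // Non-aggregated fields stay at the first record's values.
        System.out.println(Arrays.toString(maxOnField2(events))); // [0, 0, 2]
    }
}
```

Contrast with maxBy, which would emit the whole record {0, 2, 2}.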

Can trigger fire early before specific element gets into ProcessingTimeSessionWindow

2019-12-19 Thread Utopia
Hi, I want to fire and evaluate the ProcessingTimeSessionWindow when a specific element comes into the current window. But I want to exclude the specific element when processing the window function and retain it for the next evaluation. Thanks Best regards Utopia

Re: DataStream API min max aggregation on other fields

2019-12-19 Thread vino yang
Hi weizheng, IMHO, I do not know what is not clear to you? Is the result not correct? Can you share the correct result based on your understanding? The "keyBy" specifies the group field, and min/max do the aggregation on the other field based on the position you specified. Best, Vino Lu Weizheng 于

DataStream API min max aggregation on other fields

2019-12-19 Thread Lu Weizheng
Hi all, On a KeyedStream, when I use maxBy or minBy, I will get the max or min element, meaning the other fields are kept from the max or min element. This is quite clear. However, when I use max or min, what does Flink do with the other fields? val tupleStream = senv.fromElements( (0, 0, 0), (0, 1, 1

Re: Restore metrics on broadcast state after restart

2019-12-19 Thread Gaël Renoux
Thanks, that's exactly what I needed! On Wed, Dec 18, 2019 at 5:44 PM Yun Tang wrote: > Hi Gaël > > You can try initializeState [1] to initialize your metrics values from > states when restoring from a checkpoint. > > context.getOperatorStateStore().getBroadcastState() could visit your > restor

Re: [Question] How to use different filesystem between checkpointdata and user data sink

2019-12-19 Thread Piotr Nowojski
Hi, Can you share the full stack trace or just attach the job manager and task manager logs? This exception should have had some cause logged below. Piotrek > On 19 Dec 2019, at 04:06, ouywl wrote: > > Hi Piotr Nowojski, >I have moved my filesystem plugin to FLINK_HOME/plugins in flink 1.9.1.

Re: How to convert retract stream to dynamic table?

2019-12-19 Thread Dawid Wysakowicz
Hi, Correct me if I am wrong James, but I think your original question was how do you create a Table out of a changelog (a stream with a change flag).  Unfortunately I think it is not possible right now. This definitely is high on our priority list for the near future. There were first approaches[

Re: Kafka table descriptor missing startFromTimestamp()

2019-12-19 Thread Dawid Wysakowicz
Hi, The only reason why it was not exposed at the beginning is that not all versions of the consumers support starting from a specific timestamp. I think we could expose such setting now. Would you like to create an issue for it? Best, Dawid On 19/12/2019 06:56, Steve Whelan wrote: > Examining