Re: Unsubscribe

2020-02-25 Thread Leonard Xu
Hi Atle, just send an email to user-unsubscr...@flink.apache.org if you want to unsubscribe from the user mailing list. You can refer to [1] for more details. [1] https://flink.apache.org/community.html#mailing-lists

Re: state schema evolution for case classes

2020-02-25 Thread Apoorv Upadhyay
Hi Roman, I have successfully migrated to Flink 1.8.2 with the savepoint created by Flink 1.6.2. Now I have to modify a few case classes due to a new requirement. I created a savepoint, and when I run the app with the modified class from the savepoint it throws the error "state not compatible". Previously t

Re: Java implementations of Streaming applications for Flink

2020-02-25 Thread Piper Piper
Very informative. Thank you, Robert! On Tue, Feb 25, 2020 at 5:02 AM Robert Metzger wrote: > Hey Piper, > > Here's an example for a more advanced Flink application: > https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html > The Flink developers internally maintain a number of testing

Re: Does Flink 1.9 support create or replace views syntax in raw sql?

2020-02-25 Thread kant kodali
Hi Jingsong, Can I store it in the local filesystem/HDFS? Thanks! On Mon, Jan 20, 2020 at 6:59 PM Jingsong Li wrote: > Hi Kant, > > If you want your view persisted, you must dock a catalog like the Hive > catalog; it stores views in the metastore with MySQL. > - In 1.10, you can store views in cata

Re: MaxMetaspace default may be too low?

2020-02-25 Thread Xintong Song
I'm sorry that you had a bad experience with the migration and configurations. I believe the change of limiting the metaspace size is already documented in various places, but maybe it's not obvious enough, which led to your confusion. Let's keep the discussion on how to improve that in the JIRA ticket

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Jark Wu
Yes, I'm also in favor of loosening the datetime format constraint. I guess most users don't know there is a JSON standard which follows RFC 3339. Best, Jark On Wed, 26 Feb 2020 at 10:06, NiYanchun wrote: > Yes, these Types definition are general. As a user/developer, I would > support “loo

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread NiYanchun
Yes, these Types definitions are general. As a user/developer, I would support “loosen it for usability”. If not, maybe add some explanation about JSON. Original Message Sender: Jark Wu Recipient: Outlook; Dawid Wysakowicz Cc: godfrey he; Leonard Xu; user Date: Wednesday, Feb 26, 2020 09:55 S

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Jark Wu
Hi Outlook, The explanation in DataTypes is correct; it is compliant with the SQL standard. The problem is that JsonRowDeserializationSchema only supports RFC 3339. On the other hand, CsvRowDeserializationSchema supports parsing "2019-07-09 02:02:00.040". So the question is shall we insist on the RFC-

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Outlook
Thanks Godfrey and Leonard, I tried your answers and the result is OK. BTW, I think if only such a format is accepted for a long time, the TIME and TIMESTAMP methods' doc in `org.apache.flink.table.api.DataTypes` may be better to update, because the documentation now is not what the methods really support. For e

Re: Getting javax.management.InstanceAlreadyExistsException when upgraded to 1.10

2020-02-25 Thread John Smith
Ok as soon as I can tomorrow. Thanks On Tue, 25 Feb 2020 at 11:51, Khachatryan Roman wrote: > Hi John, > > Seems like this is another instance of > https://issues.apache.org/jira/browse/FLINK-8093 > Could you please provide the full stacktrace? > > Regards, > Roman > > > On Mon, Feb 24, 2020 at

Unsubscribe

2020-02-25 Thread Atle Prange

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-25 Thread Khachatryan Roman
Hi, I think conceptually the pipeline could look something like this: env .addSource(...) .keyBy("device_id") .window(SlidingEventTimeWindows.of(Time.minutes(15), Time.seconds(10))) .trigger(new Trigger { def onElement(el, timestamp, window, ctx) = { if (window.start == TimeWin

Re: Map Of DataStream getting NullPointer Exception

2020-02-25 Thread Khachatryan Roman
As I understand from the code, streamMap is a Java map, not a Scala one. So you can get an NPE while dereferencing the value you got from it. Also, the approach looks a bit strange. Can you describe what you are trying to achieve? Regards, Roman On Mon, Feb 24, 2020 at 5:47 PM aj wrote: > > I am trying be
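Roman's point can be reproduced with plain Java, without Flink: `java.util.Map.get` returns `null` for a missing key, and the NPE only appears later when the null is dereferenced (here via auto-unboxing). This is a minimal hypothetical sketch, not the original poster's code; the map name `streamMap` is borrowed from the thread for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MapNpeDemo {
    public static void main(String[] args) {
        // A Java map returns null for a missing key (a Scala Map would
        // throw NoSuchElementException on apply instead)
        Map<String, Integer> streamMap = new HashMap<>();
        streamMap.put("present", 1);

        Integer missing = streamMap.get("absent"); // null, no exception yet
        System.out.println(missing);               // prints "null"

        try {
            int unboxed = missing;                 // auto-unboxing null -> NPE
            System.out.println(unboxed);
        } catch (NullPointerException e) {
            System.out.println("NPE on dereference");
        }
    }
}
```

The fix is usually to check for `null` (or use `getOrDefault`) before using the value.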

Batch Flink Job S3 write performance vs Spark

2020-02-25 Thread sri hari kali charan Tummala
Hi All, I have a question: did anyone compare the performance of a Flink batch job writing to S3 vs Spark writing to S3? -- Thanks & Regards Sri Tummala

Re: Getting javax.management.InstanceAlreadyExistsException when upgraded to 1.10

2020-02-25 Thread Khachatryan Roman
Hi John, Seems like this is another instance of https://issues.apache.org/jira/browse/FLINK-8093 Could you please provide the full stacktrace? Regards, Roman On Mon, Feb 24, 2020 at 10:48 PM John Smith wrote: > Hi. Just upgraded to 1.10.0 and getting the below error when I deploy my > tasks.

Re: state schema evolution for case classes

2020-02-25 Thread Khachatryan Roman
Hi ApoorvK, I understand that you have a savepoint created by Flink 1.6.2 and you want to use it with Flink 1.8.2. The classes themselves weren't modified. Is that correct? Which serializer did you use? Regards, Roman On Tue, Feb 25, 2020 at 8:38 AM ApoorvK wrote: > Hi Team, > > Earlier we ha

Re: How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Khachatryan Roman
Hi Morgan, Thanks for your reply. I think the only possible way to determine this limit is load testing. In the end, this is all load testing is about. I can only suggest testing parts of the system separately to know their individual limits (e.g. IO, CPU). Ideally, this should be done on a regul

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Leonard Xu
Hi Outlook, Godfrey is right, you should follow the JSON format[1] when you parse your JSON message. You can use the following code to produce a JSON date-time String. ``` Long time = System.currentTimeMillis(); DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"); Date date =
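Leonard's snippet is cut off by the digest; a completed, runnable version might look like the following. The class name, the UTC time-zone call, and the println are additions for illustration (the literal `'Z'` in the pattern only prints correctly if the formatter is actually in UTC).

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class Rfc3339Producer {
    public static void main(String[] args) {
        Long time = System.currentTimeMillis();
        // RFC 3339 / ISO 8601 shape expected by the JSON format
        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        // 'Z' is a literal here, so the formatter must really be in UTC
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        String jsonTimestamp = dateFormat.format(new Date(time));
        System.out.println(jsonTimestamp); // e.g. 2020-02-25T09:41:00.123Z
    }
}
```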

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-25 Thread Avinash Tripathy
Hi Theo, We also have the same scenario. It would be great if you could provide some examples or more details about the Flink process function. Thanks, Avinash On Tue, Feb 25, 2020 at 12:29 PM theo.diefent...@scoop-software.de < theo.diefent...@scoop-software.de> wrote: > Hi, > > At last flink f

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread godfrey he
Hi, I find that JsonRowDeserializationSchema only supports date-time with a timezone according to RFC 3339. So you need to add a timezone to the time data (like 14:02:00Z) and the timestamp data (2019-07-09T02:02:00.040Z). Hope it can help you. Bests, godfrey Outlook wrote on Tue, Feb 25, 2020 at 5:49 PM: > By the way, my fl
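Godfrey's advice can be checked with the JDK's own strict ISO/RFC 3339 parser, independent of Flink: with the `T` separator and `Z` offset the value parses, while the space-separated form without a zone is rejected. This is a standalone illustration, not code from the thread.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

public class Rfc3339ParseCheck {
    public static void main(String[] args) {
        // With 'T' and 'Z', the value is valid RFC 3339 and parses fine
        Instant ok = Instant.parse("2019-07-09T02:02:00.040Z");
        System.out.println(ok); // 2019-07-09T02:02:00.040Z

        // Without the 'T' separator and timezone, a strict parser rejects it
        try {
            Instant.parse("2019-07-09 02:02:00.040");
            System.out.println("unexpectedly parsed");
        } catch (DateTimeParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```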

Re: How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Morgan Geldenhuys
Hi Roman, Thank you for the reply. Yes, I am aware that backpressure can be the result of many factors, and yes, this is an oversimplification of something very complex, so please bear with me. Let's assume that this has been taken into account and has lowered the threshold for when this status per

Re: MaxMetaspace default may be too low?

2020-02-25 Thread John Smith
Ok, maybe it can be documented? So, just trying to understand, how do most people run their jobs? I mean, do they run fewer tasks, but tasks that have a lot of direct or mapped memory? Like a small JVM heap but huge state outside the JVM? I also recorded this issue: https://issues.apache.org/jira/brows
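For reference, Flink 1.10 (the version John upgraded to) introduced an explicit metaspace limit that can be raised in `flink-conf.yaml`. The value below is only an illustrative choice, not a recommendation from the thread:

```yaml
# flink-conf.yaml (Flink 1.10) -- raise the JVM metaspace limit on
# TaskManagers if repeated job (re)submissions load many classes
taskmanager.memory.jvm-metaspace.size: 256m
```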

Re: yarn session: one JVM per task

2020-02-25 Thread David Morin
Perfect. No problem. My Bad. Not really clear. Thanks ! Le mar. 25 févr. 2020 à 13:45, Xintong Song a écrit : > Ah, I misunderstood and thought that you want to keep all your Sink > instances on the same TM. > > If what you want is to have one instance per TM, then as Gary mentioned > specifying

Re: yarn session: one JVM per task

2020-02-25 Thread Xintong Song
Ah, I misunderstood and thought that you want to keep all your Sink instances on the same TM. If what you want is to have one instance per TM, then as Gary mentioned, specifying "-s 1" when starting the session would be enough, and it should work with all existing versions above (including) 1.8. Tha

Re: Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-25 Thread Xintong Song
> > Do you believe the code of the operators of the restarted Region can be > changed between restarts? I'm not an expert on the restart strategies, but AFAIK the answer is probably not. Sorry I overlooked that you need to modify the job. Thank you~ Xintong Song On Tue, Feb 25, 2020 at 6:00

Re: How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Khachatryan Roman
Hi Morgan, Regarding backpressure, it can be caused by a number of factors, e.g. writing to an external system or slow input partitions. However, if you know that a particular resource is a bottleneck then it makes sense to monitor its saturation. It can be done by using Flink metrics. Please see

Re: yarn session: one JVM per task

2020-02-25 Thread David Morin
Hi Gary, Sorry I was probably not very clear. Yes that's exactly what I want to hear :) I use the -s 1 parameter and what I expect to have is one task of my Sink (one instance in fact) per TM (i.e. per JVM) That's the current behaviour during my tests but I want to be sure. Thanks a lot David Le

How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Morgan Geldenhuys
Hello community, I am fairly new to Flink and have a question concerning utilization. I was hoping someone could help. Knowing that backpressure is essentially the point at which utilization has reached 100% for any particular streaming pipeline and means that the application cannot "keep up

Re: yarn session: one JVM per task

2020-02-25 Thread Gary Yao
Hi David, > Before with both n and -s it was not the case. What do you mean by before? At least in 1.8 "-s" could be used to specify the number of slots per TM. > how can I be sure that my Sink that uses this lib is in one JVM ? Is it enough that no other parallel instance of your sink run

Re: Java implementations of Streaming applications for Flink

2020-02-25 Thread Robert Metzger
Hey Piper, Here's an example for a more advanced Flink application: https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html The Flink developers internally maintain a number of testing jobs. They are rather artificial, but you still might find some useful things here and there: https://

Re: Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-25 Thread Benoît Paris
Hi Xintong Thank you for your answer. This seems promising, I'll look into it. Do you believe the code of the operators of the restarted Region can be changed between restarts? Best Benoît On Tue, Feb 25, 2020 at 2:30 AM Xintong Song wrote: > Hi Ben, > > You can not share slots across jobs.

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Outlook
By the way, my flink version is 1.10.0. Original Message Sender: Outlook Recipient: user Date: Tuesday, Feb 25, 2020 17:43 Subject: TIME/TIMESTAMP parse in Flink TABLE/SQL API Hi all, I read json data from kafka, and print to console. When I do this, some error occurs when time/timestamp d

TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-25 Thread Outlook
Hi all, I read JSON data from Kafka and print it to the console. When I do this, some errors occur during time/timestamp deserialization. JSON data in Kafka: ``` { "server_date": "2019-07-09", "server_time": "14:02:00", "reqsndtime_c": "2019-07-09 02:02:00.040" } ``` flink code: ``` bsTableEnv.c
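As the replies further up the thread suggest, one practical workaround is to rewrite values like `reqsndtime_c` into RFC 3339 form before they reach the JSON deserializer. A hypothetical pre-processing sketch with the JDK's `java.time` API (names and the UTC assumption are mine, not from the thread):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ToRfc3339 {
    public static void main(String[] args) {
        // The space-separated, zone-less value as produced into Kafka
        String raw = "2019-07-09 02:02:00.040";
        DateTimeFormatter in = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

        // Assume the producer's clock is UTC; re-emit in RFC 3339 shape
        String rfc3339 = LocalDateTime.parse(raw, in)
                .atOffset(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"));

        System.out.println(rfc3339); // 2019-07-09T02:02:00.040Z
    }
}
```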

Re: yarn session: one JVM per task

2020-02-25 Thread David Morin
Hi Xintong, At the moment I'm using 1.9.2 with this command: yarn-session.sh -d *-s 1* -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline" So, after a lot of tests, I've noticed that if I increase the parallelism of my custom Sink, each task is embedded into one TS and, most importantly, each on