Re: Configure vvp 2.3 with file blob storage

2020-11-03 Thread Fabian Hueske
Hi Laurent, Thanks for trying out Ververica platform! However, please note that this is the mailing list of the Apache Flink project. Please post further questions using the "Community Edition Feedback" button on this page: https://ververica.zendesk.com/hc/en-us We are working on setting up a

Re: coordination of sinks

2020-08-17 Thread Fabian Hueske
Hi Marco, You cannot really synchronize data that is being emitted via different streams (without bringing them together in an operator). I see two options: 1) emit the event to create the partition and the data to be written into the partition to the same stream. Flink guarantees that records

Re: [SQL DDL] How to extract timestamps from Kafka message's metadata

2020-08-11 Thread Fabian Hueske
Hi Dongwon, Maybe you can add your use case to the FLIP-107 discussion thread [1] and thereby support the proposal (after checking that it would solve your problem). It's always helpful to learn about the requirements of users when designing new features. It also helps to prioritize which

Re: Simple MDC logs don't show up

2020-07-21 Thread Fabian Hueske
Hi, When running your code in the IDE, everything runs in the same local JVM. When you run the job on Kubernetes, the situation is very different. Your code runs in multiple JVM processes distributed in a cluster. Flink provides a metrics collection system that you should use to collect metrics

Re: Flink rest api cancel job

2020-07-21 Thread Fabian Hueske
Hi White, Can you describe your problem in more detail? * What is your Flink version? * How do you deploy the job (application / session cluster), (Kubernetes, Docker, YARN, ...) * What kind of job are you running (DataStream, Table/SQL, DataSet)? Best, Fabian Am Mo., 20. Juli 2020 um 08:42

Re: Pravega connector cannot recover from the checkpoint due to "Failure to finalize checkpoint"

2020-07-21 Thread Fabian Hueske
Hi Brian, AFAIK, Arvid and Piotr (both in CC) have been working on the threading model of the checkpoint coordinator. Maybe they can help with this question. Best, Fabian Am Mo., 20. Juli 2020 um 03:36 Uhr schrieb : > Anyone can help us on this issue? > > > > Best Regards, > > Brian > > > >

Re: Custom metrics output

2020-07-21 Thread Fabian Hueske
Hi Joris, I don't think that the approach of "add methods in operator class code that can be called from the main Flink program" will work. The most efficient approach would be implementing a ProcessFunction that counts in 1-min time buckets (using event-time semantics) and updates the metrics.

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-08 Thread Fabian Hueske
> On 7/7/2020 17:23, Fabian Hueske wrote: > Hi Jie Feng, > As you said, Flink translates SQL queries into streaming programs with > auto-generated operator IDs.

Re: How to ensure that job is restored from savepoint when using Flink SQL

2020-07-07 Thread Fabian Hueske
Hi Jie Feng, As you said, Flink translates SQL queries into streaming programs with auto-generated operator IDs. In order to start a SQL query from a savepoint, the operator IDs in the savepoint must match the IDs in the newly translated program. Right now this can only be guaranteed if you

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-17 Thread Fabian Hueske
Congrats Yu! Cheers, Fabian Am Mi., 17. Juni 2020 um 10:20 Uhr schrieb Till Rohrmann < trohrm...@apache.org>: > Congratulations Yu! > > Cheers, > Till > > On Wed, Jun 17, 2020 at 7:53 AM Jingsong Li > wrote: > > > Congratulations Yu, well deserved! > > > > Best, > > Jingsong > > > > On Wed,

Re: Flink 1.8.3 Kubernetes POD OOM

2020-05-22 Thread Fabian Hueske
Hi Josson, I don't have much experience setting memory bounds in Kubernetes myself, but my colleague Andrey (in CC) reworked Flink's memory configuration for the last release to ease the configuration in container envs. He might be able to help. Best, Fabian Am Do., 21. Mai 2020 um 18:43 Uhr

Re: Adaptive Watermarks Generator

2020-05-22 Thread Fabian Hueske
Hi, The code of the implementation is linked in the paper: https://github.com/DataSystemsGroupUT/Adaptive-Watermarks Since this is a prototype for a research paper, I'm doubtful that the project is maintained. I also didn't find an open-source license attached to the code. Hence adding the

Re: Broadcast stream causing GC overhead limit exceeded

2020-05-07 Thread Fabian Hueske
m broadcast stream. > > Thanks! > Eleanore > > On Tue, May 5, 2020 at 12:31 AM Fabian Hueske wrote: > >> Hi Eleanore, >> >> The "GC overhead limit exceeded" error shows that the JVM spends way too >> much time garbage collecting and only recover

Re: multiple joins in one job

2020-05-06 Thread Fabian Hueske
gt;> >>>> Benchao Li 于 2020年5月5日周二 17:26写道: >>>> >>>>> Hi lec, >>>>> >>>>> You don't need to specify time attribute again like `TUMBLE_ROWTIME`, >>>>> you just select the time attribute field >>>>>

Re: table.show() in Flink

2020-05-05 Thread Fabian Hueske
There's also the Table API approach if you want to avoid typing a "full" SQL query: Table t = tEnv.from("myTable"); Cheers, Fabian Am Di., 5. Mai 2020 um 16:34 Uhr schrieb Őrhidi Mátyás < matyas.orh...@gmail.com>: > Thanks guys for the prompt answers! > > On Tue, May 5, 2020 at 2:49 PM Kurt

Re: multiple joins in one job

2020-05-05 Thread Fabian Hueske
ent to make it . Can it be possible? > > Fabian Hueske 于2020年5月4日周一 下午4:04写道: > >> Hi, >> >> If the interval join emits the time attributes of both its inputs, you >> can use either of them as a time attribute in a following operator because >> the join ensures th

Re: Broadcast stream causing GC overhead limit exceeded

2020-05-05 Thread Fabian Hueske
Hi Eleanore, The "GC overhead limit exceeded" error shows that the JVM spends way too much time garbage collecting and only recovers little memory with every run. Since, the program doesn't make any progress in such a situation it is terminated with the GC Overhead Error. This typically happens

Re: multiple joins in one job

2020-05-04 Thread Fabian Hueske
Hi, If the interval join emits the time attributes of both its inputs, you can use either of them as a time attribute in a following operator because the join ensures that the watermark will be aligned with both of them. Best, Fabian Am Mo., 4. Mai 2020 um 00:48 Uhr schrieb lec ssmi : > Thanks

Re: FlinkKafakaProducer with Confluent SchemaRegistry and KafkaSerializationSchema

2020-04-21 Thread Fabian Hueske
> } > > Then used a new object of GenericSerializer in the FlinkKafkaProducer > > FlinkKafkaProducer producer = > new FlinkKafkaProducer<>(topic, new GenericSerializer(topic, schema, > schemaRegistryUrl), kafkaConfig, Semantic.AT_LEAST_ONCE); > > Thanks , Anil. > > &

Re: Problem getting watermark right with event time

2020-04-20 Thread Fabian Hueske
Hi Sudan, I noticed a few issues with your code: 1) Please check the computation of timestamps. Your code public long extractAscendingTimestamp(Eventi.Event element) { return element.getEventTime().getSeconds() * 1000; } only seems to look at the seconds of a timestamp. Typically, you
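
Editor's sketch of a corrected extractor for the first issue above, assuming Eventi.Event wraps a protobuf-style timestamp with getSeconds() (epoch seconds) and getNanos(); only the class name follows the quoted code, the rest is illustrative and not from the thread:

    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;

    public class EventTimeExtractor extends AscendingTimestampExtractor<Eventi.Event> {
        @Override
        public long extractAscendingTimestamp(Eventi.Event element) {
            // convert the full timestamp to epoch milliseconds instead of
            // truncating it to whole seconds
            long seconds = element.getEventTime().getSeconds();
            int nanos = element.getEventTime().getNanos();
            return seconds * 1000L + nanos / 1_000_000;
        }
    }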

Re: FlinkKafakaProducer with Confluent SchemaRegistry and KafkaSerializationSchema

2020-04-20 Thread Fabian Hueske
Hi Anil, Here's a pointer to Flink's end-2-end test that's checking the integration with schema registry [1]. It was recently updated so I hope it works the same way in Flink 1.9. Best, Fabian [1]

Re: how to hold a stream until another stream is drained?

2020-04-06 Thread Fabian Hueske
Hi, With Flink streaming operators [...] However, these parts are currently being reworked to enable a better integration of batch and streaming use cases (or hybrid use cases such as yours). A while back, we wrote a blog post about these plans [1]: > *"Unified Stream Operators:* Blink extends the

Re: Storing Operator state in RocksDb during runtime - plans

2020-04-06 Thread Fabian Hueske
Hi Kristoff, I'm not aware of any concrete plans for such a feature. Best, Fabian Am So., 5. Apr. 2020 um 22:33 Uhr schrieb KristoffSC < krzysiek.chmielew...@gmail.com>: > Hi, > according to [1] operator state and broadcast state (which is a "special" > type of operator state) are not stored

Re: Flink job getting killed

2020-04-06 Thread Fabian Hueske
Hi Giriraj, This looks like the deserialization of a String failed. Can you isolate the problem to a pair of sending and receiving tasks? Best, Fabian Am So., 5. Apr. 2020 um 20:18 Uhr schrieb Giriraj Chauhan < graj.chau...@gmail.com>: > Hi, > > We are submitting a flink(1.9.1) job for data

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Fabian Hueske
Congrats Jingsong! Cheers, Fabian Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong : > Congratulations Jingsong!! > > Cheers, > Rong > > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li wrote: > > > Congrats, Jingsong! > > > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann > > wrote: > > > >>

Re: [ANNOUNCE] Flink Forward San Francisco 2020 Program is Live

2020-02-14 Thread Fabian Hueske
Fr., 14. Feb. 2020 um 17:48 Uhr schrieb Fabian Hueske : > Hi everyone, > > We announced the program of Flink Forward San Francisco 2020. > The conference takes place at the Hyatt Regency in San Francisco from > March 23rd to 25th. > > On the first day we offer four

[ANNOUNCE] Flink Forward San Francisco 2020 Program is Live

2020-02-14 Thread Fabian Hueske
Hi everyone, We announced the program of Flink Forward San Francisco 2020. The conference takes place at the Hyatt Regency in San Francisco from March 23rd to 25th. On the first day we offer four training sessions [1]: * Apache Flink Developer Training * Apache Flink Runtime & Operations

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Fabian Hueske
Congrats team and a big thank you to the release managers! Am Mi., 12. Feb. 2020 um 16:33 Uhr schrieb Timo Walther : > Congratualations everyone! Great stuff :-) > > Regards, > Timo > > > On 12.02.20 16:05, Leonard Xu wrote: > > Great news! > > Thanks everyone involved ! > > Thanks Gary and Yu

Re: Flink solution for having shared variable between task managers

2020-02-03 Thread Fabian Hueske
Hi, I think you are looking for BroadcastState [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html Am Fr., 17. Jan. 2020 um 14:50 Uhr schrieb Soheil Pourbafrani < soheil.i...@gmail.com>: > Hi, > > According to the processing
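
A minimal sketch of the broadcast state pattern referenced in [1]; the stream and type names are illustrative assumptions, not from the thread:

    MapStateDescriptor<String, String> descriptor =
        new MapStateDescriptor<>("shared-variable", Types.STRING, Types.STRING);

    BroadcastStream<String> broadcast = configUpdates.broadcast(descriptor);

    events.connect(broadcast)
        .process(new BroadcastProcessFunction<Event, String, Event>() {
            @Override
            public void processElement(Event value, ReadOnlyContext ctx, Collector<Event> out) throws Exception {
                // every parallel task sees the same broadcast value
                String shared = ctx.getBroadcastState(descriptor).get("current");
                out.collect(value);
            }

            @Override
            public void processBroadcastElement(String value, Context ctx, Collector<Event> out) throws Exception {
                // updates are replicated to all task managers
                ctx.getBroadcastState(descriptor).put("current", value);
            }
        });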

[ANNOUNCE] Community Discounts for Flink Forward SF 2020 Registrations

2020-01-30 Thread Fabian Hueske
Hi everyone, The registration for Flink Forward SF 2020 is open now! Flink Forward San Francisco 2020 will take place from March 23rd to 25th. The conference will start with one day of training and continue with two days of keynotes and talks. We would like to invite you to join the Apache Flink

Re: PostgreSQL JDBC connection drops after inserting some records

2020-01-28 Thread Fabian Hueske
Hi, The exception is thrown by Postgres. I'd start investigating there what the problem is. Maybe you need to tweak your Postgres configuration, but it might also be that the Flink connector needs to be differently configured. If the necessary config option is missing, it would be good to add.

Re: How to declare the Row object schema

2020-01-17 Thread Fabian Hueske
Hi, Which version are you using? I can't find the error message in the current code base. When writing data to a JDBC database, all Flink types must be correctly matched to a JDBC type. The problem is probably that Flink cannot match the 8th field of your Row to a JDBC type. What's the type of

Re: Filter with large key set

2020-01-17 Thread Fabian Hueske
Hi Eleanore, A dynamic filter like the one you need is essentially a join operation. There are two ways to do this: * partitioning the key set and the message on the attribute. This would be done with a KeyedCoProcessFunction. * broadcasting the key set and just locally forwarding the messages.
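
A rough sketch of the first option (key-partitioned filtering with a KeyedCoProcessFunction); the Message/KeyUpdate types and their accessors are illustrative assumptions:

    messages
        .connect(keySetUpdates)
        .keyBy(msg -> msg.getAttribute(), update -> update.getKey())
        .process(new KeyedCoProcessFunction<String, Message, KeyUpdate, Message>() {
            private transient ValueState<Boolean> inKeySet;

            @Override
            public void open(Configuration parameters) {
                inKeySet = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("in-key-set", Boolean.class));
            }

            @Override
            public void processElement1(Message msg, Context ctx, Collector<Message> out) throws Exception {
                // forward the message only if its attribute is currently in the key set
                if (Boolean.TRUE.equals(inKeySet.value())) {
                    out.collect(msg);
                }
            }

            @Override
            public void processElement2(KeyUpdate update, Context ctx, Collector<Message> out) throws Exception {
                // maintain the partitioned key set as keyed state
                inKeySet.update(update.isAddition());
            }
        });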

Re: Why would indefinitely growing state an issue for Flink while doing stream to stream joins?

2020-01-17 Thread Fabian Hueske
Hi, Large state is mainly an issue for Flink's fault tolerance mechanism which is based on periodic checkpoints, which means that the state is copied to a remote storage system in regular intervals. In case of a failure, the state copy needs to be loaded which takes more time with growing state

[ANNOUNCE] Flink Forward San Francisco 2020 Call for Presentation extended!

2020-01-13 Thread Fabian Hueske
Hi everyone, We know some of you only came back from holidays last week. To give you more time to submit a talk, we decided to extend the Call for Presentations for Flink Forward San Francisco 2020 until Sunday January 19th. The conference takes place on March 23-25 with two days of talks and

[ANNOUNCE] Flink Forward SF Call for Presentation closing soon!

2020-01-06 Thread Fabian Hueske
Hi all, First of all, Happy New Year to everyone! Many of you probably didn't spend the holidays thinking a lot about Flink. Now, however, is the right time to focus again and decide which talk(s) to submit for Flink Forward San Francisco because the Call for Presentations is closing this

Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-13 Thread Fabian Hueske
Congrats Zhu Zhu and welcome on board! Best, Fabian Am Fr., 13. Dez. 2019 um 17:51 Uhr schrieb Till Rohrmann < trohrm...@apache.org>: > Hi everyone, > > I'm very happy to announce that Zhu Zhu accepted the offer of the Flink PMC > to become a committer of the Flink project. > > Zhu Zhu has been

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
ted, > https://issues.apache.org/jira/browse/FLINK-15112 > > Many thanks, > Chris > > > -- Original Message -- > From: "Fabian Hueske" > To: "Chris Miller" > Cc: "user@flink.apache.org" > Sent: 06/12/2019 14:52:16 > Subject: Re

Re: Joining multiple temporal tables

2019-12-06 Thread Fabian Hueske
Hi Chris, Your query looks OK to me. Moreover, you should get a SQLParseException (or something similar) if it wouldn't be valid SQL. Hence, I assume you are running in a bug in one of the optimizer rules. I tried to reproduce the problem on the SQL training environment and couldn't write a

Re: Row arity of from does not match serializers.

2019-12-06 Thread Fabian Hueske
Hi, The inline lambda MapFunction produces a Row with 12 String fields (12 calls to String.join()). You use RowTypeInfo rowTypeDNS to declare the return type of the lambda MapFunction. However, rowTypeDNS is defined with many more String fields. The exception tells you that the number of fields
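
An illustrative sketch of keeping the declared type and the produced Row in sync for the 12-field case described above (the variable name rowTypeDNS follows the quoted message, everything else is invented):

    // declare exactly 12 String fields, matching the arity of the produced Rows
    TypeInformation<?>[] fieldTypes = new TypeInformation<?>[12];
    Arrays.fill(fieldTypes, Types.STRING);
    RowTypeInfo rowTypeDNS = new RowTypeInfo(fieldTypes);

    DataStream<Row> rows = input.map(value -> {
        Row row = new Row(12);              // arity must match rowTypeDNS
        for (int i = 0; i < 12; i++) {
            row.setField(i, String.join("-", /* parts for field i */ ""));
        }
        return row;
    }).returns(rowTypeDNS);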

Flink Forward North America 2020 - Call for Presentations open until January 12th, 2020

2019-11-20 Thread Fabian Hueske
Hi all, Flink Forward North America returns to San Francisco on March 23-25, 2020. For the first time in North America, the conference will feature two days of talks and one day of training. We are happy to announce that the Call for Presentations is open! If you'd like to give a talk and share

Re: Issue with writeAsText() to S3 bucket

2019-11-06 Thread Fabian Hueske
ith various filters applied to it. I usually see > around 6-7 of my datastreams successfully list the JSON file in my S3 > bucket upon cancelling my Flink job. > > > > Even in my situation, would this still be an issue with S3’s file listing > command? > > > > Thanks,

Re: Are Dynamic tables backed by rocksdb?

2019-10-31 Thread Fabian Hueske
Hi, Dynamic tables might not be persisted at all but only when it is necessary for the computation of a query. For example a simple "SELECT * FROM t WHERE a = 1" query on an append only table t does not require to persist t. However, there are a bunch of operations that require to store some

Re: Guarantee of event-time order in FlinkKafkaConsumer

2019-10-25 Thread Fabian Hueske
Hi Wojciech, I posted an answer on StackOverflow. Best, Fabian Am Do., 24. Okt. 2019 um 13:03 Uhr schrieb Wojciech Indyk < wojciechin...@gmail.com>: > Hi! > I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee of correct order > of events by event timestamp. I generate periodic watermarks

Re: Flink 1.5+ performance in a Java standalone environment

2019-10-25 Thread Fabian Hueske
Hi Jakub, I had a look at the changes of Flink 1.5 [1] and didn't find anything obvious. Something that might cause a different behavior is the new deployment and process model (FLIP-6). In Flink 1.5, there is a switch to disable it and use the previous deployment mechanism. You could try to

Re: Using STSAssumeRoleSessionCredentialsProvider for cross account access

2019-10-25 Thread Fabian Hueske
Hi Vinay, Maybe Gordon (in CC) has an idea about this issue. Best, Fabian Am Do., 24. Okt. 2019 um 14:50 Uhr schrieb Vinay Patil < vinay18.pa...@gmail.com>: > Hi, > > Can someone pls help here , facing issues in Prod . I see the following > ticket in unresolved state. > >

Re: Can a Flink query outputs nested json?

2019-10-25 Thread Fabian Hueske
Hi, I did not understand what you are trying to achieve. Which field of the input table do you want to write to the output table? Flink SQL> insert into nestedSink select nested from nestedJsonStream; [INFO] Submitting SQL update statement to the cluster... [ERROR] Could not execute SQL

Re: JDBCInputFormat does not support json type

2019-10-25 Thread Fabian Hueske
Hi Fanbin, One approach would be to ingest the field as a VARCHAR / String and implement a Scalar UDF to convert it into a nested tuple. The UDF could use the code of the flink-json module. AFAIK, there is some work on the way to add built-in JSON functions. Best, Fabian Am Do., 24. Okt. 2019
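
A sketch of the suggested VARCHAR-plus-UDF approach using the pre-1.10 type system; the JSON field names and the target row structure are made-up examples, not from the thread:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.table.functions.ScalarFunction;
    import org.apache.flink.types.Row;

    public class ParseJson extends ScalarFunction {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        public Row eval(String json) {
            try {
                JsonNode node = MAPPER.readTree(json);
                return Row.of(node.get("id").asText(), node.get("score").asDouble());
            } catch (Exception e) {
                return null;   // or route malformed records elsewhere
            }
        }

        @Override
        public TypeInformation<?> getResultType(Class<?>[] signature) {
            return Types.ROW(Types.STRING, Types.DOUBLE);
        }
    }

    // tableEnv.registerFunction("parseJson", new ParseJson());
    // SELECT parseJson(json_column) FROM jdbcSource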

Re: Flink 1.9 measuring time taken by each operator in DataStream API

2019-10-25 Thread Fabian Hueske
Hi Komal, Measuring latency is always a challenge. The problem here is that your functions are chained, meaning that the result of a function is directly passed on to the next function and only when the last function emits the result, the first function is called with a new record. This makes

Re: Issue with writeAsText() to S3 bucket

2019-10-25 Thread Fabian Hueske
Hi Michael, One reason might be that S3's file listing command is only eventually consistent. It might take some time until the file appears and is listed. Best, Fabian Am Mi., 23. Okt. 2019 um 22:41 Uhr schrieb Nguyen, Michael < michael.nguye...@t-mobile.com>: > Hello all, > > > > I am

Re: [Problem] Unable to do join on TumblingEventTimeWindows using SQL

2019-10-25 Thread Fabian Hueske
Hi, the exception says: "Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.". The problem is that your query first joins the two tables without a temporal condition and then wants to do a
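
A small illustration of the workaround quoted from the exception — casting the time attributes to TIMESTAMP before the regular join; table and column names are invented for the example:

    Table left = tableEnv.sqlQuery(
        "SELECT id, CAST(rowtime AS TIMESTAMP) AS ts, payload FROM tableA");
    Table right = tableEnv.sqlQuery(
        "SELECT id, CAST(rowtime AS TIMESTAMP) AS ts, info FROM tableB");
    tableEnv.registerTable("a", left);
    tableEnv.registerTable("b", right);

    // the inputs no longer carry rowtime attributes, so the regular join is allowed
    Table joined = tableEnv.sqlQuery(
        "SELECT a.id, a.payload, b.info FROM a JOIN b ON a.id = b.id");

Note that after the cast the result can no longer be used in time-based operations such as windowed aggregations.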

Re: Increasing number of task slots in the task manager

2019-10-02 Thread Fabian Hueske
Hi Vishwas, First of all, 8 GB for 60 cores is not a lot. You might not be able to utilize all cores when running Flink. However, the memory usage depends on several things. Assuming you are using Flink for stream processing, the type of the state backend is important. If you use the

Re: Fencing token exceptions from Job Manager High Availability mode

2019-10-02 Thread Fabian Hueske
Hi Bruce, I haven't seen such an exception yet, but maybe Till (in CC) can help. Best, Fabian Am Di., 1. Okt. 2019 um 05:51 Uhr schrieb Hanson, Bruce < bruce.han...@here.com>: > Hi all, > > > > We are running some of our Flink jobs with Job Manager High Availability. > Occasionally we get a

Re: Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-10-02 Thread Fabian Hueske
Hi Oliwer, I think you are right. There seems to be something going wrong. Just to clarify, you are sure that the growing state size is caused by the window operator? From your description I assume that the state size does not depend (solely) on the number of distinct keys. Otherwise, the state

Re: Broadcast state

2019-10-02 Thread Fabian Hueske
Hi, State is always associated with a single task in Flink. The state of a task cannot be accessed by other tasks of the same operator or tasks of other operators. This is true for every type of state, including broadcast state. Best, Fabian Am Di., 1. Okt. 2019 um 08:22 Uhr schrieb Navneeth

Re: Flink- Heap Space running out

2019-09-26 Thread Fabian Hueske
Hi, I don't think that the memory configuration is the issue. The problem is the join query. The join does not have any temporal boundaries. Therefore, both tables are completely stored in memory and never released. You can configure a memory eviction strategy via idle state retention [1] but you
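
A sketch of the idle state retention setting mentioned in [1], using the TableConfig API of roughly Flink 1.8/1.9 (the retention values are arbitrary examples; newer versions use the table.exec.state.ttl option instead):

    StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
    // state of keys that have been idle for between 12 and 24 hours is dropped
    tableEnv.getConfig().setIdleStateRetentionTime(
        Time.hours(12),    // org.apache.flink.api.common.time.Time
        Time.hours(24));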

Re: Flink job manager doesn't remove stale checkmarks

2019-09-25 Thread Fabian Hueske
Hi, You enabled incremental checkpoints. This means that parts of older checkpoints that did not change since the last checkpoint are not removed because they are still referenced by the incremental checkpoints. Flink will automatically remove them once they are not needed anymore. Are you sure

Re: Joins Usage Clarification

2019-09-25 Thread Fabian Hueske
Hi Nishant, To answer your questions: 1) yes, the SQL time-windowed join and the DataStream API Interval Join are the same (with different implementations though) 2) DataStream Session-window joins are not directly supported in SQL. You can play some tricks to make it work, but it wouldn't be
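
For point 1, a minimal DataStream interval join sketch (the Order/Shipment types, key selectors, and bounds are illustrative assumptions):

    orders.keyBy(o -> o.getOrderId())
        .intervalJoin(shipments.keyBy(s -> s.getOrderId()))
        .between(Time.hours(0), Time.hours(4))   // right element may arrive 0-4h after the left
        .process(new ProcessJoinFunction<Order, Shipment, String>() {
            @Override
            public void processElement(Order order, Shipment shipment, Context ctx, Collector<String> out) {
                out.collect(order.getOrderId() + " shipped at " + shipment.getTimestamp());
            }
        });

The interval join works on event time and requires both inputs to be keyed on the same key.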

Re: Question about reading ORC file in Flink

2019-09-25 Thread Fabian Hueske
the column, so it can read the fields. > > Thanks for your Help! > > Qi Shu > > > 在 2019年9月24日,下午4:36,Fabian Hueske 写道: > > Hi QiShu, > > It might be that Flink's OrcInputFormat has a bug. > Can you open a Jira issue to report the problem? > In order to

Re: How do I create a temporal function using Flink Clinet SQL?

2019-09-24 Thread Fabian Hueske
Hi, It's not possible to create a temporal table function from SQL, but you can define it in the config.yaml of the SQL client as described in the documentation [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#temporal-tables Am Di.,

Re: Approach to match join streams to create unique streams.

2019-09-24 Thread Fabian Hueske
Hi, AFAIK, Flink SQL Temporal table function joins are only supported as inner equality joins. An extension to left outer joins would be great, but is not on the immediate roadmap AFAIK. If you need the inverse, I'd recommend to implement the logic in a DataStream program with a

Re: Question about reading ORC file in Flink

2019-09-24 Thread Fabian Hueske
Hi QiShu, It might be that Flink's OrcInputFormat has a bug. Can you open a Jira issue to report the problem? In order to be able to fix this, we need as much information as possible. It would be great if you could create a minimal example of an ORC file and a program that reproduces the issue.

Re: How to use thin JAR instead of fat JAR when submitting Flink job?

2019-09-24 Thread Fabian Hueske
Hi, To expand on Dian's answer. You should not add Flink's core libraries (APIs, core, runtime, etc.) to your fat JAR. However, connector dependencies (like Kafka, Cassandra, etc.) should be added. If all your jobs require the same dependencies, you can also add JAR files to the ./lib folder of

Re: Time Window Flink SQL join

2019-09-20 Thread Fabian Hueske
rty(ProducerConfig.RETRIES_CONFIG, "3"); > > ObjectMapper mapper = new ObjectMapper(); > DataStream sinkStreamMaliciousData = outStreamMalicious > .map(new MapFunction,String>() { > private static final long serialVersionUID = -6347120202L; > @Override > public St

Re: Best way to compute the difference between 2 datasets

2019-09-20 Thread Fabian Hueske
Btw. there is a set difference or minus operator in the Table API [1] that might be helpful. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/tableApi.html#set-operations Am Fr., 20. Sept. 2019 um 15:30 Uhr schrieb Fabian Hueske : > Hi Juan, > > Both,

Re: changing flink/kafka configs for stateful flink streaming applications

2019-09-20 Thread Fabian Hueske
Hi, It depends. There are many things that can be changed. A savepoint in Flink contains only the state of the application and not the configuration of the system. So an application can be migrated to another cluster that runs with a different configuration. There are some exceptions like the

Re: Window metadata removal

2019-09-20 Thread Fabian Hueske
tes. > If we have 200,000,000 per day and the allowed lateness is > set to 7 days: > 200,000,000 * 64 * 7 = ~83GB > > *For the scenario above the window metadata is useless*. > Is there a possibility to *keep using window API*, *set allowed lateness* > and *not keep the windo

Re: Add Bucket File System Table Sink

2019-09-20 Thread Fabian Hueske
Hi Jun, Thank you very much for your contribution. I think a Bucketing File System Table Sink would be a great addition. Our code contribution guidelines [1] recommend to discuss the design with the community before opening a PR. First of all, this ensures that the design is aligned with

Re: Best way to compute the difference between 2 datasets

2019-09-20 Thread Fabian Hueske
Hi Juan, Both the local execution environment and the remote execution environment run the same code to execute the program. The implementation of the sortPartition operator was designed to scale to data sizes that exceed the memory. Internally, it serializes all records into byte arrays and

Re: Batch mode with Flink 1.8 unstable?

2019-09-19 Thread Fabian Hueske
Hi Ken, Changing the parallelism can affect the generation of input splits. I had a look at BinaryInputFormat, and it adds a bunch of empty input splits if the number of generated splits is less than the minimum number of splits (which is equal to the parallelism). See -->

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
But with that 60 gb memory getting run out > > So i used below query. > Can u please guide me in this regard > > On Wed, 18 Sep 2019 at 5:53 PM, Fabian Hueske wrote: > >> Hi, >> >> The query that you wrote is not a time-windowed join. >> >> INSERT IN

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
Hi, The query that you wrote is not a time-windowed join. INSERT INTO sourceKafkaMalicious SELECT sourceKafka.* FROM sourceKafka INNER JOIN badips ON sourceKafka.`source.ip`=badips.ip WHERE sourceKafka.timestamp_received BETWEEN CURRENT_TIMESTAMP - INTERVAL '15' MINUTE AND CURRENT_TIMESTAMP;
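
For contrast, a sketch of what a time-windowed version of the quoted query could look like; it assumes badips also exposes a time attribute (called ts here), which is not part of the original query:

    tableEnv.sqlUpdate(
        "INSERT INTO sourceKafkaMalicious " +
        "SELECT sourceKafka.* " +
        "FROM sourceKafka INNER JOIN badips ON sourceKafka.`source.ip` = badips.ip " +
        // the predicate must relate the time attributes of BOTH inputs
        "WHERE sourceKafka.timestamp_received BETWEEN badips.ts - INTERVAL '15' MINUTE AND badips.ts");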

Re: Is Idle state retention time in SQL client possible?

2019-09-17 Thread Fabian Hueske
Hi, This can be set via the environment file. Please have a look at the documentation [1] (see the "execution: min-idle-state-retention" and "execution: max-idle-state-retention" keys). Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#environment-files

Re: Compound Keys Using Temporal Tables

2019-09-16 Thread Fabian Hueske
Hi, No, this is not possible at the moment. You can only pass a single expression as primary key. A workaround might be to put the two fields in a nested field (haven't tried if this works) or combine them in a single attribute, for example by casting them to VARCHAR and concatenating them. Best,
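
A sketch of the concatenation workaround with the pre-1.10 Table API; the table, column, and function names are invented for illustration:

    Table rates = tableEnv.sqlQuery(
        "SELECT CONCAT(CAST(currency AS VARCHAR), '#', CAST(region AS VARCHAR)) AS compoundKey, " +
        "rate, rowtime FROM RatesHistory");

    // the combined column acts as the single primary key of the temporal table function
    TemporalTableFunction ratesFn = rates.createTemporalTableFunction("rowtime", "compoundKey");
    tableEnv.registerFunction("Rates", ratesFn);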

Re: Uncertain result when using group by in stream sql

2019-09-13 Thread Fabian Hueske
Hi, A GROUP BY query on a streaming table requires that the result is continuously updated. Updates are propagated as a retraction stream (see tEnv.toRetractStream(table, Row.class).print(); in your code). A retraction stream encodes the type of the update as a boolean flag, the "true" and
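
A brief sketch of how that boolean flag shows up when consuming the retract stream (types as in the quoted code):

    DataStream<Tuple2<Boolean, Row>> retractStream = tEnv.toRetractStream(table, Row.class);
    // (true, row)  -> the row is added to the result
    // (false, row) -> a previously emitted row is retracted because the aggregate was updated
    retractStream.print();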

Re: How to handle avro BYTES type in flink

2019-09-13 Thread Fabian Hueske
Thanks for reporting back Catlyn! Am Do., 12. Sept. 2019 um 19:40 Uhr schrieb Catlyn Kong : > Turns out there was some other deserialization problem unrelated to this. > > On Mon, Sep 9, 2019 at 11:15 AM Catlyn Kong wrote: > >> Hi fellow streamers, >> >> I'm trying to support avro BYTES type in

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Fabian Hueske
Congrats Zili Chen :-) Cheers, Fabian Am Mi., 11. Sept. 2019 um 12:48 Uhr schrieb Biao Liu : > Congrats Zili! > > Thanks, > Biao /'bɪ.aʊ/ > > > > On Wed, 11 Sep 2019 at 18:43, Oytun Tez wrote: > >> Congratulations! >> >> --- >> Oytun Tez >> >> *M O T A W O R D* >> The World's Fastest Human

Re: Checkpointing is not performing well

2019-09-11 Thread Fabian Hueske
Hi, There is no upper limit for state size in Flink. There are applications with 10+ TB state. However, it is natural that checkpointing time increases with state size as more data needs to be serialized (in case of FSStateBackend) and written to stable storage. (The same is btw true for recovery

Re:

2019-09-11 Thread Fabian Hueske
Hi, This is clearly a Scala version issue. You need to make sure that all Flink dependencies have the same version and are compiled for Scala 2.11. The "_2.11" postfix in the dependency name indicates that it is a Scala 2.11 dependency ("_2.12" indicates Scala 2.12 compatibility). Best, Fabian

Re: Filter events based on future events

2019-09-11 Thread Fabian Hueske
Hi Theo, I would implement this with a KeyedProcessFunction. These are the important points to consider: 1) partition the output of the Kafka source by Kafka partition (or the attribute that determines the partition). This will ensure that the data stay in order (per partition). 2) The

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-09-10 Thread Fabian Hueske
Hi, that would be regular SQL cast syntax: SELECT a, b, c, CAST(eventTime AS TIMESTAMP) FROM ... Am Di., 10. Sept. 2019 um 18:07 Uhr schrieb Niels Basjes : > Hi. > > Can you give me an example of the actual syntax of such a cast? > > On Tue, 10 Sep 2019, 16:30 Fabian Hueske,

Re: Join with slow changing dimensions/ streams

2019-09-10 Thread Fabian Hueske
database systems have to deal with. Best, Fabian Am Do., 5. Sept. 2019 um 13:37 Uhr schrieb Hanan Yehudai < hanan.yehu...@radcom.com>: > Thanks Fabian. > > > is there any advantage using broadcast state VS using just CoMap function > on 2 connected streams ? > > > &g

Re: Flink SQL: How to tag a column as 'rowtime' from a Avro based DataStream?

2019-09-10 Thread Fabian Hueske
Hi Niels, I think (not 100% sure) you could also cast the event time attribute to TIMESTAMP before you emit the table. This should remove the event time property (and thereby the TimeIndicatorTypeInfo) and you wouldn't need to fiddle with the output types. Best, Fabian Am Mi., 21. Aug. 2019 um

Re: Implementing CheckpointableInputFormat

2019-09-06 Thread Fabian Hueske
Hi, CheckpointableInputFormat is only relevant if you plan to use the InputFormat in a MonitoringFileSource, i.e., in a streaming application. If you plan to use it in a DataSet (batch) program, InputFormat is fine. Btw. the latest release Flink 1.9.0 has major improvements for the recovery of

Re: TABLE API + DataStream outsourcing schema or Pojo?

2019-09-06 Thread Fabian Hueske
String key = iterator.next(); > row.setField(pos, jsonNode.get(key).asText()); > pos++; > } > return row; > } > }).returns(convert); > > Table tableA = tEnv.fromDataStream(dataStreamRow); > > > Le jeu. 5 sept. 2019 à 13:23, Fabian Hueske a écrit :

[ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Fabian Hueske
Hi everyone, I'm very happy to announce that Kostas Kloudas is joining the Flink PMC. Kostas has been contributing to Flink for many years and puts lots of effort in helping our users and growing the Flink community. Please join me in congratulating Kostas! Cheers, Fabian

Re: Exception when trying to change StreamingFileSink S3 bucket

2019-09-05 Thread Fabian Hueske
Hi, Kostas (in CC) might be able to help. Best, Fabian Am Mi., 4. Sept. 2019 um 22:59 Uhr schrieb sidhartha saurav < sidsau...@gmail.com>: > Hi, > > Can someone suggest a workaround so that we do not get this issue while > changing the S3 bucket ? > > On Thu, Aug 22, 2019 at 4:24 PM sidhartha

Re: TABLE API + DataStream outsourcing schema or Pojo?

2019-09-05 Thread Fabian Hueske
Hi Steve, Maybe you could implement a custom TableSource that queries the data from the rest API and converts the JSON directly into a Row data type. This would also avoid going through the DataStream API just for ingesting the data. Best, Fabian Am Mi., 4. Sept. 2019 um 15:57 Uhr schrieb Steve

Re: error in my job

2019-09-05 Thread Fabian Hueske
Hi, Are you getting this error repeatedly or was this a single time? If it's just a single time error, it's probably caused by a task manager process that died for some reason (as suggested by the error message). You should have a look at the TM logs whether you can find something that would

Re: understanding task manager logs

2019-09-05 Thread Fabian Hueske
Hi Vishwas, This is a log statement from Kafka [1]. Not sure when the AppInfoParser is created (the log message is written by the constructor). For Kafka versions > 1.0, I'd recommend the universal connector [2]. Not sure how well it works if producers and consumers have different versions.

Re: Window metadata removal

2019-09-05 Thread Fabian Hueske
Hi, A window needs to keep the data as long as it expects new data. This is clearly the case before the end time of the window was reached. If my window ends at 12:30, I want to wait (at least) until 12:30 before I remove any data, right? In case you expect some data to be late, you can
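
A small sketch of the allowed-lateness setting described above (types, window size, and lateness are illustrative); window state is cleaned up once the watermark passes the window end plus the allowed lateness:

    OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

    events.keyBy(e -> e.getKey())
        .window(TumblingEventTimeWindows.of(Time.minutes(30)))
        .allowedLateness(Time.hours(1))     // keep window contents 1h past the window end
        .sideOutputLateData(lateTag)        // records that are even later go to a side output
        .reduce((a, b) -> a.merge(b));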

Re: Join with slow changing dimensions/ streams

2019-09-05 Thread Fabian Hueske
Hi, Flink does not have good support for mixing bounded and unbounded streams in its DataStream API yet. If the dimension table is static (and small enough), I'd use a RichMapFunction and load the table in the open() method into the heap. In this case, you'd probably need to restart the job (can
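
A minimal sketch of the RichMapFunction approach for a small, static dimension table; the loader, key, and output types are invented for illustration:

    public class EnrichWithDimension extends RichMapFunction<Event, EnrichedEvent> {
        private transient Map<String, String> dimension;

        @Override
        public void open(Configuration parameters) throws Exception {
            // load the full dimension table into the heap of each task, e.g. from a file or JDBC
            dimension = DimensionLoader.loadAll();
        }

        @Override
        public EnrichedEvent map(Event event) {
            String value = dimension.getOrDefault(event.getDimensionKey(), "unknown");
            return new EnrichedEvent(event, value);
        }
    }

As noted above, the job has to be restarted to pick up a new version of the dimension table with this approach.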

[ANNOUNCE] Flink Forward training registration closes on September 30th

2019-09-05 Thread Fabian Hueske
Hi all, The registration for the Flink Forward Europe training sessions closes in four weeks. The training takes place in Berlin at October 7th and is followed by two days of talks by speakers from companies like Airbus, Goldman Sachs, Netflix, Pinterest, and Workday [1]. The following four

Re: tumbling event time window , parallel

2019-09-02 Thread Fabian Hueske
schrieb Hanan Yehudai < hanan.yehu...@radcom.com>: > Im not sure what you mean by use process function and not window process > function , as the window operator takes in a windowprocess function.. > > > > *From:* Fabian Hueske > *Sent:* Monday, August 26, 2019 1:

Re: End of Window Marker

2019-09-02 Thread Fabian Hueske
ut of ‘window order’. >> >> I was also thinking this problem is very similar to that of checkpoint >> barriers. I intended to dig into the details of the exactly once Kafka sink >> for some inspiration. >> >> Padarn >> >> On Tue, 27 Aug 2019 at 11:01 PM, Fa

Re: checkpoint failure in forever loop suddenly even state size less than 1 mb

2019-09-02 Thread Fabian Hueske
Hi Sushant, It's hard to tell what's going on. Maybe the thread pool of the async io operator is too small for the ingested data rate? This could cause the backpressure on the source and eventually also the failing checkpoints. Which Flink version are you using? Best, Fabian Am Do., 29. Aug.

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-27 Thread Fabian Hueske
D* > The World's Fastest Human Translation Platform. > oy...@motaword.com — www.motaword.com > > > On Tue, Aug 27, 2019 at 8:11 AM Fabian Hueske wrote: > >> Hi all, >> >> Flink 1.9 Docker images are available at Docker Hub [1] now. >> Due to some conf

Re: I'm not able to make a stream-stream Time windows JOIN in Flink SQL

2019-08-27 Thread Fabian Hueske
more when testing all of those > combinations. Now the second attempt works but isn't really what I wanted > to query (as the "same day"-predicate is still missing). > > Best regards > Theo > > -- > *Von: *"Fabian Hueske" > *

Re: Are there any news on custom trigger support for SQL/Table API?

2019-08-27 Thread Fabian Hueske
Hi Theo, The work on custom triggers has been put on hold due to some major refactorings (splitting the modules, porting Scala code to Java, new type system, new catalog interfaces, integration of the Blink planner). It's also not on the near-time roadmap AFAIK. To be honest, I'm not sure how
