Re: Reading from sockets using dataset api

2020-04-23 Thread Kaan Sancak
Thanks for the answer! Also thanks for raising some concerns about my question. Some of the graphs I have been using is larger than 1.5 tb, and I am currently an experiment stage of a project, and I am making modifications to my code and re-runing the experiments again. Currently, on some of the

Re: Handling stale data enrichment

2020-04-23 Thread Konstantin Knauf
Hi Vinay, I assume your subscription updates also have a timestamp and a watermark. Otherwise, there is no way for Flink to tell that the subscription updates are late. If you use a "temporal table "-style join to join the two streams, and you do not receive any subscription updates for 2 hours,

Re: Flink Forward 2020 Recorded Sessions

2020-04-23 Thread Marta Paes Moreira
Hi, Sivaprasanna. The talks will be up on Youtube sometime after the conference ends. Today, the starting schedule is different (9AM CEST / 12:30PM IST / 3PM CST) and more friendly to Europe, India and China. Hope you manage to join some sessions! Marta On Fri, 24 Apr 2020 at 06:58, Sivaprasann

Re: Task Assignment

2020-04-23 Thread Navneeth Krishnan
Hi Marta, Thanks for you response. What I'm looking for is something like data localization. If I have one TM which is processing a set of keys, I want to ensure all keys of the same type goes to the same TM rather than using hashing to find the downstream slot. I could use a common key to do this

Flink Forward 2020 Recorded Sessions

2020-04-23 Thread Sivaprasanna
Hello, I had registered for the Flink Forward 2020 and had attended couple of sessions but due to the odd timings and overlapping sessions on the same slot, I wasn't able to attend some interesting talks. I have received mails with link to rewatch some 2-3 webinars but not all (that had happened y

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-23 Thread Lu Niu
Hi, Robert BTW, I did some field study and I think it's possible to support streaming sink using presto s3 filesystem. I think that would help user to use presto s3 fs in all access to s3. I created this jira ticket https://issues.apache.org/jira/browse/FLINK-17364 . what do you think? Best Lu O

Re: Debug Slowness in Async Checkpointing

2020-04-23 Thread Lu Niu
Hi, Robert Thanks for relying. Yeah. After I added monitoring on the above path, it shows the slowness did come from uploading file to s3. Right now I am still investigating the issue. At the same time, I am trying PrestoS3FileSystem to check whether that can mitigate the problem. Best Lu On Thu

Re: IntelliJ java formatter

2020-04-23 Thread Xintong Song
Hi Flavio, I'm not aware of anyway to automatically format the codes. The only thing I find that might help is to enable your IDE with a checkstyle plugin. https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/ide_setup.html#checkstyle-for-java Thank you~ Xintong Song On Thu, Apr 23

Re: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory)

2020-04-23 Thread Fu, Kai
Hi, thanks for the reply. It was indeed the class loading issue and it’s introduced by latest version of package “aws-kinesisanalytics-runtime”. I resolved the issue by removing the package and customized the runtime myself. -- Best wishes Fu Kai From: Arvid Heise Date: Thursday, April 23, 2

Re: Flink 1.10 Out of memory

2020-04-23 Thread Xintong Song
@Stephan, I don't think so. If JVM hits the direct memory limit, you should see the error message "OutOfMemoryError: Direct buffer memory". Thank you~ Xintong Song On Thu, Apr 23, 2020 at 6:11 PM Stephan Ewen wrote: > @Xintong and @Lasse could it be that the JVM hits the "Direct Memory" > li

Re: JDBC table api questions

2020-04-23 Thread Zhenghua Gao
FLINK-16471 introduce a JDBCCatalog, which implements Catalog interface. Currently we only support PostgresCatalog and listTables(). If you want to get the list of views, you can implement listViews() (currently return an empty list). *Best Regards,* *Zhenghua Gao* On Thu, Apr 23, 2020 at 8:48 P

OUT OF MEMORY : CORRECTION

2020-04-23 Thread Zahid Rahman
There was a post earlier with some one had a problem of out of memory error with flink. The answer is to reduce flink managed memory from default 70% to may be 50%. This error could be caused due to missing memory ; or maintaining a local list by programmer so over using user allocated memory

Re: JDBC Table and parameters provider

2020-04-23 Thread Flavio Pompermaier
I've created 3 ticket related to this discussion, feel free to comment them: 1. https://issues.apache.org/jira/browse/FLINK-17358 - JDBCTableSource support FiltertableTableSource 2. https://issues.apache.org/jira/browse/FLINK-17360 - Support custom partitioners in JDBCReadOptions

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-23 Thread Yun Tang
Hi Averell Please build your own flink docker with S3 plugin as official doc said [1] [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html#using-plugins Best Yun Tang From: Averell Sent: Thursday, April 23, 2020 20:58 To: user@f

Restore from save point but need to read from different Kafka topics

2020-04-23 Thread Casado Tejedor , Rubén
Hi Let me introduce our scenario: 1. We have a Flink job reading from a Kafka topic, using the Flink Kafka. Name of Kafka topic is an input variable in properties file 2. A savepoint is created for that job, so the Kafka offsets for the input topic is stored in that savepoint 3. The j

Re: batch range sort support

2020-04-23 Thread Benchao Li
Hi Kurt, I've created a jira issue[1] to track this, we can move further discussions to the jira issue. [1] https://issues.apache.org/jira/browse/FLINK-17354 Kurt Young 于2020年4月23日周四 下午10:25写道: > Hi Benchao, you can create a jira issue to track this. > > Best, > Kurt > > > On Thu, Apr 23, 2020

Re:Re: RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread izual
I try to make the question-model simple, such as the code below: ``` val s = env.fromCollection(List( ("Book", 1, "") )) tableEnv.registerDataStream("tableA", s, 'a, 'b, 'c) class TestFunction extends ScalarFunction { def eval(data: String) = { println(s"test: ${

Flink table modules

2020-04-23 Thread Flavio Pompermaier
Hi to all, I've seen that table API provides modules. What are they exactly? Are they basically a way to group together a set of UDF functions? Or they can add other stuff to the table API? Best, Flavi

Re: Debug Slowness in Async Checkpointing

2020-04-23 Thread Robert Metzger
Hi Lu, were you able to resolve the issue with the slow async checkpoints? I've added Yu Li to this thread. He has more experience with the state backends to decide which monitoring is appropriate for such situations. Best, Robert On Tue, Apr 21, 2020 at 10:50 PM Lu Niu wrote: > Hi, Robert >

Re: Reading from sockets using dataset api

2020-04-23 Thread Arvid Heise
Hi Kaan, afaik there is no (easy) way to switch from streaming back to batch API while retaining all data in memory (correct me if I misunderstood). However, from your description, I also have some severe understanding problems. Why can't you dump the data to some file? Do you really have more ma

Re: KeyedStream and chained forward operators

2020-04-23 Thread Piotr Nowojski
Hi, I’m not sure how can we help you here. To my eye, your code looks ok, what you figured about pushing the keyBy in front of ContinuousFileReader is also valid and makes sense if you indeed can correctly perform the keyBy based on the input splits. The problem should be somewhere in your cust

Re: batch range sort support

2020-04-23 Thread Kurt Young
Hi Benchao, you can create a jira issue to track this. Best, Kurt On Thu, Apr 23, 2020 at 2:27 PM Benchao Li wrote: > Hi Jingsong, > > Thanks for your quick response. I've CC'ed Chongchen who understands the > scenario much better. > > > Jingsong Li 于2020年4月23日周四 下午12:34写道: > >> Hi, Benchao,

Re: define WATERMARKS in queries/views?

2020-04-23 Thread lec ssmi
can assignTimestampAndWatermark again on a watermarked table? Jark Wu 于 2020年4月23日周四 20:18写道: > Hi Matyas, > > You can create a new table based on the existing table using LIKE syntax > [1] in the upcoming 1.11 version, e.g. > > CREATE TABLE derived_table ( > WATERMARK FOR tstmp AS tsmp

Re: Processing Message after emitting to Sink

2020-04-23 Thread Sameer W
One idea that comes to my mind is to convert ProcessFunction1 with a CoProcessFunction[1]. The processElement1() function can send to side-output and process and maintain the business function message as State without emitting it. Then as Arvid mentioned processElement2() can listen on the side ou

Re: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory)

2020-04-23 Thread Arvid Heise
This looks like a typical issue with classloading. kinesis is probably residing in flink-dist/lib while woodstock is added in your job.jar (or vice versa). Could you try to use both jars in the same way? Alternatively, could you provide more information regarding your dependencies? On Tue, Apr 2

K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-23 Thread Averell
Hi, I am trying to deploy my job to Kubernetes following the native-Kubernetes guide. My job is checkpointing to S3 with RockDBStateBackend. It also has a S3 StreamingFileSink. In my jar file, I've already had /flink-hadoop-fs, flink-connector-filesystem, flink-s3-fs-hadoop /(as my understanding, t

JDBC table api questions

2020-04-23 Thread Flavio Pompermaier
Hi all, is there a way to get the list of existing views in a JDBC database? Is this something that could be supported somehow? Moreover, it would be interesting for us to also know the original field type of a table..is there a way to get it (without implementing a dedicated API)? Do you think it

Stateful Functions: java.lang.IllegalStateException: There are no routers defined

2020-04-23 Thread Annemarie Burger
Hi, I'm getting to know Stateful Functions and was trying to run the Harness RunnerTest example. If I clone the repository and open and execute the project from there it works fine, but when I copy the code into my own project, it keeps giving a "java.lang.IllegalStateException: There are no route

Re: Modelling time for complex events generated out of simple ones

2020-04-23 Thread Arvid Heise
We had a larger discussion on stackoverflow [1], so I'm adding a cross link if any other user is coming here first. [1] https://stackoverflow.com/questions/61309174/modelling-time-for-complex-events-generated-out-of-simple-ones/ On Mon, Apr 20, 2020 at 6:52 AM Salva Alcántara wrote: > In my cas

Re: RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread Caizhi Weng
This plan looks indeed complicated, however it is hard to see what the SQL is doing as the plan is too long... Could you provide your SQL to us? Also, what version of Flink are you using? It seems that there is a very long method in the generated code, but Flink should have split it into many short

IntelliJ java formatter

2020-04-23 Thread Flavio Pompermaier
Hi to all, I'm migrating to IntelliJ because it's very complicated to have a fully working env in Eclipse (too many missing maven plugins). Is there a way to automatically format a Java class (respecting the configured checkstyle)? Or do I have to manually fix every Checkstyle problem? Thanks in a

Re: define WATERMARKS in queries/views?

2020-04-23 Thread Jark Wu
Hi Matyas, You can create a new table based on the existing table using LIKE syntax [1] in the upcoming 1.11 version, e.g. CREATE TABLE derived_table ( WATERMARK FOR tstmp AS tsmp - INTERVAL '5' SECOND ) LIKE base_table; For now, maybe you have to manually create a new table using full DD

Re: Unsubscribe

2020-04-23 Thread Arvid Heise
Please unsubscribe by sending a mail to user-unsubscr...@flink.apache.org On Thu, Apr 16, 2020 at 5:06 PM Jose Cisneros wrote: > Unsubscribe > -- Arvid Heise | Senior Java Developer Follow us @VervericaData -- Join Flink Forward -

Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-23 Thread Arvid Heise
Hi Elkhan, Theo's advice is spot-on, you should use asyncIO with AsyncFunction. AsyncIO is not performing any task asynchronously by itself though, so you should either use the async API of the library if existant or manage your own thread pool. The thread pool should be as large as your desired

Handling stale data enrichment

2020-04-23 Thread Vinay Patil
Hi, I went through Konstantin webinar on 99 ways you can do enrichment. One thing I am failing to understand is how do we efficiently handle stale data enrichment. Context: Let's say I want to enrich user data with the subscription data. Here subscription data is acting as reference data and will

Re: Streaming Job eventually begins failing during checkpointing

2020-04-23 Thread Stephan Ewen
If something requires Beam to register a new state each time, then this is tricky, because currently you cannot unregister states from Flink. @Yu @Yun I remember chatting about this (allowing to explicitly unregister states so they get dropped from successive checkpoints) at some point, but I coul

Re: Processing Message after emitting to Sink

2020-04-23 Thread Arvid Heise
Hi Kristoff, I see a few ways, none of which are perfect. The easiest way would be to not use a sink. Instead of outputting into a side-output, you could tag that element and have a successive asyncIO place that in RabbitMQ. If that asyncIO is ordered, then you can be sure that all following even

RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread izual
Hi,Community: I add 4 complicated sqls in one job, and the job looks running well. But when I try to add 5th sql,the job failed at the beginning。 And throws errors info below: java.lang.RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166' at org.apache.flink.table.

Fault tolerance in Flink file Sink

2020-04-23 Thread Eyal Pe'er
Hi all, I am using Flink streaming with Kafka consumer connector (FlinkKafkaConsumer) and file Sink (StreamingFileSink) in a cluster mode with exactly once policy. The file sink writes the files to the local disk. I've noticed that if a job fails and automatic restart is on, the task managers loo

Re: Flink 1.10 Out of memory

2020-04-23 Thread Stephan Ewen
@Xintong and @Lasse could it be that the JVM hits the "Direct Memory" limit here? Would increasing the "taskmanager.memory.framework.off-heap.size" help? On Mon, Apr 20, 2020 at 11:02 AM Zahid Rahman wrote: > As you can see from the task manager tab of flink web dashboard > > Physical Memory:3.8

Re: how to enable retract?

2020-04-23 Thread Benchao Li
FYI, the question has been answered in user-zh ML. lec ssmi 于2020年4月23日周四 下午2:57写道: > Hi: > Is there an aggregation operation or window operation, the result is > with retract characteristics? > > -- Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:

define WATERMARKS in queries/views?

2020-04-23 Thread Őrhidi Mátyás
Dear Community, is it possible to define WATERMARKS in SQL queries/views? We have a read only catalog implementation and we would like to assign WMs to the tables somehow. Thanks, Matyas

Re: A Strategy for Capacity Testing

2020-04-23 Thread Xintong Song
Hi Morgan, If I understand correctly, you mean you want to measure the max throughput that your Flink application can deal with given the certain resource setups? I think forcing Flink to catch-up the data should help on that. Please be aware that Flink may need a warming-up time for the performa

Re: Task Assignment

2020-04-23 Thread Marta Paes Moreira
Hi, Navneeth. If you *key* your stream using stream.keyBy(…), this will logically split your input and all the records with the same key will be processed in the same operator instance. This is the default behavior in Flink for keyed streams and transparently handled. You can read more about it i

A Strategy for Capacity Testing

2020-04-23 Thread Morgan Geldenhuys
Community, I am interested in knowing what is the recommended way of capacity planning a particular Flink application with current resource allocation. Taking a look at the Flink documentation (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-pl