Re: debug Flink in IntelliJ on EMR

2020-01-27 Thread Arvid Heise
Hi Fanbin, the host should be the external IP of your master node. However, usually the port 5005 is not open in EMR (you could use any other open, unused port). Alternatively, you could use SSH port forwarding [1]: ssh -L <local-port>:<master-host>:5005 <master-host>, and then connect to localhost:<local-port> in your IDE. [1]
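
For the archive, a sketch of the full setup this implies; the JDWP flag and the tunnel are standard, but the config placement, the port choice, and the hadoop user are assumptions (hadoop is the EMR default login):

    # flink-conf.yaml on the EMR master - make the Flink JVM listen
    # for a debugger (5005 is an arbitrary choice of port)
    env.java.opts: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"

    # on your laptop: forward local port 5005 to the master's port 5005
    ssh -L 5005:localhost:5005 hadoop@<master-node-public-dns>

    # then attach an IntelliJ "Remote" run configuration to localhost:5005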

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-27 Thread Yun Tang
Hi Alexey, If possible, I think you could move some RDBMS maintenance operations into the #open method of RichFunction to reduce the possibility of blocking record processing. Best, Yun Tang From: Alexey Trenikhun Sent: Tuesday, January 28, 2020 15:15 To: Yun
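
For context, a minimal sketch of what that looks like; the Event and Rule types, the function name, and the JDBC handling are illustrative, not from the thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    // Event and Rule stand in for the actual stream types
    public class EnrichFunction extends KeyedCoProcessFunction<String, Event, Rule, Event> {

        private transient Connection connection;

        @Override
        public void open(Configuration parameters) throws Exception {
            // slow, one-time RDBMS maintenance/setup runs here, before the
            // first record arrives, so it cannot block processElement1()
            connection = DriverManager.getConnection("jdbc:...");
        }

        @Override
        public void processElement1(Event event, Context ctx, Collector<Event> out) throws Exception {
            out.collect(event); // fast path only; no blocking calls here
        }

        @Override
        public void processElement2(Rule rule, Context ctx, Collector<Event> out) throws Exception {
            // rule updates
        }

        @Override
        public void close() throws Exception {
            if (connection != null) {
                connection.close();
            }
        }
    }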

Re: Flink RocksDB logs filling up disk space

2020-01-27 Thread Yun Tang
Hi Ahmad, Apart from setting the logger level of RocksDB, I also wonder why you would see RocksDB checkpoint IO logs filling up disk space so quickly. How large is the local checkpoint state, and how long is the checkpoint interval? I think you might have set too short an interval of
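
For the archive, this is roughly what "setting the logger level of RocksDB" looks like on the 1.9-era API; a sketch, assuming the OptionsFactory hook (FLINK-15068 tracks making a quieter level the default):

    import org.apache.flink.contrib.streaming.state.OptionsFactory;
    import org.rocksdb.ColumnFamilyOptions;
    import org.rocksdb.DBOptions;
    import org.rocksdb.InfoLogLevel;

    public class QuietRocksDBOptions implements OptionsFactory {
        @Override
        public DBOptions createDBOptions(DBOptions currentOptions) {
            // suppress the verbose info log that checkpoint.cc writes into
            return currentOptions.setInfoLogLevel(InfoLogLevel.HEADER_LEVEL);
        }

        @Override
        public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
            return currentOptions;
        }
    }

    // usage: rocksDBStateBackend.setOptions(new QuietRocksDBOptions());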

Re: Is there anything strictly special about sink functions?

2020-01-27 Thread Konstantin Knauf
Hi Andrew, as far as I know there is nothing particularly special about the sink in terms of how it handles state or time. You cannot leave the pipeline "unfinished", though; only sinks trigger the execution of the whole pipeline. Cheers, Konstantin On Fri, Jan 24, 2020 at 5:59 PM Andrew Roberts
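
To illustrate, a minimal sketch: DiscardingSink is a sink that does nothing at all, which fits the point that sinks are not special beyond terminating the pipeline:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

    public class SinkDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> processed = env.fromElements("a", "b").map(String::toUpperCase);

            // a sink can be entirely trivial - no special state or time
            // handling - but the pipeline needs one to be "finished"
            processed.addSink(new DiscardingSink<>());
            env.execute("sink-demo");
        }
    }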

Re: Flink and Presto integration

2020-01-27 Thread Jingsong Li
Hi Flavio, Your requirement, if I understand correctly, is to use Blink batch to read the tables in Presto? I'm not familiar with Presto's catalog. Is it like the Hive Metastore? If so, what needs to be done is similar to the Hive connector: you need to implement a catalog for Presto, which translates the Presto table
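
To make the analogy concrete, a sketch of how an external catalog is registered with the Blink planner; HiveCatalog is the existing integration, while PrestoCatalog is hypothetical, standing in for what Jingsong describes:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.catalog.hive.HiveCatalog;

    EnvironmentSettings settings =
        EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
    TableEnvironment tEnv = TableEnvironment.create(settings);

    // the existing Hive integration: a catalog backed by the Hive Metastore
    tEnv.registerCatalog("myhive", new HiveCatalog("myhive", "default", "/path/to/hive-conf"));
    tEnv.useCatalog("myhive");

    // a Presto integration would look analogous - PrestoCatalog is hypothetical
    // and would translate Presto table metadata into Flink's Catalog API:
    // tEnv.registerCatalog("presto", new PrestoCatalog(...));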

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join?

2020-01-27 Thread Tzu-Li Tai
Hi Dominique, FLIP-17 (Side Inputs) is not yet implemented, AFAIK. One possible way to overcome this right now, if your reference data is static and not continuously changing, is to use the State Processor API to bootstrap a savepoint with the reference data. Have you looked into that to see if
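
For reference, a rough sketch of that bootstrap approach on the Flink 1.9 State Processor API; the Rule type, loadStaticRules(), RuleBootstrapFunction, the uid, and the paths are all placeholders:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.BootstrapTransformation;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Rule> referenceData = env.fromCollection(loadStaticRules()); // placeholder

    BootstrapTransformation<Rule> bootstrap = OperatorTransformation
        .bootstrapWith(referenceData)
        .transform(new RuleBootstrapFunction()); // e.g. a BroadcastStateBootstrapFunction

    Savepoint.create(new MemoryStateBackend(), 128)
        .withOperator("reference-data-uid", bootstrap)
        .write("file:///tmp/bootstrap-savepoint");

The streaming job then restores from that savepoint, with the matching operator uid, and joins against the pre-loaded state.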

Re: batch job OOM

2020-01-27 Thread Fanbin Bu
I can build Flink 1.10 and install it onto EMR (flink-dist_2.11-1.10.0.jar), but what about the other dependencies in my project build.gradle, i.e. flink-scala_2.11, flink-json, flink-jdbc... do I continue to use 1.9.0 since there is no 1.10 available? Thanks, Fanbin On Fri, Jan 24, 2020 at 11:39 PM
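
For anyone else in this situation, a sketch of the intended end state, assuming the 1.10.0 artifacts are available (locally installed or from Maven Central); the module names are the ones Fanbin lists, and the point is keeping every Flink module in lockstep with flink-dist:

    // build.gradle - pin all Flink modules to the same version as flink-dist
    ext.flinkVersion = '1.10.0'
    dependencies {
        implementation "org.apache.flink:flink-scala_2.11:${flinkVersion}"
        implementation "org.apache.flink:flink-json:${flinkVersion}"
        implementation "org.apache.flink:flink-jdbc:${flinkVersion}"
    }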

Re: BlinkPlanner limitation related clarification

2020-01-27 Thread RKandoji
Hi Jingsong, Thanks for the clarification! The limitation description was a bit confusing to me, but it became clear after seeing the example you posted above. Regards, RK. On Mon, Jan 27, 2020 at 6:25 AM Jingsong Li wrote: > Hi RKandoji, > > You understand this bug wrong, your code will not

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Jin Yi
Hi Yun, Thanks for the suggestion! Best Eleanore On Mon, Jan 27, 2020 at 1:54 AM Yun Tang wrote: > Hi Yi > > Glad to know you have already resolved it. State Processor API would use > DataStream API instead of DataSet API in the future [1]. > > Besides, you could also follow the guide in "the

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-27 Thread Benoît Paris
Dang, what a massive PR: 2,118 files changed, +104,104 −29,161 lines. Thanks for the details, Jark! On Mon, Jan 27, 2020 at 4:07 PM Jark Wu wrote: > Hi Kant, > Having a custom state backend is very difficult and is not recommended. > > Hi Benoît, > Yes, the "Query on the intermediate

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Piotr Nowojski
Hi, I think reducing the frequency of the checkpoints and decreasing the parallelism of the operators using the S3AOutputStream class would help to mitigate the issue. I don’t know about other solutions. I would suggest asking this question directly to Steve L. in the bug ticket [1], as he is the
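
A sketch of that mitigation with made-up numbers; fewer checkpoints means fewer S3A files written, and so fewer DeleteOnExit entries accumulating:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(10 * 60 * 1000);                        // every 10 minutes
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5 * 60 * 1000);

    // and lower the parallelism of the S3-writing operator specifically:
    // stream.addSink(...).setParallelism(2);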

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-27 Thread Jark Wu
Hi Kant, Having a custom state backend is very difficult and is not recommended. Hi Benoît, Yes, the "Query on the intermediate state is on the roadmap" I mentioned is referring to integrating Table API & SQL with Queryable State. We also have an early issue, FLINK-6968, to track this. Best, Jark

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Cliff Resnick
I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS referencing 3.3.0? Last I checked, I think it referenced 3.1.0.

Re: Flink and Presto integration

2020-01-27 Thread Itamar Syn-Hershko
Yes, Flink does batch processing by "reevaluating a stream", so to speak. Presto doesn't have sources and sinks, only catalogs (which always allow reads, and sometimes also writes). Presto catalogs are a configuration - they are managed on the node filesystem as a configuration file and

Re: Flink and Presto integration

2020-01-27 Thread Flavio Pompermaier
Both Presto and Flink make use of a Catalog in order to be able to read/write data from a source/sink. I don't agree that "Flink is about processing data streams", because Flink is competitive also for batch workloads (and this will be further improved in the next releases). I'd like to

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread David Magalhães
Does StreamingFileSink use core-site.xml? When I was using it, it didn't load any configurations from core-site.xml. On Mon, Jan 27, 2020 at 12:08 PM Mark Harris wrote: > Hi Piotr, > > Thanks for the link to the issue. > > Do you know if there's a workaround? I've tried setting the following
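
If I remember correctly, the shaded flink-s3-fs-hadoop picks its settings up from flink-conf.yaml rather than core-site.xml: keys prefixed with s3. are mirrored onto the shaded Hadoop configuration as fs.s3a.*. A sketch; whether this particular key is forwarded is an assumption, and note the Hadoop docs give disk, array or bytebuffer (not true/false) as values for the buffer option:

    # flink-conf.yaml (assumption: s3.* keys are mirrored to fs.s3a.*)
    s3.fast.upload.buffer: bytebuffer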

Re: Flink and Presto integration

2020-01-27 Thread Itamar Syn-Hershko
Hi Flavio, Presto contributor and Starburst Partners here. Presto and Flink are solving completely different challenges. Flink is about processing data streams as they come in; Presto is about ad-hoc / periodic querying of data sources. A typical architecture would use Flink to process data

Re: Flink RocksDB logs filling up disk space

2020-01-27 Thread Ahmad Hassan
Thanks Chesnay! On Mon, 27 Jan 2020 at 11:29, Chesnay Schepler wrote: > Please see https://issues.apache.org/jira/browse/FLINK-15068 > > On 27/01/2020 12:22, Ahmad Hassan wrote: > > Hello, > > In our production systems, we see that Flink RocksDB checkpoint IO logs > are filling up disk space

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Mark Harris
Hi Piotr, Thanks for the link to the issue. Do you know if there's a workaround? I've tried setting the following in my core-site.xml: fs.s3a.fast.upload.buffer=true to try to avoid writing the buffer files, but the taskmanager breaks with the same problem. Best regards, Mark

Flink and Presto integration

2020-01-27 Thread Flavio Pompermaier
Hi all, is there any integration between Presto and Flink? I'd like to use Presto for the UI part (preview and so on) while using Flink for the batch processing. Or would you suggest something else? Best, Flavio

Re: Flink RocksDB logs filling up disk space

2020-01-27 Thread Chesnay Schepler
Please see https://issues.apache.org/jira/browse/FLINK-15068 On 27/01/2020 12:22, Ahmad Hassan wrote: Hello, In our production systems, we see that Flink RocksDB checkpoint IO logs are filling up disk space very quickly, on the order of GBs, as the logging is very verbose. How do we

Re: BlinkPlanner limitation related clarification

2020-01-27 Thread Jingsong Li
Hi RKandoji, You understand this bug wrong, your code will not go wrong. The bug is:

    TableEnv tEnv = TableEnv.create(...);
    Table t1 = tEnv.sqlQuery(...);
    tEnv.insertInto("sink1", t1);
    tEnv.execute("job1");

    Table t2 = tEnv.sqlQuery(...);
    tEnv.insertInto("sink2", t2);
    tEnv.execute("job2");

This

Flink RocksDB logs filling up disk space

2020-01-27 Thread Ahmad Hassan
Hello, In our production systems, we see that Flink RocksDB checkpoint IO logs are filling up disk space very quickly, on the order of GBs, as the logging is very verbose. How do we disable or suppress these logs, please? The RocksDB file checkpoint.cc is dumping a huge amount of checkpoint

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Yun Tang
Hi Yi, Glad to know you have already resolved it. State Processor API would use DataStream API instead of DataSet API in the future [1]. Besides, you could also follow the guide in "the broadcast state pattern" [2]: // a map descriptor to store the name of the rule (string) and the rule itself.
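
(The code snippet got cut off here; judging from the broadcast state pattern docs [2], it presumably continues along these lines:)

    MapStateDescriptor<String, Rule> ruleStateDescriptor = new MapStateDescriptor<>(
        "RulesBroadcastState",
        BasicTypeInfo.STRING_TYPE_INFO,
        TypeInformation.of(new TypeHint<Rule>() {}));

    // broadcast the rules and create the broadcast state
    BroadcastStream<Rule> ruleBroadcastStream = ruleStream.broadcast(ruleStateDescriptor);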
