Re: Providing hdfs name node IP for streaming file sink

2020-02-28 Thread Nick Bendtner
To add to this question, do I need to set env.hadoop.conf.dir to point to the Hadoop config, for instance env.hadoop.conf.dir=/etc/hadoop/, for the JVM? Or is it possible to write to HDFS without any external Hadoop config like core-site.xml and hdfs-site.xml? Best, Nick. On Fri, Feb 28, 2020
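For reference, if that route is taken, the setting mentioned in the question typically goes into flink-conf.yaml; a sketch using the directory path from the question:

```yaml
# Points Flink's JVM processes at an external Hadoop configuration
# directory (the one containing core-site.xml / hdfs-site.xml).
env.hadoop.conf.dir: /etc/hadoop/
```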

Providing hdfs name node IP for streaming file sink

2020-02-28 Thread Nick Bendtner
Hi guys, I am trying to write to HDFS from the streaming file sink. Where should I provide the IP address of the name node? Can I provide it as part of the flink-conf.yaml file or should I provide it like this: final StreamingFileSink sink = StreamingFileSink
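One option implied by the question is to embed the name node address directly in the sink's hdfs:// path rather than in a config file; a minimal stdlib sketch of how such a URI decomposes (host and port below are hypothetical):

```java
import java.net.URI;

public class HdfsUriSketch {
    // Hypothetical name node host/port: a sink path of this form carries
    // the name node address itself, so it need not come from a config file.
    static URI sinkPath() {
        return URI.create("hdfs://namenode-host:8020/user/flink/output");
    }

    public static void main(String[] args) {
        URI p = sinkPath();
        // scheme selects the file system, host:port identify the name node
        System.out.println(p.getScheme() + " " + p.getHost() + ":" + p.getPort());
    }
}
```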

Timeout error in ZooKeeper

2020-02-28 Thread Samir Tusharbhai Chauhan
Hi, Yesterday morning I got the below error in ZooKeeper. After this error, my Flink did not connect to ZK and jobs went into a hung state. I had to cancel and redeploy all my jobs to bring things back to a normal state. 2020-02-28 02:45:56,811 [myid:1] - WARN

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-28 Thread Hao Sun
Sounds good. Thank you! Hao Sun On Thu, Feb 27, 2020 at 6:52 PM Yang Wang wrote: > Hi Hao Sun, > > I just post the explanation to the user ML so that others could also have > the same problem. > > Given the job graph is fetched from the jar, do we still need Zookeeper for >> HA? Maybe we still

Re: Apache Beam Side input vs Flink Broadcast Stream

2020-02-28 Thread Jin Yi
Hi Arvid, Thanks a lot for the response and yes I am aware of FLIP-17. Eleanore On Fri, Feb 28, 2020 at 2:16 AM Arvid Heise wrote: > Hi Eleanore, > > we understand side-input as something more general than simple broadcast > input, see FLIP-17 for details [1]. > > If a broadcast fits your use

Re: Best way to initialize a custom metric task for all task managers and job manager

2020-02-28 Thread Chesnay Schepler
A proper solution will require a custom Flink build, where you want to modify org.apache.flink.runtime.metrics.util.MetricUtils#instantiateProcessMetricGroup and org.apache.flink.runtime.metrics.util.MetricUtils#instantiateTaskManagerMetricGroup to add your custom metrics. This is where the

Best way to initialize a custom metric task for all task managers and job manager

2020-02-28 Thread Theo Diefenthal
Hi, From my backend service, I would like to collect metrics about the log messages, i.e. how many error and warn messages were printed over time; see e.g. for Micrometer: [

Flink: Run Once Trigger feature like Spark's

2020-02-28 Thread Pankaj Chand
Hi all, Please tell me, is there anything in Flink that is similar to Spark's structured streaming Run Once Trigger (or Trigger.Once feature) as described in the blog below: https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html This feature allows you to call a

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread David Magalhães
Hi Piotr, the typo was in the example I wrote here, not in the code itself. Regarding the mix of Scala versions, I'm using 2.12 everywhere. My Java version is 1.8.0_221. Currently it is working, but I'm not sure what happened here. Thanks! On Fri, Feb 28, 2020 at 10:50 AM Piotr Nowojski

Re: [Native Kubernetes] [Ceph S3] Unable to load credentials from service endpoint.

2020-02-28 Thread Niels Basjes
Hi, As I mentioned in my original email, I had already verified that the endpoints were accessible from the pods, so that was not the problem. It took me a while, but I've figured out what went wrong. Setting the configuration like I did: final Configuration conf = new Configuration();

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
Yes, that’s correct. There shouldn’t be any data loss. Stop with savepoint is a solution to make sure that, if you are stopping a job (either permanently or temporarily), all of the results are published/committed to external systems before you actually stop the job. If you just

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Kaymak, Tobias
Thank you! For understanding the matter: When I have a streaming pipeline (reading from Kafka, writing somewhere) and I click "cancel" and after that I restart the pipeline - I should not expect any data to be lost - is that correct? Best, Tobias On Fri, Feb 28, 2020 at 2:51 PM Piotr Nowojski

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-02-28 Thread Piotr Nowojski
Thanks for confirming that Yadong. I’ve created a ticket for that [1]. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-16340 > On 28 Feb 2020, at 14:32, Yadong Xie wrote: > > Hi > > 1. the old stop button was removed in flink 1.9.0

Re: Batch reading from Cassandra

2020-02-28 Thread Piotr Nowojski
Hi, I’m afraid that we don’t have any native support for reading from Cassandra at the moment. The only things that I could find, are streaming sinks [1][2]. Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/cassandra.html

Re: Flink remote batch execution in dynamic cluster

2020-02-28 Thread Piotr Nowojski
Hi, I guess it depends on what you already have available in your cluster; try to use that. Running Flink in an existing Yarn cluster is very easy, but setting up a Yarn cluster in the first place, even if it’s easy (I’m not sure whether that’s the case), would add extra complexity. When I’m

A question about Flink's latency tracking

2020-02-28 Thread izual
A question about enabling latency tracking in Flink. In the REST API response there is: latency.operator_id.cf155f65686cb012844f7c745ec70a3c.operator_subtask_index.0.latency_p99. How does this operator_id map to the operators in the code? Are custom names supported? In a pure SQL scenario, is there a way to match it to the name in the metrics? name: Source: KafkaTableSource(...) -> SourceConversion(...) -> Calc(...) ->

Re: Flink 1.10 - Hadoop libraries integration with plugins and class loading

2020-02-28 Thread Piotr Nowojski
Hi, > Since we have "flink-s3-fs-hadoop" at the plugins folder and therefore being > dynamically loaded upon task/job manager(s) startup (also, we are keeping > Flink's default inverted class loading strategy), shouldn't Hadoop > dependencies be loaded by parent-first? (based on >

Re: How to set unorderedWait/orderedWait properties in Table API when using Async I/O

2020-02-28 Thread Zheng Steven
Thanks Jark, and the un-ordered mode is useful in some cases. Jark Wu wrote on Fri, Feb 28, 2020 at 7:18 PM: > Hi, > > The ordering in streaming SQL is very important, because the accumulate > and retract messages are emitted in order. > If messages are out of order, the result will be wrong. Think of you are

Re: How to set unorderedWait/orderedWait properties in Table API when using Async I/O

2020-02-28 Thread Jark Wu
Hi, The ordering in streaming SQL is very important, because the accumulate and retract messages are emitted in order. If messages are out of order, the result will be wrong. Think of applying an un-ordered changelog: the result will be non-deterministic. That's why we only support
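Jark's point can be illustrated with a toy, non-Flink sketch (names are hypothetical): under last-write-wins (upsert) semantics, replaying the same changelog messages in a different order produces a different, wrong final state:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration (not Flink code) of why changelog order matters:
// with last-write-wins semantics, reordering changes the result.
public class ChangelogOrdering {
    static Map<String, Integer> replay(List<Map.Entry<String, Integer>> log) {
        Map<String, Integer> state = new HashMap<>();
        for (var e : log) {
            state.put(e.getKey(), e.getValue()); // last write wins
        }
        return state;
    }

    public static void main(String[] args) {
        var inOrder    = List.of(Map.entry("k", 1), Map.entry("k", 2));
        var outOfOrder = List.of(Map.entry("k", 2), Map.entry("k", 1));
        System.out.println(replay(inOrder).get("k"));    // 2 (correct)
        System.out.println(replay(outOfOrder).get("k")); // 1 (wrong)
    }
}
```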

Re: Artificial streaming benchmarks for Flink

2020-02-28 Thread Robert Metzger
Hey, This very old blog post contains some benchmarks: https://www.ververica.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink I also found this: https://arxiv.org/pdf/1802.08496.pdf I'm not aware of anything recent. What I always recommend when it comes

Re: FsStateBackend vs RocksDBStateBackend

2020-02-28 Thread Robert Metzger
Sorry for the late reply. There's not much you can do at the moment, as Flink needs to sync on the checkpoint barriers. There's something in the making for addressing the issue soon: https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints Did you try out using the

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Also, don’t you have a typo in your pattern? In your pattern you are using `$accountId`, while the variable is `account_id`? (Maybe I don’t understand it as I don’t know Scala very well). Piotrek > On 28 Feb 2020, at 11:45, Piotr Nowojski wrote: > > Hey, > > What Java versions are you

Re: Scala string interpolation weird behaviour with Flink Streaming Java tests dependency.

2020-02-28 Thread Piotr Nowojski
Hey, What Java versions are you using? Also, could you check if you are not mixing Scala versions somewhere? There are two different Flink binaries for Scala 2.11 and Scala 2.12. I guess if you mix them, or if you use an incorrect Scala runtime not matching the supported version of the

Re: Async Datastream Checkpointing

2020-02-28 Thread Arvid Heise
Hi Alexandru, please share your code of the AsyncFunction. Your observed behaviour is completely not in line with how things should behave. As long as you are not blocking AsyncFunction#asyncInvoke, checkpointing will work. On Fri, Feb 28, 2020 at 9:16 AM Alexandru Vasiu <
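Arvid's constraint (don't block asyncInvoke) can be sketched without any Flink dependency; the names below are hypothetical stand-ins. The slow call runs on a dedicated pool and the method returns a future immediately instead of waiting:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingSketch {
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // Stand-in for a slow web request (hypothetical).
    static String slowLookup(String key) {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        return "value-for-" + key;
    }

    // The shape an AsyncFunction#asyncInvoke body should have: hand the
    // blocking work to a pool and return at once; the future completes
    // later, so the calling thread (and checkpointing) is never blocked.
    static CompletableFuture<String> asyncInvoke(String key) {
        return CompletableFuture.supplyAsync(() -> slowLookup(key), POOL);
    }

    public static void main(String[] args) {
        System.out.println(asyncInvoke("a").join()); // prints value-for-a
        POOL.shutdown();
    }
}
```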

Fwd: How to set unorderedWait/orderedWait properties in Table API when using Async I/O

2020-02-28 Thread 郑泽辉
-- Forwarded message - From: StevenZheng Date: Fri, Feb 28, 2020 at 6:30 PM Subject: Re: How to set unorderedWait/orderedWait properties in Table API when using Async I/O To: Danny Chan Thanks Danny, and I do run my lookup function in a single thread like this

Re: The main method caused an error: Unable to instantiate java compiler in Flink 1.10

2020-02-28 Thread Arvid Heise
Hi Kant, you should use compileOnly and then add the same dependency as a testImplementation. On Fri, Feb 28, 2020 at 10:54 AM Jingsong Li wrote: > Hi Kant, > > "We cannot use "compileOnly" or "shadow" configurations since then we > could not run code in the IDE or with "gradle run"." > > You
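Arvid's suggestion can be sketched as a build.gradle fragment; the artifact coordinates and version below are assumptions based on the thread (Flink 1.10, Scala 2.12), adjust to your build:

```groovy
dependencies {
    // Provided by the Flink cluster at runtime; keep it out of the fat jar.
    compileOnly 'org.apache.flink:flink-table-planner-blink_2.12:1.10.0'
    // Add it back for tests and IDE runs, which have no cluster classpath.
    testImplementation 'org.apache.flink:flink-table-planner-blink_2.12:1.10.0'
}
```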

Re: invoking REST call at job startup

2020-02-28 Thread Arvid Heise
There is no clear recommendation. You should bundle whichever client you like or need. Common options are okhttp [1] or httpcomponents [2]. In the easiest case, you could also just use java's URL to send that one request and avoid a new dependency. [1] https://github.com/square/okhttp [2]
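The "just use java's URL" option can be sketched with the JDK 11+ java.net.http API and no new dependency; the endpoint below is hypothetical, and only request construction is shown (actually sending it needs a reachable server):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class StartupRestCall {
    // Hypothetical endpoint for the one-off call at job startup.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://example.com/api/startup-hook"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest();
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req,
        //         HttpResponse.BodyHandlers.ofString());
    }
}
```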

Re: Apache Beam Side input vs Flink Broadcast Stream

2020-02-28 Thread Arvid Heise
Hi Eleanore, we understand side-input as something more general than simple broadcast input, see FLIP-17 for details [1]. If a broadcast fits your use case, you can use that of course. We are aiming for something, where a side input can also be co-partitioned. We are currently laying the

Re: Getting javax.management.InstanceAlreadyExistsException when upgraded to 1.10

2020-02-28 Thread Khachatryan Roman
Hi John, Sorry for the late reply. I'd assume that this is a separate issue. Regarding the original one, I'm pretty sure it's https://issues.apache.org/jira/browse/FLINK-8093 Regards, Roman On Wed, Feb 26, 2020 at 5:50 PM John Smith wrote: > Just curious is this the reason why also some

Re: The main method caused an error: Unable to instantiate java compiler in Flink 1.10

2020-02-28 Thread Jingsong Li
Hi Kant, "We cannot use "compileOnly" or "shadow" configurations since then we could not run code in the IDE or with "gradle run"." You can take a look at the document [1]. There are project templates for Java. [1]

Flink remote batch execution in dynamic cluster

2020-02-28 Thread Antonio Martínez Carratalá
Hello, I'm working on a project with Flink 1.8. I'm running my code from Java on a remote Flink cluster as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/cluster_execution.html. That part is working, but I want to configure a dynamic Flink cluster to execute the jobs

Re: The main method caused an error: Unable to instantiate java compiler in Flink 1.10

2020-02-28 Thread kant kodali
Hi Jark, You mean I shouldn't package them into the jar, so I need to specify them as compileOnly, as Lake Shen pointed out? Because I still need them to use in my IDE and to compile my application. Just tried it and yes, it works; below is the updated build.gradle buildscript { repositories {

Re: The main method caused an error: Unable to instantiate java compiler in Flink 1.10

2020-02-28 Thread Jark Wu
Hi Kant, You shouldn't compile `flink-table-planner` or `flink-table-planner-blink` into your user jar. They have been provided by Flink cluster. Best, Jark On Fri, 28 Feb 2020 at 15:28, kant kodali wrote: > Here is my build.gradle and I am not sure which jar uses >

Re: Flink 1.10 exception : Unable to instantiate java compiler

2020-02-28 Thread Till Rohrmann
Hi, with Flink 1.10 we changed the behaviour on the client side so that it also uses the child first class loader [1]. Due to that it might be the case that you have some conflicting dependencies bundled in your user code jar which don't play well together with what you have on the system class

Re: Flink 1.10 exception : Unable to instantiate java compiler

2020-02-28 Thread LakeShen
I have solved this problem. I set the flink-table-planner-blink Maven scope to provided. kant kodali wrote on Fri, Feb 28, 2020 at 3:32 PM: > Same problem! > > On Thu, Feb 27, 2020 at 11:10 PM LakeShen > wrote: > >> Hi community, >> now I am using the flink 1.10 to run the flink task >>
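LakeShen's fix, sketched as a Maven pom fragment; the version below is an assumption based on the thread (Flink 1.10, Scala 2.12):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_2.12</artifactId>
  <version>1.10.0</version>
  <!-- provided: already on the cluster classpath, not bundled in the jar -->
  <scope>provided</scope>
</dependency>
```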

Re: Emit message at start and end of event time session window

2020-02-28 Thread Till Rohrmann
Great to hear that you solved the problem. Let us know if you run into any other issues. Cheers, Till On Fri, Feb 28, 2020 at 8:08 AM Manas Kale wrote: > Hi, > This problem is solved[1]. The issue was that the BroadcastStream did not > contain any watermark, which prevented watermarks for any

Re: Async Datastream Checkpointing

2020-02-28 Thread Alexandru Vasiu
Hi, That's how we used the executor. I think the problem is that the web requests take too long to complete (3-4 seconds) because the requests are going through a proxy server. I also transformed the asyncDataStream using a flatMap, and got the same issue (no successful checkpoint). If I used a simple web