How to get taskmanager hostname and port on runtime

2018-09-12 Thread 郑舒力
Hello community, Is there a way to get taskmanager hostname and port on runtime? I’d like to implement a kafka metric reporter, reporter should be able to report the taskmanager hostname and port to monitoring system. Thanks.

Re: Question regarding Streaming Resources

2018-09-12 Thread Ken Krugler
Hi Bhaskar, > On 2018/09/12 20:42:22, Ken Krugler wrote: >> Hi Bhaskar, >> >> I assume you don’t have 1000 streams, but rather one (keyed) stream with >> 1000 different key values, yes? >> >> If so, then this one stream is physically partitioned based on the >> parallelism of the operator

Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-12 Thread Averell
Good day everyone, I'm writing to Elasticsearch, and I need to count the number of records that the process failed to write. The problem that I'm facing is there is no RunningContext that I can access from within o.a.f.s.c.elasticsearch.ActionRequestFailureHandler's onFailure method so that I can

Re: Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
On 2018/09/12 20:42:22, Ken Krugler wrote: > Hi Bhaskar, > > I assume you don’t have 1000 streams, but rather one (keyed) stream with 1000 > different key values, yes? > > If so, then this one stream is physically partitioned based on the > parallelism of the operator following the

Test harness for validating proper checkpointing of custom SourceFunction

2018-09-12 Thread Ken Krugler
Hi all, We’re using the (Keyed)(One|Two)InputStreamOperatorTestHarness classes to test checkpointing of some custom functions. But in looking through the Flink source, I didn’t see anything comparable for testing a custom SourceFunction (which implements the ListCheckpointed interface).

Re: Question regarding Streaming Resources

2018-09-12 Thread Ken Krugler
Hi Bhaskar, I assume you don’t have 1000 streams, but rather one (keyed) stream with 1000 different key values, yes? If so, then this one stream is physically partitioned based on the parallelism of the operator following the keyBy(), not per unique key. The most common per-key “resource” is

Broadcast managed state

2018-09-12 Thread Deepya
I am using Flink to process Streams. There is a MapValued Managed state that I am using to store custom state. I notice that the state is not shared across the Task Managers. Is there a way to Broadcast Managed state? Thanks, Deepya. -- Sent from:

Re: Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
On 2018/09/12 16:55:09, bhaskar.eba...@gmail.com wrote: > Hi > > I have created a KeyedStream with state as explained below > For example i have created 1000 streams, out of which 50% of streams data is > going to come once in 8 hours. Will the resources of these under utilized > streams

Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
Hi I have created a KeyedStream with state as explained below For example i have created 1000 streams, out of which 50% of streams data is going to come once in 8 hours. Will the resources of these under utilized streams are idle for that duration? Or Flink internal task manager is having

Weird behaviour after change sources in a job.

2018-09-12 Thread Juan Gentile
Hello! We have found a weird issue while replacing the source in one of our Flink SQL Jobs. We have a job which was reading from a Kafka topic (with externalize checkpoints) and we needed to change the topic while keeping the same logic for the job/SQL. After we restarted the job, instead of

Task managers run on separate nodes in a cluster

2018-09-12 Thread Martin Eden
Hi all, We're using Flink 1.3.2 with DCOS / Mesos. We have a 3 node cluster and are running the Flink DCOS package (Flink Mesos framework) configured with 3 Task Managers. Our goal is to run each of them on separate hosts for better load balancing but it seems the task managers end up running

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
Thanks a lot! For your explanation i am much clearer. However for my reference can you give me links of some documentations for flink Dataset and DataStream which clearly and in detail explain all the internals right from reading to processing etc etc. The flink landing page doesn't have in depth

Re: How does flink read a DataSet?

2018-09-12 Thread Fabian Hueske
The InputFormat interface is similar to Hadoop MapReduce's. Data is emitted record-by-record, but InputFormats can read larger blocks for better efficiency (e.g., for ORC or Parquet files). In general, Flink tries to push data forward as early as possible and avoids collecting records in memory

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
So flink TMs reads one line at a time from hdfs in parallel and keep filling it in memory and keep passing the records to the next operator? I just want to know how data comes in memory? How it is partition between TMs Is there a documentation i can refer how the reading is done and how data is

Re: How does flink read a DataSet?

2018-09-12 Thread Fabian Hueske
Actually, some parts of Flink's batch engine are similar to streaming as well. If the data does not need to be sorted or put into a hash-table, the data is pipelined (like in many relational database systems). For example, if you have a job that joins two inputs with a HashJoin, only the build

How to clear keyed states periodically?

2018-09-12 Thread Paul Lam
Hi, I’m using MapState to deduplicate some ids and the MapState needs to be truncated periodically. I tried to use ProcessingTimeCallback to call state.clear(), but in this way I can only clear the state for one key, and actually I need a key group level cleanup. So I’m wondering is there any

Re: What is the right way to add classpath?

2018-09-12 Thread bupt_ljy
Hi,Yun Tang Thanks for help. The first option makes the package process heavy, the second will make a change to flink’s lib folder. And the -yt cannot help also, because I need these dependencies before it’s submitted on yarn, and I did use -yt to submit my job and failed. Best, Jiayi Liao

Re: What is the right way to add classpath?

2018-09-12 Thread Yun Tang
Hi Jiayi As far as I know, there exist three ways: 1. Build the fat-application jar with dependencies using maven-shade-plugin or maven-assembly-plugin. 2. Copy the dependency jars to local ${FLINK_HOME}/lib folder. 3. Submit the job with -yt,--yarnship command, please refer to

Re: Problem with querying state on Flink 1.6.

2018-09-12 Thread Kostas Kloudas
Hi Joe, And it would help a lot if you could share a bit more details about your setup and the code of your job or a minimal example that can reproduce it. Thanks, Kostas > On Sep 12, 2018, at 9:59 AM, Till Rohrmann wrote: > > Hi Joe, > > what is the current problem you are facing? > >

Re: aggregate does not allow RichAggregateFunction ?

2018-09-12 Thread chiggi_dev
Hi Fabian, We came across this issue while working on RichAggregateFunction. Isnt generic state mergeable, similar to ACC merge? What if I need the Flink classLoader in the Aggregate function? Is there anyway I can do that without RuntimeContext? Thanks, Chirag -- Sent from:

Flink Checkpointing in production

2018-09-12 Thread Ahmad Hassan
Hi All, We need two clarifications for using Flink 1.6.0. We have flink jobs running to handle 100's of tenants with sliding window of 24hrs and slide by 5 minutes. 1) If checkpointing is enabled and flink job crashes in the middle of spitting out results to kafka producer. Then if the job

What is the right way to add classpath?

2018-09-12 Thread bupt_ljy
Hi,all My program needs some dependencies before it’s submitted to yarn. Like: ``` stream.filter(new FilterService()).print() env.execute() ``` I use external dependency inFilterService, and the program reports NoClassDefFoundError at

Re: Problem with querying state on Flink 1.6.

2018-09-12 Thread Till Rohrmann
Hi Joe, what is the current problem you are facing? Cheers, Till On Wed, Sep 12, 2018 at 12:18 AM Joe Olson wrote: > Kostas - Till's advice got me past my first problem. I'm still having > issues with the client side. I've got your example code from [1] in a > github project [2]. > > My

Re: ElasticSearch Checkpointing taking too much time

2018-09-12 Thread shashank734
-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearch Checkpointing taking too much time

2018-09-12 Thread shashank734
Hi, vino, I have tried bot HDFS and filesystem and other checkpoints completed successfully so access is not the issue. For debug mode, I have to restart the app. I'll check and let you know thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearch Checkpointing taking too much time

2018-09-12 Thread shashank734
Hi Hequn, Actually there are no error logs and to turn on debug mode I have to restart the app, Actually, I am using around 25-30 operators all others are completing successfully in less time only elastic search sink is taking too much time. I am using around 6 Elastic search sinks all are

Re: How does flink read a DataSet?

2018-09-12 Thread vino yang
Hi Taher, Stream processing and batch processing are very different. The principle of batch processing determines that it needs to process bulk data, such as memory-based sorting, join, and so on. So, in this case, it needs to wait for the relevant data to arrive before it is calculated, but this