Why doesn't Flink's ListState support update?

2017-08-16 Thread yunfan123
If I want to update the list, I have to do two steps: listState.clear(); for (Element e : myList) { listState.add(e); } Why can't I update the state with listState.update(myList)? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Why-ListStat
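
For context, a minimal sketch of the two-step workaround the question describes, assuming a keyed stream with String elements (the real element type is not shown in the thread); newer Flink releases later added a ListState#update(List) method for exactly this case:

import java.util.List;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch of the clear-then-re-add workaround for "replacing" a ListState.
public class ReplaceListStateFn extends RichFlatMapFunction<String, String> {

    private transient ListState<String> listState;

    @Override
    public void open(Configuration parameters) {
        listState = getRuntimeContext().getListState(
                new ListStateDescriptor<>("my-list", String.class));
    }

    // The two-step replacement the poster describes: clear, then re-add everything.
    private void replaceAll(List<String> newList) throws Exception {
        listState.clear();
        for (String e : newList) {
            listState.add(e);
        }
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        // ... compute the new list for this key, then call replaceAll(newList);
        out.collect(value);
    }
}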

Re:Re:Re:Re:Re:How to verify whether the data sent to Elasticsearch is correct?

2017-08-16 Thread mingleizhang
Ahhh, sorry Ted. I didn't see that the code was broken. Yep, I will put the plain-text code directly here. The dependency is com.googlecode.protobuf-java-format protobuf-java-format 1.2 and the added code is like the following. This time, although I sink an object to Elasticsearch, I convert it to a JSON
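
The code itself did not survive the archive preview; the following is a rough sketch of the approach being described, assuming the Flink Elasticsearch connector's ElasticsearchSinkFunction, the poster's protobuf-generated ActivityInfo class, and protobuf-java-format 1.2 (where JsonFormat.printToString is a static method). Index and type names are made up:

import com.googlecode.protobuf.format.JsonFormat;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

// ActivityInfo is the protobuf-generated class mentioned in the thread (package not shown).
public class ActivityInfoSinkFunction implements ElasticsearchSinkFunction<ActivityInfo> {

    @Override
    public void process(ActivityInfo element, RuntimeContext ctx, RequestIndexer indexer) {
        // Convert the protobuf message into a JSON string before indexing, so that
        // Elasticsearch stores a proper JSON document rather than toString() output.
        String json = JsonFormat.printToString(element);

        IndexRequest request = Requests.indexRequest()
                .index("activity")        // hypothetical index name
                .type("activity-info")    // hypothetical type name
                .source(json);
        indexer.add(request);
    }
}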

Re: Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-16 Thread Chao Wang
Thank you, Nico! That helps me a lot! 2a) That really clarifies my understanding of Flink. Yes, I think I have used static references, since I invoked a native function (implemented through JNI) which I believe only has one instance per process. And I guess the reason why those Java synchro

Exception for Scala anonymous class when restoring from state

2017-08-16 Thread Kien Truong
Hi, After some refactoring (moving some operators to separate functions/files), I'm encountering a lot of exceptions like these. The logic of the application did not change, and all the refactored operators are stateless, e.g. simple map/flatMap/filter. Does anyone know how to fix/avoid/work around
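
The thread does not show the eventual fix, but one commonly suggested way to make restores less sensitive to this kind of refactoring (an assumption here, not something stated in the thread) is to use named top-level function classes instead of anonymous classes, and to give operators explicit, stable uids, roughly:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StableOperatorIds {

    // A named, top-level function class: its class name does not change when
    // surrounding code is refactored, unlike an anonymous class or closure.
    public static class NonEmptyFilter implements FilterFunction<String> {
        @Override
        public boolean filter(String value) {
            return value != null && !value.isEmpty();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("a", "", "b");   // placeholder input

        input.filter(new NonEmptyFilter())
             .uid("non-empty-filter")   // stable id used when matching state on restore
             .print();

        env.execute("stable-operator-ids");
    }
}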

Re: hadoop

2017-08-16 Thread Ted Yu
Can you check the following config in yarn-site.xml? yarn.resourcemanager.proxy-user-privileges.enabled (true) Cheers On Wed, Aug 16, 2017 at 4:48 PM, Raja.Aravapalli wrote: > > > Hi, > > > > I triggered a Flink yarn-session on a running Hadoop cluster… and > am running a streaming application

Re: hadoop

2017-08-16 Thread Will Du
Did the Kerberos token expire without being renewed? > On Aug 16, 2017, at 7:48 PM, Raja.Aravapalli > wrote: > > > Hi, > > I triggered a Flink yarn-session on a running Hadoop cluster… and am running a > streaming application on it. > > But I see that after a few days of running without any issues

Re:Re:Re:Re:How to verify whether the data sent to Elasticsearch is correct?

2017-08-16 Thread Ted Yu
Did you use an image for the code? Can you send the plain code again? Cheers Original message From: mingleizhang <18717838...@163.com> Date: 8/16/17 6:16 PM (GMT-08:00) To: mingleizhang <18717838...@163.com> Cc: "Tzu-Li (Gordon) Tai" , user@flink.apache.org Subject: Re:Re:Re:Re:How t

Re:Re:Re:Re:How to verify whether the data sent to Elasticsearch is correct?

2017-08-16 Thread mingleizhang
I solved the issue by adding a dependency that converts the protobuf objects into JSON, with a line of code like the one below (element is a PB object). Thanks. zhangminglei At 2017-08-16 22:52:30, "mingleizhang" <18717838...@163.com> wrote: I looked into the data sunk to ElasticSea

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-16 Thread Raja . Aravapalli
Thanks very much for the detailed explanation, Stefan. Regards, Raja. From: Stefan Richter Date: Monday, August 14, 2017 at 7:47 AM To: Raja Aravapalli Cc: "user@flink.apache.org" Subject: Re: [EXTERNAL] difference between checkpoints & savepoints Just noticed that I forgot to also include a

hadoop

2017-08-16 Thread Raja . Aravapalli
Hi, I triggered a Flink yarn-session on a running Hadoop cluster… and am running a streaming application on it. But I see that after a few days of running without any issues, the Flink application, which is writing data to HDFS, fails with the exception below. Caused by: org.apache.hadoop.ipc.RemoteException

Re: Access to datastream from BucketSink - RESOLVED

2017-08-16 Thread ant burton
I have resolved my issue, thank you for your help. The following code gives me access to the element to determine a bucket directory name. import org.apache.hadoop.fs.Path; import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer; import org.apache.flink.streaming.connectors.fs.Clock;
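
The rest of the code is cut off by the archive; a minimal sketch of an element-property Bucketer along these lines, assuming String elements and a made-up extractKey() that yields the directory name:

import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
import org.apache.hadoop.fs.Path;

// Buckets each element by a property derived from the element itself.
public class PropertyBucketer implements Bucketer<String> {

    private static final long serialVersionUID = 1L;

    @Override
    public Path getBucketPath(Clock clock, Path basePath, String element) {
        // Derive the bucket directory from the element rather than from time.
        return new Path(basePath, extractKey(element));
    }

    private String extractKey(String element) {
        // Hypothetical rule: use the text before the first comma as the bucket name.
        int idx = element.indexOf(',');
        return idx > 0 ? element.substring(0, idx) : "default";
    }
}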

Re: Access to datastream from BucketSink

2017-08-16 Thread ant burton
Thank you for your help, it’s greatly appreciated. My aim is to be able to “use a property of the element to determine the bucket directory”. With your suggestions, this is what I have so far; it’s obviously wrong, but I hope I’m getting closer. Is it correct to still implement Bucketer, just change wh

Re: Reload DistributedCache file?

2017-08-16 Thread Ted Yu
For HDFS, there is the iNotify mechanism. https://issues.apache.org/jira/browse/HDFS-6634 https://www.slideshare.net/Hadoop_Summit/keep-me-in-the-loop-inotify-in-hdfs FYI On Wed, Aug 16, 2017 at 9:41 AM, Conrad Crampton < conrad.cramp...@secdata.com> wrote: > Hi, > > I have a simple text file that
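
For reference, a rough sketch of tailing the HDFS iNotify stream for changes to a single file; class names and signatures differ somewhat between Hadoop versions, reading the stream typically needs elevated HDFS privileges, and the namenode URI and file path here are hypothetical:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class WatchCacheFile {

    public static void main(String[] args) throws Exception {
        URI nameNode = URI.create("hdfs://namenode:8020");   // hypothetical namenode address
        String watchedPath = "/cache/filter-terms.txt";      // hypothetical file to watch

        HdfsAdmin admin = new HdfsAdmin(nameNode, new Configuration());
        DFSInotifyEventInputStream events = admin.getInotifyEventStream();

        while (true) {
            EventBatch batch = events.take();                // blocks until events arrive
            for (Event event : batch.getEvents()) {
                if (event.getEventType() == Event.EventType.CLOSE
                        && ((Event.CloseEvent) event).getPath().equals(watchedPath)) {
                    // The watched file was closed after a write: trigger a reload here.
                    System.out.println("cache file changed, reloading");
                }
            }
        }
    }
}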

Reload DistributedCache file?

2017-08-16 Thread Conrad Crampton
Hi, I have a simple text file stored in HDFS which I use in a RichFilterFunction by way of a DistributedCache file. The file is edited externally from time to time to have more lines added to it. My FilterFunction also implements Runnable, whose run method is scheduled via scheduleAtFixedRate
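
A minimal sketch of the kind of periodically reloading filter being described, but re-reading the file from HDFS directly instead of going through the DistributedCache (the path and refresh interval are made up):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReloadingFilter extends RichFilterFunction<String> {

    private static final String FILE = "hdfs:///cache/filter-terms.txt";  // hypothetical path

    private transient volatile Set<String> terms;
    private transient ScheduledExecutorService scheduler;

    @Override
    public void open(Configuration parameters) {
        terms = ConcurrentHashMap.newKeySet();
        scheduler = Executors.newSingleThreadScheduledExecutor();
        // First load runs asynchronously; the filter sees an empty set until it completes.
        scheduler.scheduleAtFixedRate(this::reload, 0, 5, TimeUnit.MINUTES);
    }

    private void reload() {
        Set<String> fresh = ConcurrentHashMap.newKeySet();
        Path path = new Path(FILE);
        try {
            FileSystem fs = FileSystem.get(path.toUri(), new org.apache.hadoop.conf.Configuration());
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    fresh.add(line.trim());
                }
            }
            terms = fresh;   // swap in the new snapshot
        } catch (Exception e) {
            // keep the previous snapshot if the reload fails
        }
    }

    @Override
    public boolean filter(String value) {
        return terms.contains(value);
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}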

Re: stream partitioning to avoid network overhead

2017-08-16 Thread Karthik Deivasigamani
Thanks, Urs, for your input. Yes, we use the AsyncIO operator for our web service calls. We were considering increasing the Kafka partitions and increasing the parallelism of the source to match the web service operator. We weren't quite sure if this was the only way to achieve operator chaining. Thanks for

JobManager HA behind load balancer

2017-08-16 Thread Shannon Carey
Is anyone running multiple JobManagers (in High Availability mode) behind a load balancer such as an AWS ELB or a software proxy such as HAProxy or Nginx? Right now, it appears that server-side redirects that come from the JobManager Web UI use the internal IP address of the JobManager (from Akk

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
Hi Ant, I think you are implementing the wrong Bucketer. This seems to be the one for the RollingSink, which is deprecated. Is this correct? You should implement the BucketingSink one, which is in the package org.apache.flink.streaming.connectors.fs.bucketing. That one requires the implementat

Re:Re:Re:How to verify whether the data sent to Elasticsearch is correct?

2017-08-16 Thread mingleizhang
I looked into the data sunk to Elasticsearch. Good news: I can find it there and it really is right. However, the data I sank is an object, but Elasticsearch represents it as a string. I put the related code below; the element type is ActivityInfo. Then I wrote a Java API to read the data

Re: Change state backend.

2017-08-16 Thread Ted Yu
I guess shashank meant switching the state backend w.r.t. savepoints. On Wed, Aug 16, 2017 at 4:00 AM, Biplob Biswas wrote: > Could you clarify a bit more? Do you want an existing state on a running > job > to be migrated from FsStateBackend to RocksDbStateBackend? > > Or > > Do you have the option

Re: Access to datastream from BucketSink

2017-08-16 Thread ant burton
Thanks Kostas, I’m narrowing in on a solution: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/filesystem_sink.html says "You can also specify a custom bucketer by usin

Re:Re:How to verify whether the data sent to Elasticsearch is correct?

2017-08-16 Thread mingleizhang
Hi Gordon, I am not sure about this. As far as I know, Elasticsearch often stores JSON data, since that makes it convenient to build its index. As for my code below, I stored the protobuf objects (ActivityInfo, which is built from the activityinfo.proto file) in Elasticsearch. And it is

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
In the second link for the BucketingSink, you can set your own Bucketer using the setBucketer method. You do not have to implement your own sink from scratch. Kostas > On Aug 16, 2017, at 1:39 PM, ant burton wrote: > > or rather > https://ci.apache.org/projects/flink/flink-docs-release-1.3/a
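
A short sketch of wiring a custom Bucketer into the BucketingSink via setBucketer, reusing the element-property Bucketer sketched earlier in this digest; the S3 path is hypothetical:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

public class CustomBucketJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a,1", "b,2");   // placeholder input

        // No need to write a sink from scratch: plug the custom Bucketer into BucketingSink.
        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/output");  // hypothetical path
        sink.setBucketer(new PropertyBucketer());   // the element-property Bucketer sketched earlier
        events.addSink(sink);

        env.execute("custom-bucketer");
    }
}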

Re: Access to datastream from BucketSink

2017-08-16 Thread ant burton
or rather https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html > On

Re: Access to datastream from BucketSink

2017-08-16 Thread ant burton
Am I on the right path with the following: class S3SinkFunc implements SinkFunction { public void invoke(String element) { System.out.println(element); // don't have access to dataStream to call .addSink() :-( } } Thanks, > On 16 Aug 2017, at 12:24, Kostas Kloudas wro

Re: Aggregation by key hierarchy

2017-08-16 Thread Basanth Gowda
Thanks Nico. As there are two ways to achieve this, which is better? 1st option -> dataStream.flatMap( ... ) -> this takes one input and gives me N outputs, depending on my key combination. On each of the outputs the same windowing logic is applied, or the one you suggested. 2nd option ->
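
A small sketch of the 1st option under discussion, assuming a hypothetical "country,city,value" record: one flatMap fans each event out to every level of the key hierarchy, and a single keyed window aggregation then covers all combinations at once (the bounded demo input is only there to make the sketch self-contained):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class HierarchicalAggregation {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("DE,Berlin,3", "DE,Munich,5", "US,NYC,2");

        events
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
                    // Emit one record per level of the key hierarchy.
                    String[] parts = value.split(",");
                    int amount = Integer.parseInt(parts[2]);
                    out.collect(Tuple2.of(parts[0], amount));                   // country level
                    out.collect(Tuple2.of(parts[0] + "/" + parts[1], amount));  // country+city level
                }
            })
            .keyBy(0)
            .timeWindow(Time.minutes(5))
            .sum(1)
            .print();

        env.execute("hierarchical-aggregation");
    }
}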

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
Hi Ant, I think you can do it by implementing your own Bucketer. Cheers, Kostas > On Aug 16, 2017, at 1:09 PM, ant burton wrote: > > Hello, > > Given > >// Set StreamExecutionEnvironment >final StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvi

Access to datastream from BucketSink

2017-08-16 Thread ant burton
Hello, Given // Set StreamExecutionEnvironment final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // Set checkpoints in ms env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // Add source (input stre

Re: Change state backend.

2017-08-16 Thread Biplob Biswas
Could you clarify a bit more? Do you want the existing state of a running job to be migrated from FsStateBackend to RocksDBStateBackend? Or do you have the option of restarting your job after changing the existing code? -- View this message in context: http://apache-flink-user-mailing-list-

Re: Question about Global Windows.

2017-08-16 Thread Nico Kruber
Hi Steve, are you sure a GlobalWindows assigner fits your needs? This may be the case if all your events always come in order and you never have overlapping sessions, since a GlobalWindows assigner simply puts all events (per key) into a single window (per key). If you have overlapping sess
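
For illustration, a minimal sketch of a GlobalWindows pipeline; because the assigner never fires on its own, an explicit trigger is required, and the count trigger here is just a stand-in for whatever custom trigger fits the actual use case:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

public class GlobalWindowExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));   // placeholder input

        input.keyBy(0)
             .window(GlobalWindows.create())   // one ever-growing window per key
             .trigger(CountTrigger.of(2))      // emit a result for a key every 2 elements
             .sum(1)
             .print();

        env.execute("global-windows");
    }
}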

Change state backend.

2017-08-16 Thread shashank agarwal
Hi, Can I change the state backend from FsStateBackend to RocksDBStateBackend directly, or do I have to do some migration? -- Thanks Regards SHASHANK AGARWAL --- Trying to mobilize the things
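
Switching the backend itself is just configuration plus the flink-statebackend-rocksdb dependency, roughly as sketched below (the checkpoint path is hypothetical); whether a savepoint taken with FsStateBackend can then be restored under RocksDB is the separate migration question raised in this thread:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint data still goes to a durable filesystem; the path is hypothetical.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
        // ... build and execute the job as before
    }
}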

Re: Aggregation by key hierarchy

2017-08-16 Thread Nico Kruber
[back to the ml...] also including your other mail's additional content... > I have been able to do this by the following and repeating this for every > key + window combination. So in the above case there would be 8 blocks like > below. (4 combinations and 2 window period for each combination) >

Re: Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-16 Thread Nico Kruber
Hi Chao, 1) regarding the difference between CoFlatMapFunction and CoProcessFunction, let me quote the javadoc of the CoProcessFunction: "Contrary to the {@link CoFlatMapFunction}, this function can also query the time (both event and processing) and set timers, through the provided {@link Context}. Wh
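
To make the quoted difference concrete, a small sketch of a CoProcessFunction that registers a processing-time timer; it assumes the two inputs are connected and keyed (timers require keyed state), and the element types are placeholders:

import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder types: String control messages on input 1, Long readings on input 2.
public class TimerCoProcess extends CoProcessFunction<String, Long, String> {

    @Override
    public void processElement1(String control, Context ctx, Collector<String> out) throws Exception {
        // Unlike CoFlatMapFunction, the Context gives access to time and timers.
        long fireAt = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(fireAt);
        out.collect("scheduled timer after control message: " + control);
    }

    @Override
    public void processElement2(Long reading, Context ctx, Collector<String> out) {
        out.collect("reading: " + reading);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("timer fired at " + timestamp);
    }
}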

Re: Time zones problem

2017-08-16 Thread Biplob Biswas
Hi Alex, Your problem sounds interesting, and I have always found dealing with timestamps cumbersome. Nevertheless, what I understand is that your start and end timestamps for American and European customers are based on their local clock. For example, the start and end timestamps of 12 AM - 12 AM in am
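
To illustrate the point, a small standalone example of how the same "12 AM to 12 AM" calendar day maps to different epoch ranges depending on the customer's time zone (the date and zones are arbitrary):

import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class LocalDayBounds {
    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2017, 8, 16);   // arbitrary example day

        for (ZoneId zone : new ZoneId[]{ZoneId.of("America/New_York"), ZoneId.of("Europe/Berlin")}) {
            ZonedDateTime start = day.atStartOfDay(zone);             // local midnight
            ZonedDateTime end = day.plusDays(1).atStartOfDay(zone);   // next local midnight
            System.out.println(zone + ": " + start.toInstant().toEpochMilli()
                    + " .. " + end.toInstant().toEpochMilli());
        }
    }
}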