Re: Questions about checkpoints/savepoints

2017-10-23 Thread vipul singh
Thanks Tony, that was the issue. I was thinking that when we use RocksDB and provide an S3 path, it uses externalized checkpoints by default. Thanks so much! I have one follow-up question. Say, in the above case, I terminate the cluster, and since the metadata is on S3, and not on local storage, does fl

Re: Get EOF from PrometheusReporter in JM

2017-10-23 Thread Tony Wei
Hi Max, Good to know. Thanks very much. Best Regards, Tony Wei 2017-10-24 13:52 GMT+08:00 Maximilian Bode : > Hi Tony, > > thanks for troubleshooting this. I have added a commit to > https://github.com/apache/flink/pull/4586 that should enable you to use > the reporter with 1.3.2 as well. > > B

Re: Questions about checkpoints/savepoints

2017-10-23 Thread Tony Wei
Hi, Did you enable externalized checkpoints? [1] Best, Tony Wei [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/checkpoints.html#externalized-checkpoints 2017-10-24 13:07 GMT+08:00 vipul singh : > Thanks Aljoscha for the answer above. > > I am experimenting with savepoint

Re: Get EOF from PrometheusReporter in JM

2017-10-23 Thread Maximilian Bode
Hi Tony, thanks for troubleshooting this. I have added a commit to https://github.com/apache/flink/pull/4586 that should enable you to use the reporter with 1.3.2 as well. Best regards, Max > Tony Wei > 23 September 2017 at 13:11 > Hi Chesnay, > > I built another

Re: Local combiner on each mapper in Flink

2017-10-23 Thread Le Xu
Thanks Kurt. Maybe I wasn't clear before: I was wondering if Flink has an implementation of a combiner in DataStream (to use after keyBy and windowing). Thanks again! Le On Sun, Oct 22, 2017 at 8:52 PM, Kurt Young wrote: > Hi, > > The document you are looking at is pretty old, you can check the new

Re: Writing an Integration test for flink-metrics

2017-10-23 Thread Colin Williams
Thanks for the help. I ended up creating a custom metric reporter and accessing its fields in an integration test. However, I do think that the test that Martin checked in is another good way to test. I opened https://issues.apache.org/jira/browse/FLINK-7907 regarding the missing Scala examples in

Re: Questions about checkpoints/savepoints

2017-10-23 Thread vipul singh
Thanks Aljoscha for the answer above. I am experimenting with savepoints and checkpoints on my end, so that we can build a fault-tolerant application with exactly-once semantics. I have been able to test various scenarios, but have doubts about one use case. My app is running on an EMR cluster, and I

Re: Processing files

2017-10-23 Thread Sugandha Amatya
I think you need to use readFile. On Tue, Oct 24, 2017 at 12:32 AM, Telco Phone wrote: > All, > > I'm looking to process files in a directory based on files that are coming > in via file transfer. > > The files are renamed once the transfer is done to a .DONE. > > These are binary files and I nee

Re: ResultPartitionMetrics

2017-10-23 Thread aitozi
Hi, I have understood it. Thanks, aitozi -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Impersonation support in Flink

2017-10-23 Thread Eron Wright
Hello, Flink does initialize the process-wide login user, using the UGI's Kerberos login method. It doesn't support proxy user at the moment. Let's dig into the scenario a bit to see how best to support it. As you know, the proxy user functionality of Hadoop allows a process that has superuser

Re: Write each group to its own file

2017-10-23 Thread Piotr Nowojski
You’re welcome :) > On 23 Oct 2017, at 20:43, Rodrigo Lazoti wrote: > > Piotr, > > I did as you suggested and it worked perfectly. > Thank you! :) > > Best, > Rodrigo > > On Thu, Oct 12, 2017 at 5:11 AM, Piotr Nowojski > wrote: > Hi, > > There is no straightf

Processing files

2017-10-23 Thread Telco Phone
All, I'm looking to process files in a directory based on files that are coming in via file transfer. The files are renamed once the transfer is done to a .DONE. These are binary files and I need to process billions per day. What I want to do is process the file and then create a new file called .

Re: Write each group to its own file

2017-10-23 Thread Rodrigo Lazoti
Piotr, I did as you suggested and it worked perfectly. Thank you! :) Best, Rodrigo On Thu, Oct 12, 2017 at 5:11 AM, Piotr Nowojski wrote: > Hi, > > There is no straightforward way to do that. First of all, the error you > are getting is because you are trying to start new application ( > env.f

Impersonation support in Flink

2017-10-23 Thread Chan, Regina
Hi folks, Is Flink able to do impersonation using UserGroupInformation? How do we make all the tasks run with this in a way that we wouldn't have to do it per task? UserGroupInformation ugi = UserGroupInformation.createProxyUser( proxyUser, UserGroupInformation.getLoginUser()); PrivilegedEx

Adding headers to tuples before writing to S3

2017-10-23 Thread ShB
Hi, I'm working with Flink for data analytics and reporting. The use case is that, when a user requests a report, a Flink cluster does some computations on the data, generates the final report (a DataSet of tuples) and uploads the report to S3, after which an email is sent to the corresponding email

Re: Problems with taskmanagers in Mesos Cluster

2017-10-23 Thread Eron Wright
If I understand you correctly, the high-availability path isn't being changed but other TM-related settings are, and the recovered TMs aren't picking up the new configuration. I don't think that Flink supports on-the-fly reconfiguration of a Task Manager at this time. As a workaround, to achieve

Re: How to test new sink

2017-10-23 Thread Rinat
Timo, thanks for your reply. I’m using Gradle instead of Maven, but I’ll look through the existing similar plugins for it. I don’t think that sharing external tests between other projects is a good idea, but that’s out of the scope of the current discussion. The main purpose of my request is to understa

Re: Sample project on real time ingestion of more than 1Billion events at a time

2017-10-23 Thread Timo Walther
Hi Deepak, actually, every Flink example program can scale up to millions of events and more. The Flink APIs are designed to abstract the business logic from the parallelism. You just need to implement the interfaces that Flink provides. If you are interested in some example program, I can

Re: How to test new sink

2017-10-23 Thread Timo Walther
Hi Rinat, using one of the Flink test utilities is a good approach to test your custom operators. But of course these classes might change in the future. First of all, Flink is an open source project so you can just copy the required classes. However, it should be possible to use the Flink tes

Sample project on real time ingestion of more than 1Billion events at a time

2017-10-23 Thread Deepak Sharma
Hi All, can anyone point me to sample code that can scale up to real-time ingestion of millions or billions of events? Also, I need a reference on which analytics engine to use to store the transformed events for real-time querying. Thanks a lot in advance. --Deepak

Re: Accumulator with Elasticsearch Sink

2017-10-23 Thread Sendoh
Hi Gordon, I commented on the ticket. What would you think if I started implementing it, as there has been no progress recently? I see it's only about adding metrics in the following class? Anyway, I would probably have to build it soon. https://github.com/apache/flink/blob/master/flink-connectors/flink-con

Re: Checkpoint was declined (tasks not ready)

2017-10-23 Thread bartektartanus
Ok, looks like we've found the cause of this issue. The scenario looks like this: 1. The queue is full (let's assume that its capacity is N elements) 2. There is some pending element waiting, so the pendingStreamElementQueueEntry field in AsyncWaitOperator is not null and while-loop in addAsyncBuff

Reprocessing the data after config change

2017-10-23 Thread Tomasz Dobrzycki
Hi all, I'm currently working on a system that windows and extracts metrics from data made of browser events. This data is processed based on a config loaded from an external application. One of the main requirements of the system is to reprocess historical data (within some reason, currently I've set o

How to test new sink

2017-10-23 Thread Rinat
Hi! I’ve just implemented a new sink that extends the functionality of the existing BucketingSink, and I’m currently trying to test functionality that is related to timing. My sink implements ProcessingTimeCallback, similarly to the original BucketingSink. I’m trying to inject TestProcessingTimeSe

Re: Checkpoint was declined (tasks not ready)

2017-10-23 Thread Maciek Próchniak
It seems that one of the operators is stuck during recovery: prio=5 os_prio=0 tid=0x7f634bb31000 nid=0xd5e in Object.wait() [0x7f63f13cc000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502

Re: Checkpoint was declined (tasks not ready)

2017-10-23 Thread Maciek Próchniak
We also have a similar problem; it happens really often when we invoke async operators (ordered ones). But we also observe that the job is not starting properly: we don't process any data when such problems appear. We'll keep you posted if we manage to find the exact cause... Thanks, maciek On 09/10/2

Incompatible types of expression and result type.

2017-10-23 Thread ??????
Dear All, I have a question about TableSource. I defined a TableSource using StreamTableSource, then registered a table and executed a query. The SQL is "select f0 from myTable". Finally, I converted the result table to a DataStream. The following error occurred during execution; how can I solve it? Exception in th

Re: ResultPartitionMetrics

2017-10-23 Thread Chesnay Schepler
The metrics registered in initializeBufferMetrics aggregate across all InputGates, whereas the metrics registered in the InputGateMetrics are separate for each InputGate. As an example, let's say a task has 2 input gates, with each having 2 input buffers queued: // IOMetricGroup#initializeBu
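The example in the preview above is cut off, so the following is only a minimal sketch of the distinction it describes, assuming that "aggregate" means summing the queued buffers over all input gates. It does not use Flink's metric classes; the class `InputQueueMetricsSketch` and its method names are hypothetical and exist purely for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only: these are not Flink classes or Flink metric names.
public class InputQueueMetricsSketch {

    // Task-level view: a single value aggregated (assumed here: summed) over all input gates.
    public static int aggregateQueueLength(int[] queuedBuffersPerGate) {
        return Arrays.stream(queuedBuffersPerGate).sum();
    }

    // Gate-level view: one value per input gate, each reported separately.
    public static int[] perGateQueueLengths(int[] queuedBuffersPerGate) {
        return queuedBuffersPerGate.clone();
    }

    public static void main(String[] args) {
        // Scenario from the message: 2 input gates, each with 2 input buffers queued.
        int[] queued = {2, 2};
        System.out.println("aggregate: " + aggregateQueueLength(queued));
        System.out.println("per gate:  " + Arrays.toString(perGateQueueLengths(queued)));
    }
}
```

Under that assumption, the task-level metric would report 4 for this scenario, while each per-gate metric would report 2.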

Re: HBase config settings go missing within Yarn.

2017-10-23 Thread Piotr Nowojski
Till, do you have some idea what is going on? I do not see any meaningful difference between Niels' code and HBaseWriteStreamExample.java. There is also a very similar issue on the mailing list: “Flink can't read hdfs namenode logical url” Piotrek > On 22 Oct 2017, at 12:56, Niels Basjes w

Re: SLF4j logging system gets clobbered?

2017-10-23 Thread Piotr Nowojski
Till, could you take a look at this? Piotrek > On 18 Oct 2017, at 20:32, Jared Stehler > wrote: > > I’m having an issue where I’ve got logging setup and functioning for my > flink-mesos deployment, and works fine up to a point (the same point every > time) where it seems to fall back to “defa

Re: How can I create a savepoint if I have Flink running in containers?

2017-10-23 Thread Timo Walther
Hi, I'm not a deployment expert but I think creating a savepoint should still be doable through the CLI client. The Flink JobManager and TaskManager just run the containers and the CLI connects to a JobManager. I will loop in someone more familiar with deployment. We should definitely improve

Re: ResultPartitionMetrics

2017-10-23 Thread Timo Walther
Hi Aitozi, I will loop in people that are more familiar with the network stack and metrics. Maybe this is a bug? Regards, Timo On 10/22/17 at 4:36 PM, aitozi wrote: Hi, i see in version 1.3, it add the ResultPartitionMetrics with issue: https://issues.apache.org/jira/browse/FLINK-5090 but

Re: Flink 1.4 release timeline

2017-10-23 Thread Timo Walther
Hi Moiz, the community is working hard to fix the last blockers for the release. The feature freeze should happen end of this month. After that we will test the release over some weeks. I think you can expect the 1.4 release end of November. Feel free to help us with the release by testing the

user@flink.apache.org

2017-10-23 Thread Timo Walther
Hi Han, generally, Flink is a strongly typed system. I think the easiest way to handle a dynamic schema is to read your JSON as a String. You can then implement your own ScalarFunction (or in this case also a TableFunction) [1] and use any JSON parsing library in this function for preprocessin

Re: flink can't read hdfs namenode logical url

2017-10-23 Thread Piotr Nowojski
Hi, Why is there a different host in this new message? Previously the code was trying to connect to “master:8020” and now it is “startdt”? If you were able to change this host somehow between runs, I guess you should also be able to set it to the correct one. Piotrek > On 23 Oct 2017, at 09:11, 邓俊华

Re: flink can't read hdfs namenode logical url

2017-10-23 Thread 邓俊华
Hi, Thanks for your reply! I have been blocked on this for several days. I have double-checked that hdfs-site.xml, core-site.xml and yarn-site.xml are in YARN_CONF_DIR, but it still can't read the hdfs namenode logical url. 2017-10-23 14:35:17,750 DEBUG org.apache.flink.api.java.hadoop.mapred