Re: Issues testing Flink HA w/ ZooKeeper

2016-02-16 Thread Stefano Baghino
Ok, simply bringing up HDFS on the cluster and using it as the state backend fixed the issue. Thank you both for the help! On Mon, Feb 15, 2016 at 5:45 PM, Stefano Baghino <stefano.bagh...@radicalbit.io> wrote: > You can find the log of the recovering job manager here: > https://gist.github.com/s

Re: Read once input data?

2016-02-16 Thread Flavio Pompermaier
I also have a couple of use cases where a "pin data sets in memory" feature would help a lot ;) On Mon, Feb 15, 2016 at 10:18 PM, Saliya Ekanayake wrote: > Thanks, I'll check this. > > Saliya > > On Mon, Feb 15, 2016 at 4:08 PM, Fabian Hueske wrote: > >> I would have a look at the example progr

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-16 Thread Stephan Ewen
Hi! As a bit of background: ZooKeeper only lets you store very small amounts of data. We hence persist only the changing checkpoint metadata in ZooKeeper. To recover a job, some constant data is also needed: the JobGraph and the jar files. These cannot go to ZooKeeper, but need to go to a reliable stor
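
To make the division of labor concrete, here is a sketch of the HA-related flink-conf.yaml entries from that era; the key names follow the 0.10/1.0 docs, while the quorum addresses and HDFS paths are hypothetical:

    recovery.mode: zookeeper
    recovery.zookeeper.quorum: zkhost1:2181,zkhost2:2181,zkhost3:2181
    # Only small, changing checkpoint metadata lives in ZooKeeper;
    # job graphs and jars go to this reliable storage directory:
    recovery.zookeeper.storageDir: hdfs:///flink/recovery
    state.backend: filesystem
    state.backend.fs.checkpointdir: hdfs:///flink/checkpoints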

Re: Regarding Concurrent Modification Exception

2016-02-16 Thread Biplob Biswas
Hi, No, we don't start a Flink job inside another job. The jobs were created in a loop, but each new job started only after the previous one had finished and been cleaned up. We also didn't get this exception on my local Flink installation; it only appears when I run on the cluster. Thanks & Regards Biplob

Availability for the ElasticSearch 2 streaming connector

2016-02-16 Thread Vieru, Mihail
Hi, in reference to this ticket https://issues.apache.org/jira/browse/FLINK-3115: when do you think an ElasticSearch 2 streaming connector will become available? Will it make it into the 1.0 release? That would be great, as we are planning to use that particular version of ElasticSearch in the

Re: Flink 1.0.0 Release Candidate 0: Please help testing

2016-02-16 Thread Stephan Ewen
Found one blocker issue during testing: - Watermark generators accept negative watermarks (FLINK-3415) On Mon, Feb 15, 2016 at 8:47 PM, Robert Metzger wrote: > Hi, > > I've now created a "preview RC" for the upcoming 1.0.0 release. > There are still some blocking issues and important pull req

Re: inconsistent results

2016-02-16 Thread Aljoscha Krettek
Hi, I don’t know about the results, but one problem I can identify is this snippet: groupBy(0).sum(2).max(2) The max(2) here is a non-parallel operation since it finds the max over all elements, not grouped by key. If you want the max to also be per-key you have to use groupBy(0).sum(2).andMax(
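
For readers hitting the same confusion, a minimal sketch (hypothetical data, Scala DataSet API) of grouped versus global aggregation:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val data = env.fromElements(
      ("a", 1, 10), ("a", 2, 20), ("b", 3, 30), ("b", 4, 40))

    // One record per key, with field 2 summed (30 for "a", 70 for "b").
    val perKeySums = data.groupBy(0).sum(2)

    // A further max(2) is a global, non-parallel aggregation: it collapses
    // the per-key sums into a single record holding the overall maximum.
    val globalMax = perKeySums.max(2)

    // For a per-key maximum, keep the aggregation inside the grouping:
    val perKeyMax = data.groupBy(0).max(2)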

Re: Read once input data?

2016-02-16 Thread Saliya Ekanayake
Fabian, I have a quick follow-up question on what you suggested. When streaming the same data through different maps, were you implying that everything goes as a single job in Flink, so the data read happens only once? Thanks, Saliya On Mon, Feb 15, 2016 at 3:58 PM, Fabian Hueske wrote: > It is not po

Re: Read once input data?

2016-02-16 Thread Fabian Hueske
Yes, if you implement both maps in a single job, data is read once. 2016-02-16 15:53 GMT+01:00 Saliya Ekanayake : > Fabian, > > I've a quick follow-up question on what you suggested. When streaming the > same data through different maps, were you implying that everything goes as > single job in F
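
A minimal sketch of what "a single job" means here, with hypothetical paths and map functions: both maps branch off the same source within one program, so the input is scanned only once.

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val input = env.readTextFile("hdfs:///path/to/data")  // read once

    // Two branches over the same source, submitted as one job.
    val first  = input.map(line => line.length)
    val second = input.map(line => line.toUpperCase)

    first.writeAsText("hdfs:///out/first")
    second.writeAsText("hdfs:///out/second")
    env.execute("read-once sketch")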

Re: writeAsCSV with partitionBy

2016-02-16 Thread Srikanth
Fabian, Not sure if we are on the same page. If I do something like the code below, it will partition by field 0 and each task will write a separate part file in parallel. val sink = data1.join(data2) .where(1).equalTo(0) { (l, r) => (l._3, r._3) } .partitionByHash(0) .writeAsCsv(pathBa

Re: writeAsCSV with partitionBy

2016-02-16 Thread Fabian Hueske
Yes, you're right. I did not understand your question correctly. Right now, Flink does not feature an output format that writes records to output files depending on a key attribute. You would need to implement such an output format yourself and append it as follows: val data = ... data.partitionB
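
To illustrate what such a custom output format might look like, here is a rough sketch (hypothetical class name, local file system, tuples of (Int, String)); a production version would need proper Hadoop file system handling, flushing, and failure cleanup:

    import java.io.{BufferedWriter, File, FileWriter}
    import scala.collection.mutable
    import org.apache.flink.api.common.io.OutputFormat
    import org.apache.flink.configuration.Configuration

    class KeyPartitionedOutputFormat(basePath: String)
        extends OutputFormat[(Int, String)] {

      @transient private var writers: mutable.Map[Int, BufferedWriter] = _
      private var taskIndex: Int = 0

      override def configure(parameters: Configuration): Unit = {}

      override def open(taskNumber: Int, numTasks: Int): Unit = {
        taskIndex = taskNumber
        writers = mutable.Map.empty
      }

      override def writeRecord(record: (Int, String)): Unit = {
        // One file per key and parallel task, e.g. basePath/key=42/part-0
        val writer = writers.getOrElseUpdate(record._1, {
          val dir = new File(basePath, s"key=${record._1}")
          dir.mkdirs()
          new BufferedWriter(new FileWriter(new File(dir, s"part-$taskIndex")))
        })
        writer.write(record._2)
        writer.newLine()
      }

      override def close(): Unit =
        if (writers != null) writers.values.foreach(_.close())
    }

It would then be appended roughly as Fabian indicates: data.partitionByHash(0).output(new KeyPartitionedOutputFormat("/tmp/out")), so that each key's records land in the same task and hence the same directory.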

Re: Read once input data?

2016-02-16 Thread Saliya Ekanayake
I looked at the samples and I think what you meant is clear, but I didn't find a solution for my need. In my case, I want to use the result from the first map operation before I can apply the second map to the *same* data set. For simplicity, let's say I have a bunch of short values represented as my dat

Re: Read once input data?

2016-02-16 Thread Fabian Hueske
You can use so-called BroadcastSets to send any sufficiently small DataSet (such as a computed average) to any other function and use it there. However, in your case you'll end up with a data flow that branches (at the source) and merges again (when the average is sent to the second map). Such patt
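
A minimal sketch of that pattern, assuming a toy DataSet of doubles and a hypothetical broadcast name "avg":

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration
    import scala.collection.JavaConverters._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val values = env.fromElements(1.0, 2.0, 3.0, 4.0)

    // Branch 1: a tiny DataSet holding the mean (count hard-coded for brevity).
    val avg = values.reduce(_ + _).map(_ / 4.0)

    // Branch 2: the same data again, with the mean broadcast into the map.
    val centered = values
      .map(new RichMapFunction[Double, Double] {
        private var mean: Double = 0.0
        override def open(parameters: Configuration): Unit = {
          // Read the broadcast DataSet; it arrives as a java.util.List.
          mean = getRuntimeContext.getBroadcastVariable[Double]("avg").asScala.head
        }
        override def map(v: Double): Double = v - mean
      })
      .withBroadcastSet(avg, "avg")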

Re: Read once input data?

2016-02-16 Thread Saliya Ekanayake
Thank you, yes, this makes sense. The broadcasted data in my case would be a large array of 3D coordinates. On a side note, how can I take the output from a reduce function? I can see methods to write it to a given output, but is it possible to retrieve the reduced result back to the program - like a

Re: Read once input data?

2016-02-16 Thread Fabian Hueske
Broadcasted DataSets are stored on the JVM heap of each task manager (but shared among multiple slots on the same TM), hence the size restriction. There are two ways to retrieve a DataSet (such as the result of a reduce). 1) If you want to fetch the result into your client program, use DataSet.coll
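
For the first option, a one-line sketch with toy data:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    // collect() triggers execution and ships the result to the client program.
    val total: Seq[Int] = env.fromElements(1, 2, 3).reduce(_ + _).collect()
    // total == Seq(6)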

Re: Read once input data?

2016-02-16 Thread Saliya Ekanayake
Thank you. I'll check this. On Tue, Feb 16, 2016 at 4:01 PM, Fabian Hueske wrote: > Broadcasted DataSets are stored on the JVM heap of each task manager (but > shared among multiple slots on the same TM), hence the size restriction. > > There are two ways to retrieve a DataSet (such as the result

streaming hdfs sub folders

2016-02-16 Thread Martin Neumann
Hi, I have a streaming machine learning job that usually runs with input from Kafka. To tweak the models I need to run on some old data from HDFS. Unfortunately, the data on HDFS is spread out over several subfolders. Basically, I have a folder per date with one subfolder for each hour; within those are the a
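
One batch-side way to pick up nested folders (whether it fits a streaming job is a separate question) is the recursive enumeration flag on file input formats; a sketch with hypothetical paths:

    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration

    val env = ExecutionEnvironment.getExecutionEnvironment

    val conf = new Configuration()
    // Enumerate files in all nested subfolders, not just the top level.
    conf.setBoolean("recursive.file.enumeration", true)

    val oldData = env
      .readTextFile("hdfs:///data/2016-02-16")  // parent of the hourly subfolders
      .withParameters(conf)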

Where can I get a summary of changes between flink-1.0 and flink-0.10

2016-02-16 Thread wangzhijiang999
Hi, where can I get a summary of the changes between flink-1.0 and flink-0.10? Thank you in advance! Best Regards, Zhijiang Wang

Re: Where can I get a summary of changes between flink-1.0 and flink-0.10

2016-02-16 Thread Chiwan Park
Hi Zhijiang, We have wiki pages describing the Flink 1.0 release [1] [2]. But the pages are not updated in real time, so it is possible that there are some changes that haven’t been described. After releasing 1.0 officially, maybe we will post an article dealing with the changes in 1.0 to the Fl