Re: Documentation not reachable

2015-06-29 Thread Robert Metzger
As you can see from status.apache.org, the host is currently down. I hope the fix the issue soon. On Mon, Jun 29, 2015 at 11:50 AM, Sachin Goel sachingoel0...@gmail.com wrote: I've been experiencing the same problem. Regards Sachin Goel On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz

Re: [flink-ml] How to use ParameterMap in predict method?

2015-06-29 Thread Till Rohrmann
Hi Chiwan, at the moment the single element PredictOperation only supports non-distributed models. This means that it expects the model to be a single element DataSet which can be broadcasted to the predict mappers. If you need more flexibility, you can either extend the PredictOperation

[DISCUSS / VOTE] Signal name to kill streaming jobs

2015-06-29 Thread Matthias J. Sax
Hi, I am working on https://issues.apache.org/jira/browse/FLINK-2111 Stephan and I had a discussion about the name of the signal. See: https://github.com/apache/flink/pull/750 Because we cannot agree on either terminate or stop we would appreciate some feedback about it. If anybody has an third

Replacing Checkpointed interface with field annotations

2015-06-29 Thread Gyula Fóra
Hey all! Just to add something new to the end of the discussion list. After some discussion with Seif, and Paris, I have added a commit that replaces the use of the Checkpointed interface with field annotations. This is probably the most lightweight state declaration so far and it will probably

FYI: Flink documentation currently offline Fwd: [ NOTICE ] Services Unavailable

2015-06-29 Thread Robert Metzger
Hi, it seems that the Flink documentation is currently not available due to security issues with the ci.apache.org host. -- Forwarded message -- From: Tony Stevenson pct...@apache.org Date: Mon, Jun 29, 2015 at 11:57 AM Subject: [ NOTICE ] Services Unavailable To:

Re: Monitoring a Flink Job

2015-06-29 Thread Andra Lungu
Hey Fabian, I am aware of the way open, preSuperstep(), postSuperstep() etc can help me within an interation, unfortunately I am writing my own method here. I could try to briefly describe it: public static final class PropagateNeighborValues implements NeighborsFunctionWithVertexValue(...) {

Documentation not reachable

2015-06-29 Thread Felix Neutatz
Hi, when I want to access the documentation on the website I get the following error: http://ci.apache.org/projects/flink/flink-docs-release-0.9 Service Unavailable The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again

Re: Is there Any api that let DataStream join DataSet ?

2015-06-29 Thread Gyula Fóra
You are right, one cannot use the current window-join implementation to this. A workaround is to implement your custom binary stream operator that will wait until it receives the whole file, then starts joining. For instance a filestream.connect(streamToJoinWith).flatMap( CustomCoFlatMap that

Re: [flink-ml] How to use ParameterMap in predict method?

2015-06-29 Thread Chiwan Park
Thank you Till. I have another question. Can I use a DataSet object as Model? In KNN, we need to DataSet given in fit operation. But when I defined Model generic parameter to DataSet in PredictOperation, the getModel method’s return type is DataSet[DataSet]. I’m confused with this situation. If

[jira] [Created] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers

2015-06-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2291: Summary: Use ZooKeeper to elect JobManager leader and send information to TaskManagers Key: FLINK-2291 URL: https://issues.apache.org/jira/browse/FLINK-2291 Project:

Re: Monitoring a Flink Job

2015-06-29 Thread Fabian Hueske
Have you tried to use a custom accumulator that just appends to a list? 2015-06-29 12:59 GMT+02:00 Andra Lungu lungu.an...@gmail.com: Hey Fabian, I am aware of the way open, preSuperstep(), postSuperstep() etc can help me within an interation, unfortunately I am writing my own method here. I

Re: Is there Any api that let DataStream join DataSet ?

2015-06-29 Thread Matthias J. Sax
I am wondering what the semantics of a DataStream created from a file is. It should be a regular (but finite) stream. From my understanding, a Window-Join is defined with some ts-constraint. So the static file part will also have this restriction in the join, right? However, a file-stream-join

Re: Documentation not reachable

2015-06-29 Thread Sachin Goel
I've been experiencing the same problem. Regards Sachin Goel On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz neut...@googlemail.com wrote: Hi, when I want to access the documentation on the website I get the following error: http://ci.apache.org/projects/flink/flink-docs-release-0.9

[jira] [Created] (FLINK-2290) CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive

2015-06-29 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2290: --- Summary: CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive Key: FLINK-2290 URL: https://issues.apache.org/jira/browse/FLINK-2290

Storm Compatibility Improvement

2015-06-29 Thread Matthias J. Sax
Hi, I started to work on a missing feature for the Storm compatibility layer: named attribute access In Storm, each attribute of an input tuple can be accessed via index or by name. Currently, only index access is supported. In order to support this feature in Flink (embedded Bolt in Flink

Re: Storm Compatibility Improvement

2015-06-29 Thread Gyula Fóra
Hey, I didn't look through the whole code so I probably don't get something but why don't you just do what storm does? Keep a map from the field names to indexes somewhere (make this accessible from the tuple) and then you can just use a simple Flink tuple. I think this is what's happening in

Re: Monitoring a Flink Job

2015-06-29 Thread Andra Lungu
Caution! I am getting philosophical. Stop me if I'm talking nonsense! You are suggesting a list that will have one or two entries per vertex = (approx) billions. Won't this over-saturate my memory? I am already filling it with lots of junk resulted from the computation... On Mon, Jun 29, 2015 at

[Runtime] Division by Zero Exception

2015-06-29 Thread Andra Lungu
From the same series of experiments: I am basically running an algorithm that simulates a Gather Sum Apply Iteration that performs Traingle Count (Why simulate it? Because you just need a superstep - useless overhead if you use the runGatherSumApply function in Graph). What happens, at a high

Re: Storm Compatibility Improvement

2015-06-29 Thread Matthias J. Sax
That would also work. I thought about it already, too. Thanks for the feedback. If two people have similar idea, it might be the right way to got. I will just include all this stuff and open an PR. Than we can evaluate it again. -Matthias On 06/30/2015 12:01 AM, Gyula Fóra wrote: By declare I

Re: Storm Compatibility Improvement

2015-06-29 Thread Gyula Fóra
By declare I mean we assume a Flink Tuple datatype and the user declares the name mapping (sorry its getting late). Gyula Fóra gyula.f...@gmail.com ezt írta (időpont: 2015. jún. 29., H, 23:57): Ah ok, now I get what I didn't get before :) So you want to take some input stream , and execute a

RE: Is there Any api that let DataStream join DataSet ?

2015-06-29 Thread 马国维
hi every one:thanks a lot for all you help. In my case , there is a data stream and a huge data set. Each element in the data stream wants to join the huge data set to produce a new data stream. But it can’t use the join method like the shuffle method or the broadcast method because of

Re: Monitoring a Flink Job

2015-06-29 Thread Vasiliki Kalavri
Andra, why don't you simply print to standard output and gather your metrics from the taskmanagers' log files after execution? Wouldn't that work for you? -V. On 29 June 2015 at 22:36, Andra Lungu lungu.an...@gmail.com wrote: Caution! I am getting philosophical. Stop me if I'm talking

Re: Storm Compatibility Improvement

2015-06-29 Thread Matthias J. Sax
Well. If a whole Storm topology is executed, this is of course the way to got. However, I want to have named-attribute access in the case of an embedded bolt (as a single operator) in a Flink program. And is this case, fields are not declared and do not have a name (eg, if the bolt's consumers

Re: FLINK-2066

2015-06-29 Thread Till Rohrmann
Done On Mon, Jun 29, 2015 at 9:33 AM, Chiwan Park chiwanp...@apache.org wrote: We should assign FLINK-2066 to Nuno. :) Regards, Chiwan Park On Jun 29, 2015, at 1:21 PM, Márton Balassi balassi.mar...@gmail.com wrote: Hey, Thanks for picking up the issue. This value can be

[jira] [Created] (FLINK-2289) Make JobManager highly available

2015-06-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2289: Summary: Make JobManager highly available Key: FLINK-2289 URL: https://issues.apache.org/jira/browse/FLINK-2289 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-2286) Window ParallelMerge sometimes swallows elements of the last window

2015-06-29 Thread JIRA
Márton Balassi created FLINK-2286: - Summary: Window ParallelMerge sometimes swallows elements of the last window Key: FLINK-2286 URL: https://issues.apache.org/jira/browse/FLINK-2286 Project: Flink

Re: Is there Any api that let DataStream join DataSet ?

2015-06-29 Thread Stephan Ewen
If you only want to join a finite data set (like a file) to a stream, you can do that. you can create a DataStream from a (distributed) file. If you want specific batch-api operations, this is still on the roadmap, not in yet, as Marton said. On Sun, Jun 28, 2015 at 10:45 AM, Márton Balassi

[jira] [Created] (FLINK-2285) Active policy emits elements of the last window twice

2015-06-29 Thread JIRA
Márton Balassi created FLINK-2285: - Summary: Active policy emits elements of the last window twice Key: FLINK-2285 URL: https://issues.apache.org/jira/browse/FLINK-2285 Project: Flink Issue

Re: FLINK-2066

2015-06-29 Thread Nuno Santos
Thank you all for the help, it is appreciated! 2015-06-29 9:12 GMT+01:00 Stephan Ewen se...@apache.org: Hi Nuno! Ultimately, the delay is a property of the ExecutionGraph. The ExecutionGraph is the data structure on the master (JobManager) that tracks the distributed execution. The

[jira] [Created] (FLINK-2284) Confusing/inconsistent PartitioningStrategy

2015-06-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2284: --- Summary: Confusing/inconsistent PartitioningStrategy Key: FLINK-2284 URL: https://issues.apache.org/jira/browse/FLINK-2284 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2287) Implement JobManager high availability

2015-06-29 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2287: -- Summary: Implement JobManager high availability Key: FLINK-2287 URL: https://issues.apache.org/jira/browse/FLINK-2287 Project: Flink Issue Type: Improvement

Off-by-one issues in the windowing code

2015-06-29 Thread Márton Balassi
I have found two off-by-one issues in the windowing code. The first may result in duplicate data in the last window and is easy to fix. [1] The second may result data being swallowed in the last window, and is also not difficult to fix. [2] I've talked to Aljoscha about fixing the second one,

Re: Monitoring a Flink Job

2015-06-29 Thread Fabian Hueske
You can measure the time of each iteration in the open() methods operators within an iteration. open() will be called before each iteration. The times can be collected by either printing to std out (you need to collect the files then...) or by implementing a list accumulator. Each time should

Re: Thoughts About Streaming

2015-06-29 Thread Stephan Ewen
@Matthias: I think using the KeyedDataStream will simply result in smaller programs. May be hard for some users to make the connection to a 1-element-tumbling-window, simply because they want to use state. Not everyone is a deep into that stuff as you are ;-) On Sun, Jun 28, 2015 at 1:13 AM,

Re: Monitoring a Flink Job

2015-06-29 Thread Flavio Pompermaier
Why don't you use Flink dataset output functions (like writeAsText, writeAsCsv, etc..)? Or if they are not sufficient you can implement/override your own InputFormat. From what is my experience static variables are evil in distributed environments.. Moreover, one of the main strengths of Flink

Re: FLINK-2066

2015-06-29 Thread Chiwan Park
We should assign FLINK-2066 to Nuno. :) Regards, Chiwan Park On Jun 29, 2015, at 1:21 PM, Márton Balassi balassi.mar...@gmail.com wrote: Hey, Thanks for picking up the issue. This value can be specified as execution-retries.delay in the flink-conf.yaml. Hence you can check the