window limits ?

2016-03-29 Thread Bart van Deenen
Hi all I'm doing a fold on a sliding window, using TimeCharacteristic.EventTime. For output I'm picking the timestamp of the most recent event in the window, and use that to name the output (to a file). My question is: will a second run of Flink on the same set of data (from Kafka) put the same e

Re: window limits ?

2016-03-29 Thread Matthias J. Sax
If you use event time, a second run will put the exact same tuples into the windows (event time implies, that the timestamp is encoded in the tuple itself, thus, it is independent of the wall-clock time). However, be aware that the order of tuples *within a window* might change! Thus, the timesta

Re: window limits ?

2016-03-29 Thread Bart van Deenen
Great! I'm actually taking the max of the timestamps, so I should be fine. Thanks Bart On Tue, Mar 29, 2016, at 09:48, Matthias J. Sax wrote: > If you use event time, a second run will put the exact same tuples into > the windows (event time implies, that the timestamp is encoded in the > tuple
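The thread's conclusion can be sanity-checked without Flink: taking the maximum event timestamp in a window is order-independent, so a filename derived from it is stable even when tuples arrive in a different order on a second run. A minimal plain-Java sketch (the timestamps are made up, no Flink dependency):

```java
import java.util.Arrays;
import java.util.List;

public class MaxTimestamp {
    // Fold step: keep the largest event timestamp seen so far.
    static long maxTimestamp(List<Long> window) {
        long max = Long.MIN_VALUE;
        for (long ts : window) {
            max = Math.max(max, ts);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Long> run1 = Arrays.asList(1000L, 1500L, 1200L);
        // Same window contents, different arrival order on a second run.
        List<Long> run2 = Arrays.asList(1200L, 1000L, 1500L);
        // The max is identical, so a filename derived from it is deterministic.
        System.out.println(maxTimestamp(run1) == maxTimestamp(run2)); // prints "true"
    }
}
```

A fold that instead picked the *last* timestamp seen would not be safe, since in-window order may differ between runs.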

Re: window limits ?

2016-03-29 Thread Aljoscha Krettek
Hi, which version of Flink are you using and do you have a custom timestamp extractor/watermark extractor? The semantics of this changed between 0.10 and 1.0 and I just want to make sure that you get the correct behavior. Cheers, Aljoscha On Tue, 29 Mar 2016 at 10:13 Bart van Deenen wrote: > Gr

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Simone Robutti
To my knowledge there is nothing like that. PMML is not supported in any form and there's no custom saving format yet. If you really need a quick and dirty solution, it's not that hard to serialize the model into a file. 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath : > Flinksters, > > Is there

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Till Rohrmann
Hi Gna, there are no utilities yet to do that but you can do it manually. In the end, a model is simply a Flink DataSet which you can serialize to some file. Upon reading this DataSet you simply have to give it to your algorithm to be used as the model. The following code snippet illustrates this
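Till's snippet is cut off in the archive, but the idea — a trained model is just data you can write to a file and read back, then hand to the scoring code — can be illustrated without Flink. A plain-Java sketch; the field layout (weights followed by intercept) and binary format are assumptions for illustration, not FlinkML's actual on-disk format:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ModelPersistence {
    // Write a linear model's weights and intercept to a file.
    static void save(double[] weights, double intercept, File f) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(weights.length);
            for (double w : weights) out.writeDouble(w);
            out.writeDouble(intercept);
        }
    }

    // Read it back: n weights followed by the intercept, n+1 doubles total.
    static double[] load(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            int n = in.readInt();
            double[] model = new double[n + 1];
            for (int i = 0; i <= n; i++) model[i] = in.readDouble();
            return model;
        }
    }
}
```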

Re: Storm topologies compatibility and exactly-once

2016-03-29 Thread Maximilian Michels
Hi Olivier, Regarding the general question, please have a look at the documentation: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/storm_compatibility.html Yes, you may reuse your existing spouts/bolts, the Storm runtime is exchanged for the Flink runtime. Exactly-once: W

Re: DataSet.randomSplit()

2016-03-29 Thread Till Rohrmann
Hi, I think Ufuk is completely right. As far as I know, we don't support this function and nobody's currently working on it. If you like, then you could take the lead there. Cheers, Till On Mon, Mar 28, 2016 at 10:50 PM, Ufuk Celebi wrote: > Hey Gna! I think that it's not on the road map at th
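For anyone picking this up, the core of a randomSplit is small: route each element to one of two outputs with a fixed probability and a fixed seed, so the same seed reproduces the same split. A plain-Java sketch of the idea (not Flink's API; names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RandomSplit {
    // Split 'data' into two lists; each element goes to the first list
    // with probability 'fraction'. A fixed seed makes the split reproducible.
    static <T> List<List<T>> randomSplit(List<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T element : data) {
            (rnd.nextDouble() < fraction ? first : second).add(element);
        }
        return Arrays.asList(first, second);
    }
}
```

In a distributed setting each partition would need its own seed (e.g. base seed plus partition index) so partitions draw independent random sequences.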

Re: threads, parallelism and task managers

2016-03-29 Thread Ufuk Celebi
Hey Stefano, this should work by setting the parallelism on the environment, e.g. env.setParallelism(32). Is this what you are doing? The task threads are not part of a pool, but each submitted task creates its own Thread. – Ufuk On Fri, Mar 25, 2016 at 9:10 PM, Flavio Pompermaier wrote: > A

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
Well, in theory yes. Each task has a thread, but only a number is run in parallel (the job of the scheduler). Parallelism is set in the environment. However, whereas the parallelism parameter is set and read correctly, when it comes to the actual starting of the threads, the number is fixed to 8. We run

Re: threads, parallelism and task managers

2016-03-29 Thread Till Rohrmann
Hi, for what do you use the ExecutionContext? That should actually be something which you shouldn’t be concerned with since it is only used internally by the runtime. Cheers, Till On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli wrote: > Well, in theory yes. Each task has a thread, but only

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
In fact, I don't use it. I just had to crawl back the runtime implementation to get to the point where parallelism was switching from 32 to 8. saluti, Stefano 2016-03-29 12:24 GMT+02:00 Till Rohrmann : > Hi, > > for what do you use the ExecutionContext? That should actually be > something which

Re: threads, parallelism and task managers

2016-03-29 Thread Till Rohrmann
Then it shouldn’t be a problem. The ExecutionContext is used to run futures and their callbacks. But as Ufuk said, each task will spawn its own thread, and if you set the parallelism to 32 then you should have 32 threads running. On Tue, Mar 29, 2016 at 12:29 PM, Stefano Bortoli wrote: > In f

jackson DefaultScalaModule missing on deployment

2016-03-29 Thread Bart van Deenen
Hi all I've successfully built a Flink streaming job, and it runs beautifully in my IntelliJ IDE, with a Flink instance started on the fly. The job eats Kafka events, and outputs to file. All the i/o is JSON encoded with Jackson. But I'm having trouble with deploying the jar on a Flink server (ve

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
That is exactly my point. I should have 32 threads running, but I have only 8. 32 tasks are created, but only 8 are run concurrently. Flavio and I will try to make a simple program to reproduce the problem. If we solve our issues on the way, we'll let you know. Thanks a lot anyway. saluti, Stef

Re: jackson DefaultScalaModule missing on deployment

2016-03-29 Thread Balaji Rajagopalan
You will have to include the dependent Jackson jar in the Flink server lib folder, or create a fat jar. balaji On Tue, Mar 29, 2016 at 4:47 PM, Bart van Deenen wrote: > Hi all > > I've succesfully built a Flink streaming job, and it runs beautifully in > my IntelliJ ide, with a Flink instance started o

Re: Upserts with Flink-elasticsearch

2016-03-29 Thread HungChang
Hi Zach, For using upsert in ES2, I guess it looks as follows? However, I cannot find which method in Requests returns an UpdateRequest, while Requests.indexRequest() returns an IndexRequest. Can I ask, do you know it? public static UpdateRequest updateIndexRequest(String element) { Map json

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
Perhaps there is a misunderstanding on my side over the parallelism and split management given a data source. We started from the current JDBCInputFormat to make it multi-threaded. Then, given a space of keys, we create the splits based on a fetchsize set as a parameter. In the open, we get a connec
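The split construction Stefano describes — a key space cut into fetch-size chunks, one chunk per input split — can be sketched in plain Java. Names and the inclusive-range convention are assumptions, not the actual JDBCInputFormat code:

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplits {
    // Partition the key range [minKey, maxKey] into contiguous splits of
    // at most 'fetchSize' keys each; each split would back one parallel
    // instance of the input format.
    static List<long[]> splits(long minKey, long maxKey, long fetchSize) {
        List<long[]> result = new ArrayList<>();
        for (long start = minKey; start <= maxKey; start += fetchSize) {
            long end = Math.min(start + fetchSize - 1, maxKey);
            result.add(new long[] {start, end}); // inclusive [start, end]
        }
        return result;
    }
}
```

How many splits run at once is then up to the scheduler and the configured parallelism, which is exactly where the 32-vs-8 discrepancy in this thread shows up.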

Re: Upserts with Flink-elasticsearch

2016-03-29 Thread Zach Cox
You can just create a new UpdateRequest instance directly using its constructor [1] like this: return new UpdateRequest().index(index).type(type).id(element).source(json); [1] http://javadoc.kyubu.de/elasticsearch/HEAD/or
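For readers unfamiliar with what an upsert does: insert the document when the id is absent, otherwise apply the update to the existing document. A plain-Java sketch of that semantics (not the Elasticsearch API; a Map stands in for the index):

```java
import java.util.HashMap;
import java.util.Map;

public class Upsert {
    // Upsert: if 'id' is absent, insert 'doc'; otherwise merge 'doc'
    // into the existing document (field-wise partial update).
    static void upsert(Map<String, Map<String, String>> index,
                       String id, Map<String, String> doc) {
        index.merge(id, new HashMap<>(doc), (existing, update) -> {
            existing.putAll(update);
            return existing;
        });
    }
}
```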

Re: jackson DefaultScalaModule missing on deployment

2016-03-29 Thread Bart van Deenen
Thanks. Copying jars into the lib directory works fine. On Tue, Mar 29, 2016, at 13:34, Balaji Rajagopalan wrote: > You will have to include dependent jackson jar in flink server lib > folder, or create a fat jar. > > balaji > > On Tue, Mar 29, 2016 at 4:47 PM, Bart van Deenen > wrote: >> Hi all
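For the fat-jar alternative Balaji mentioned, a Maven build typically uses the Shade plugin to bundle dependencies like Jackson into the job jar. A sketch (the version number is an assumption; adjust it for your build):

```xml
<!-- Fat-jar sketch with the Maven Shade plugin; the version is an
     assumption, pick whatever is current for your build. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

Copying jars into the server's lib folder works too, as Bart confirms, but a fat jar keeps the cluster installation untouched.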

Re: Read a given list of HDFS folder

2016-03-29 Thread Maximilian Michels
Hi Gwenhael, That is not possible right now. As a workaround, you could have three DataSets that are constructed by reading recursively from each directory and unify these later. Alternatively, moving/linking the directories in a different location would also work. I agree that it would be nice t

Why Scala Option is not a valid key?

2016-03-29 Thread Timur Fayruzov
Hello, I'm evaluating Flink and one thing I noticed is Option[A] can't be used as a key for coGroup (looking specifically here: https://github.com/apache/flink/blob/master/flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/OptionTypeInfo.scala#L39). I'm not clear about the reason of t

withBroadcastSet for a DataStream missing?

2016-03-29 Thread Stavros Kontopoulos
Hi, I am new here... I am trying to implement online k-means with Flink, as here: https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html I don't see anywhere a withBroadcastSet call to save intermediate results; is this currently supported? Is intermediate results state

Re: Why Scala Option is not a valid key?

2016-03-29 Thread Timur Fayruzov
There is some more detail to this question that I missed initially. It turns out that my key is a case class of a form MyKey(k1: Option[String], k2: Option[String]). Keys.SelectorFunctionKeys is performing a recursive check whether every element of the MyKey class can be a key and fails when encoun
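A common workaround (my suggestion, not something proposed in the thread) is to encode the Option fields into plain comparable values before keying, keeping None distinct from every Some. In plain Java, with Optional standing in for Scala's Option:

```java
import java.util.Optional;

public class OptionKey {
    // Workaround sketch: encode an optional key field as a plain string
    // before using it as a grouping key. The sentinel prefix keeps an
    // absent value distinct from any present value, including "".
    static String encode(Optional<String> field) {
        return field.map(v -> "S:" + v).orElse("N:");
    }
}
```

In the Scala API the same effect is achieved with a key-selector function that maps each Option field through such an encoding, so the composite key contains only plain strings.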

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Sourigna Phetsarath
Till, Thank you for your reply. Having this issue though: WeightVector does not extend IOReadableWritable: public class SerializedOutputFormat<T extends IOReadableWritable>, but case class WeightVector(weights: Vector, intercept: Double) extends Serializable {} However, I will use the a

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Suneel Marthi
You may want to use the FlinkMLTools.persist() methods, which use TypeSerializerFormat and don't enforce IOReadableWritable. On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Till, > > Thank you for your reply. > > Having this issue though, WeightVector does n

Implicit inference of TypeInformation for join keys

2016-03-29 Thread Timur Fayruzov
Hello, Another issue I have encountered is incorrect implicit resolution (I'm using Scala 2.11.7). Here's the example (with a workaround): val a = env.fromCollection(Seq(Thing("a", "b"), Thing("c", "d"))) val b = env.fromCollection(Seq(Thing("a", "x"), Thing("z", "m"))) a.coGroup(b) .where(e =>

Re: Window Support in Flink

2016-03-29 Thread Nirmalya Sengupta
Hello Ufuk, The term 'hopping windows support' can be found on Slide 24 of the deck. In any case, you and Fabian have clarified. So, many thanks. -- Nirmalya -- Software Technologist http://www.linkedin.com/in/nirmalyasengupta "If you have b

DataSetUtils zipWithIndex question

2016-03-29 Thread Tarandeep Singh
Hi, I am looking at implementation of zipWithIndex in DataSetUtils- https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java It works in two phases/steps 1) Count number of elements in each partition (using mapPartition) 2) In second m
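The two phases Tarandeep describes can be condensed into a small sequential model: count each partition, then start each partition's numbering at the sum of the preceding counts. A plain-Java sketch (Flink does the counting in a separate distributed mapPartition pass and broadcasts the counts; here both phases are folded into one loop):

```java
import java.util.ArrayList;
import java.util.List;

public class ZipWithIndex {
    // Phase 1: count the elements of each partition.
    // Phase 2: give partition p the start offset sum(counts[0..p-1])
    // and number its elements consecutively from there, yielding
    // globally unique, contiguous indices without a global sort.
    static List<List<Long>> indices(List<List<String>> partitions) {
        long offset = 0;
        List<List<Long>> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            List<Long> ids = new ArrayList<>();
            for (int i = 0; i < partition.size(); i++) {
                ids.add(offset + i);
            }
            offset += partition.size(); // phase-1 count feeds the next offset
            result.add(ids);
        }
        return result;
    }
}
```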