Implicit inference of TypeInformation for join keys

2016-03-29 Thread Timur Fayruzov
Hello, Another issue I have encountered is incorrect implicit resolution (I'm using Scala 2.11.7). Here's the example (with a workaround): val a = env.fromCollection(Seq(Thing("a", "b"), Thing("c", "d"))) val b = env.fromCollection(Seq(Thing("a", "x"), Thing("z", "m"))) a.coGroup(b) .where(e

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Suneel Marthi
You may want to use the FlinkMLTools.persist() methods, which use TypeSerializerFormat and don't enforce IOReadableWritable. On Tue, Mar 29, 2016 at 2:12 PM, Sourigna Phetsarath < gna.phetsar...@teamaol.com> wrote: > Till, > > Thank you for your reply. > > Having this issue though, WeightVector does

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Sourigna Phetsarath
Till, Thank you for your reply. Having this issue, though: WeightVector does not extend IOReadableWritable: public class SerializedOutputFormat<T extends IOReadableWritable> case class WeightVector(weights: Vector, intercept: Double) extends Serializable {} However, I will use the

Re: Why Scala Option is not a valid key?

2016-03-29 Thread Timur Fayruzov
There is some more detail to this question that I missed initially. It turns out that my key is a case class of a form MyKey(k1: Option[String], k2: Option[String]). Keys.SelectorFunctionKeys is performing a recursive check whether every element of the MyKey class can be a key and fails when
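One common workaround for keys with Option fields (a sketch of my own, not code from this thread) is to map each Option to a plain sentinel value before keying, so the key type contains only types Flink accepts. The empty-string sentinel is an assumption and only works if "" never occurs as a real value:

```scala
// Hypothetical workaround sketch: Flink rejects Option[...] fields inside a
// key type, so flatten them to plain Strings before using them as a key.
case class MyKey(k1: Option[String], k2: Option[String])

// Map None to an empty-string sentinel (assumes "" is never a real value).
def flattenKey(k: MyKey): (String, String) =
  (k.k1.getOrElse(""), k.k2.getOrElse(""))

val a = MyKey(Some("a"), None)
val b = MyKey(Some("a"), None)
println(flattenKey(a) == flattenKey(b)) // equal keys stay equal: true
```

In a coGroup, one would then key on `flattenKey` (or an equivalent key-selector function) instead of the case class itself.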

withBroadcastSet for a DataStream missing?

2016-03-29 Thread Stavros Kontopoulos
Hi, I am new here... I am trying to implement online k-means with Flink, as described at https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html. I don't see a withBroadcastSet call anywhere to save intermediate results. Is this currently supported? Is intermediate results
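For context, the streaming k-means in the linked blog post maintains each centroid as a decayed weighted average that is updated per mini-batch. A minimal sketch of that per-centroid update rule (my simplification, not code from the thread or from Spark/Flink):

```scala
// Sketch of the incremental centroid update behind streaming k-means:
// the new centroid is a weighted average of the old centroid and the
// current batch mean, with a decay factor discounting old history.
def updateCentroid(
    centroid: Array[Double], oldWeight: Double,
    batchMean: Array[Double], batchWeight: Double,
    decay: Double): (Array[Double], Double) = {
  val w = oldWeight * decay            // decayed weight of the history
  val total = w + batchWeight
  val updated = centroid.zip(batchMean).map { case (c, m) =>
    (c * w + m * batchWeight) / total  // per-dimension weighted average
  }
  (updated, total)
}

// Equal weights, no decay: the centroid moves halfway toward the batch mean.
val (c, w) = updateCentroid(Array(0.0), 4.0, Array(10.0), 4.0, 1.0)
println((c.toList, w)) // (List(5.0),8.0)
```

The open question in this thread is how to carry such per-centroid state between batches in Flink's streaming API, which is what withBroadcastSet would have been used for in the batch API.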

Why Scala Option is not a valid key?

2016-03-29 Thread Timur Fayruzov
Hello, I'm evaluating Flink, and one thing I noticed is that Option[A] can't be used as a key for coGroup (looking specifically here: https://github.com/apache/flink/blob/master/flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/OptionTypeInfo.scala#L39). I'm not clear about the reason for

Re: Read a given list of HDFS folder

2016-03-29 Thread Maximilian Michels
Hi Gwenhael, That is not possible right now. As a workaround, you could have three DataSets that are constructed by reading recursively from each directory and unify these later. Alternatively, moving/linking the directories in a different location would also work. I agree that it would be nice

Re: jackson DefaultScalaModule missing on deployment

2016-03-29 Thread Bart van Deenen
Thanks. Copying jars into the lib directory works fine. On Tue, Mar 29, 2016, at 13:34, Balaji Rajagopalan wrote: > You will have to include the dependent jackson jar in the flink server lib > folder, or create a fat jar. > > balaji > > On Tue, Mar 29, 2016 at 4:47 PM, Bart van Deenen >
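For the fat-jar alternative Balaji mentions, a minimal sbt sketch (assuming sbt-assembly; version numbers are illustrative, not from the thread):

```scala
// project/plugins.sbt -- pull in the sbt-assembly plugin (version illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- mark Flink itself as "provided" so only job dependencies
// (e.g. jackson-module-scala) end up in the fat jar built by `sbt assembly`
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.0.0" % "provided",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.2"
)
```

Marking Flink as "provided" keeps the jar small and avoids classpath conflicts with the classes the Flink server already ships.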

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
Perhaps there is a misunderstanding on my side about the parallelism and split management given a data source. We started from the current JDBCInputFormat to make it multi-threaded. Then, given a space of keys, we create the splits based on a fetchsize set as a parameter. In the open, we get a

Re: Upserts with Flink-elasticsearch

2016-03-29 Thread HungChang
Hi Zach, For using upsert in ES2, I guess it looks like the following? However, I cannot find a method in Requests that returns an UpdateRequest, while Requests.indexRequest() returns an IndexRequest. Can I ask whether you know of one? public static UpdateRequest updateIndexRequest(String element) {

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
That is exactly my point. I should have 32 threads running, but I have only 8. 32 tasks are created, but only 8 are run concurrently. Flavio and I will try to make a simple program that reproduces the problem. If we solve our issues on the way, we'll let you know. Thanks a lot anyway. saluti,

jackson DefaultScalaModule missing on deployment

2016-03-29 Thread Bart van Deenen
Hi all, I've successfully built a Flink streaming job, and it runs beautifully in my IntelliJ IDE, with a Flink instance started on the fly. The job eats Kafka events and outputs to file. All the I/O is JSON encoded with Jackson. But I'm having trouble with deploying the jar on a Flink server

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
In fact, I don't use it. I just had to crawl back through the runtime implementation to get to the point where parallelism was switching from 32 to 8. saluti, Stefano 2016-03-29 12:24 GMT+02:00 Till Rohrmann : > Hi, > > for what do you use the ExecutionContext? That should

Re: threads, parallelism and task managers

2016-03-29 Thread Till Rohrmann
Hi, what do you use the ExecutionContext for? That should actually be something you shouldn't need to be concerned with, since it is only used internally by the runtime. Cheers, Till On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli wrote: > Well, in theory yes. Each task

Re: threads, parallelism and task managers

2016-03-29 Thread Stefano Bortoli
Well, in theory yes. Each task has a thread, but only a number of them run in parallel (the job of the scheduler). Parallelism is set in the environment. However, while the parallelism parameter is set and read correctly, when it comes to actually starting the threads the number is fixed at 8. We

Re: threads, parallelism and task managers

2016-03-29 Thread Ufuk Celebi
Hey Stefano, this should work by setting the parallelism on the environment, e.g. env.setParallelism(32) Is this what you are doing? The task threads are not part of a pool, but each submitted task creates its own Thread. – Ufuk On Fri, Mar 25, 2016 at 9:10 PM, Flavio Pompermaier
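The thread-per-task behavior Ufuk describes can be illustrated with plain JVM threads. This is an analogy only, not Flink code: with one thread per task, 32 tasks can all run at once, unlike a fixed pool capped at 8 workers:

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicInteger

// Analogy: each "task" gets its own Thread, so 32 tasks can be concurrent.
val running = new AtomicInteger(0)     // tasks currently inside the work section
val peak = new AtomicInteger(0)        // highest concurrency observed
val start = new CountDownLatch(1)      // releases all tasks at once
val done = new CountDownLatch(32)

val threads = (1 to 32).map { _ =>
  new Thread(() => {
    start.await()
    val now = running.incrementAndGet()
    peak.getAndUpdate(p => math.max(p, now))
    Thread.sleep(50)                   // simulate work
    running.decrementAndGet()
    done.countDown()
  })
}
threads.foreach(_.start())
start.countDown()
done.await()
println(s"peak concurrent threads: ${peak.get()}") // well above 8, typically 32
```

If Stefano were instead observing a hard cap of 8, that would point at a pool-like limit somewhere (e.g. the configured parallelism or available task slots), not at the task threads themselves.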

Re: DataSet.randomSplit()

2016-03-29 Thread Till Rohrmann
Hi, I think Ufuk is completely right. As far as I know, we don't support this function and nobody is currently working on it. If you like, you could take the lead there. Cheers, Till On Mon, Mar 28, 2016 at 10:50 PM, Ufuk Celebi wrote: > Hey Gna! I think that it's not on

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Till Rohrmann
Hi Gna, there are no utilities yet to do that, but you can do it manually. In the end, a model is simply a Flink DataSet, which you can serialize to some file. Upon reading this DataSet back, you simply have to give it to your algorithm to be used as the model. The following code snippet illustrates this

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Simone Robutti
To my knowledge there is nothing like that. PMML is not supported in any form and there's no custom saving format yet. If you really need a quick and dirty solution, it's not that hard to serialize the model into a file. 2016-03-28 17:59 GMT+02:00 Sourigna Phetsarath :
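A sketch of the quick-and-dirty file serialization Simone mentions, using plain Java serialization on a WeightVector-like case class (my own minimal example, with an Array instead of a Flink ML Vector):

```scala
import java.io._

// Minimal sketch: persist a trained model with plain Java serialization so it
// can be loaded later to score single feature vectors outside a Flink job.
case class WeightVector(weights: Array[Double], intercept: Double)
  extends Serializable

def save(model: WeightVector, path: String): Unit = {
  val out = new ObjectOutputStream(new FileOutputStream(path))
  try out.writeObject(model) finally out.close()
}

def load(path: String): WeightVector = {
  val in = new ObjectInputStream(new FileInputStream(path))
  try in.readObject().asInstanceOf[WeightVector] finally in.close()
}

val path = File.createTempFile("model", ".bin").getPath
save(WeightVector(Array(0.5, -1.2), 0.1), path)
val restored = load(path)
println(restored.intercept) // 0.1
```

Java serialization is brittle across class versions; for anything long-lived, a plain-text or columnar format for the weights is safer.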

Re: window limits ?

2016-03-29 Thread Aljoscha Krettek
Hi, which version of Flink are you using and do you have a custom timestamp extractor/watermark extractor? The semantics of this changed between 0.10 and 1.0 and I just want to make sure that you get the correct behavior. Cheers, Aljoscha On Tue, 29 Mar 2016 at 10:13 Bart van Deenen

window limits ?

2016-03-29 Thread Bart van Deenen
Hi all I'm doing a fold on a sliding window, using TimeCharacteristic.EventTime. For output I'm picking the timestamp of the most recent event in the window, and use that to name the output (to a file). My question is: will a second run of Flink on the same set of data (from Kafka) put the same
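The naming scheme described above can be sketched as follows (a hypothetical helper, not code from the thread): with event time, the maximum event timestamp in a window depends only on the window's contents, so a rerun over the same Kafka data should produce the same name:

```scala
// Sketch (assumed helper): derive a window's output file name from the
// most recent event timestamp it contains. Deterministic for fixed contents.
case class Event(ts: Long, payload: String)

def outputName(window: Seq[Event]): String =
  s"window-${window.map(_.ts).max}"

val w = Seq(Event(100L, "a"), Event(300L, "b"), Event(200L, "c"))
println(outputName(w)) // window-300
```

Whether the windows themselves are identical across runs is the real question here; it hinges on the timestamp/watermark extraction, which is why the Flink version matters (see Aljoscha's reply).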