Re: Iterations vs. combo source/sink

2016-09-30 Thread Ken Krugler
Hi Fabian, Thanks for responding. Comments and questions inline below. Regards, — Ken > On Sep 29, 2016, at 6:10am, Fabian Hueske wrote: > > Hi Ken, > > you can certainly have partitioned sources and sinks. You can control the > parallelism by calling .setParallelism()

Re: How can I prove ....

2016-09-30 Thread amir bahmanyari
Hi Stephan, This is from the dashboard. Total parallelism is set to 1024; 259 tasks per TM. All say Running, but I get the *.out log only on the beam4 server (bottom of the servers list). Does this mean that all nodes are engaged in processing the data? Why do these encircled columns have 0's for their

Re: Using Flink and Cassandra with Scala

2016-09-30 Thread Stephan Ewen
How hard would it be to add case class support? Internally, tuples and case classes are treated quite similar, so I think it could be a quite simple extension... On Fri, Sep 30, 2016 at 4:22 PM, Sanne de Roever wrote: > Thanks Chesnay. Have a good weekend. > > On

SVM classification problem.

2016-09-30 Thread Kürşat Kurt
Hi; I am trying to train and predict with the same set. I expect that accuracy should be 100%, am I wrong? If I try to predict with the same set, it fails; it also assigns the class "-1", which is not in the training set. What is wrong with this code? Code: def main(args:

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Shannon Carey
Implementing a custom serialization approach with Flink's CopyableValue (instead of relying on Flink to automatically use Kryo) solved the issue. As a side benefit, this also reduced the serialized size of my object by about half. From: Stephan Ewen >
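The approach described above can be sketched roughly as follows. This is a hedged illustration only (the field set and class name are invented, not from the original thread): a type implementing Flink's CopyableValue interface supplies its own binary serialization, so Flink uses these methods instead of falling back to Kryo.

```scala
import org.apache.flink.core.memory.{DataInputView, DataOutputView}
import org.apache.flink.types.CopyableValue

// Illustrative value type; Flink requires a public no-arg constructor.
class MyValue(var id: Long, var name: String) extends CopyableValue[MyValue] {
  def this() = this(0L, "")

  // Custom binary format: one long followed by a UTF string.
  override def write(out: DataOutputView): Unit = {
    out.writeLong(id)
    out.writeUTF(name)
  }

  override def read(in: DataInputView): Unit = {
    id = in.readLong()
    name = in.readUTF()
  }

  // -1 signals a variable-length binary representation.
  override def getBinaryLength: Int = -1

  override def copyTo(target: MyValue): Unit = {
    target.id = id
    target.name = name
  }

  override def copy(): MyValue = new MyValue(id, name)

  // Copy the serialized form directly, without deserializing.
  override def copy(source: DataInputView, target: DataOutputView): Unit = {
    target.writeLong(source.readLong())
    target.writeUTF(source.readUTF())
  }
}
```

Writing the fields directly (rather than letting Kryo serialize the whole object graph) is what yields the size reduction mentioned in the message.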

Re: Exceptions from collector.collect after cancelling job

2016-09-30 Thread Shannon Carey
My flat map function is catching & logging the exception. The try block happens to encompass the call to Collector#collect(). I will move the call to collect outside of the try. That should silence the log message. On 9/30/16, 3:51 AM, "Ufuk Celebi" wrote: >On Thu, Sep

Re: Controlling savepoints from inside an application

2016-09-30 Thread Astrac
Thanks for the answer, the changes in the FLIP are quite interesting, are they coming in 1.2? What I mean by "manually reading the savepoint" is that rather than providing the savepoint path via "the --fromSavepoint hdfs://some-path/to/savepoint" option I'd like to provide it in the code that

Re: Using Flink and Cassandra with Scala

2016-09-30 Thread Sanne de Roever
Thanks Chesnay. Have a good weekend. On Thu, Sep 29, 2016 at 5:03 PM, Chesnay Schepler wrote: > the cassandra sink only supports java tuples and POJO's. > > > On 29.09.2016 16:33, Sanne de Roever wrote: > >> Hi, >> >> Does the Cassandra sink support Scala and case classes?

Re: Controlling savepoints from inside an application

2016-09-30 Thread Ufuk Celebi
Hey Aldo, On Fri, Sep 30, 2016 at 3:17 PM, Astrac wrote: > * Configure the savepoint path while we build the StreamExecutionEnvironment > rather than in flink-conf.yml > * Manually read a savepoint rather than passing it via the CLI what you describe is not possible right

Controlling savepoints from inside an application

2016-09-30 Thread Astrac
In the project I am working on we are versioning all our flink operators in order to be able to re-build the state from external sources (i.e. Kafka) by bumping that version number; this works pretty nicely so far, except that we need to be aware of whether or not we need to load the savepoint

Re: Blobstorage Locally and on HDFS

2016-09-30 Thread Konstantin Knauf
Hi Ufuk, thanks for your quick answer. Setup: 2 Servers, each running a JM as well as TM 1) Removing all existing blobstores locally (/tmp) as well as on HDFS 2) Starting a flink streaming job Now there are the following BLOBs: Local: *Leader JM: 4.0K

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stephan Ewen
@Shannon Concerning the issue with long checkpoints even though the snapshot is very short: I found a critical issue with the Flink Kafka 0.9 Consumer - on low-throughput topics/partitions, it can lock up for a while, preventing checkpoints to be triggered (barriers injected). There is a fix

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
This works with event-time as well. You need to set the right TimeCharacteristics on the exec env and assign timestamps + watermarks. The only time depended operation is the window. YourWindowFunction assigns the timestamp of the window. WindowFunction.apply() has a TimeWindow parameter that gives
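The setup described above, switching the job to event time and assigning timestamps, can be sketched like this (a minimal illustration against the 1.x Scala API; the tuple layout `(id, state, ts)` follows the thread, everything else is assumed):

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Switch the time characteristic from the default (processing time).
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// (id, state, ts) as in the thread; source omitted.
val s: DataStream[(Long, Int, Long)] = ???

// Extract event timestamps from the third field; this also generates
// watermarks, assuming timestamps arrive in ascending order per partition.
val withTimestamps = s.assignAscendingTimestamps(_._3)
```

For out-of-order streams, a custom timestamp/watermark assigner would replace `assignAscendingTimestamps`. Inside the window function, `TimeWindow.getEnd` then provides the timestamp to attach to each emitted count.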

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Simone Robutti
I'm working with your suggestions, thank you very much. What I'm missing here is what YourWindowFunction should do. I have no notion of event time there, so I can't assign a timestamp. Also, this solution seems to work by processing time, while I care about event time. I couldn't make it

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stephan Ewen
Agree with Stefan, let's see if the fully async snapshot mode helps. It looks suspiciously RocksDB related... On Fri, Sep 30, 2016 at 10:30 AM, Stefan Richter < s.rich...@data-artisans.com> wrote: > Hi Shannon, > > from your new stack trace and the bogus class names, I agree with Stephan > that

Re: Exceptions from collector.collect after cancelling job

2016-09-30 Thread Ufuk Celebi
On Thu, Sep 29, 2016 at 9:29 PM, Shannon Carey wrote: > It looks like Flink is disabling the objects that the FlatMap collector > relies on before disabling the operator itself. Is that expected/normal? Is > there anything I should change in my FlatMap function or job code to

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stefan Richter
Hi Shannon, from your new stack trace and the bogus class names, I agree with Stephan that either serialization or the database itself is corrupted in some way. Could you please check if this problem only happens if checkpointing is enabled? If yes, does switching to fully async snapshots

Re: Blobstorage Locally and on HDFS

2016-09-30 Thread Ufuk Celebi
On Fri, Sep 30, 2016 at 9:12 AM, Konstantin Knauf wrote: > we are running a Flink (1.1.2) Stand-Alone Cluster with JM HA, and HDFS > as checkpoint and recovery storage dir. What we see is that blobStores > are stored in HDFS as well as under the local Jobmanagers and

Re: How to Broadcast a very large model object (used in iterative scoring in recommendation system) in Flink

2016-09-30 Thread rimin515
Your message is very short, so I cannot tell much; the following is my guess. In Flink, the DataStream API is not meant for iterative computation; the DataSet API would be a better fit. Also, Flink suggests broadcasting small data, not large. You can load your model data (it can come from a file or a table) before the main
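One common way to follow this advice, loading the model up front rather than broadcasting it as a record, is to read it in the `open()` method of a rich function. A hedged sketch (the `Model`, `Event`, and `Score` types and the HDFS path are all illustrative, not from the thread):

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

class ScoringMapper extends RichMapFunction[Event, Score] {

  // Loaded once per parallel task instance, not shipped with the job graph.
  @transient private var model: Model = _

  override def open(conf: Configuration): Unit = {
    // Illustrative: read the model from a shared location (e.g. HDFS)
    // so each task manager fetches it locally at startup.
    model = Model.load("hdfs:///models/recommender.bin")
  }

  override def map(event: Event): Score = model.score(event)
}
```

This avoids pushing a very large object through Flink's broadcast mechanism; each task instead pulls the model from shared storage when it starts.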

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
Hi Simone, I think I have a solution for your problem: val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time) val stateChanges: DataStream[(Int, Int)] = s // (state, cntUpdate) .keyBy(_._1) // key by id .flatMap(new StateUpdater) // StateUpdater is a stateful FlatMapFunction. It has
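The StateUpdater outlined above can be sketched as a stateful FlatMapFunction that remembers the last seen state per id and, on a change, emits a decrement for the old state and an increment for the new one. A hedged illustration only (the default value `-1` as "no previous state" is an assumption; 1.x-era state API):

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Input: (id, state, ts); output: (state, cntUpdate)
class StateUpdater extends RichFlatMapFunction[(Long, Int, Long), (Int, Int)] {

  private var lastState: ValueState[Int] = _

  override def open(conf: Configuration): Unit = {
    // -1 marks "no state seen yet" for this key (assumes states are >= 0).
    lastState = getRuntimeContext.getState(
      new ValueStateDescriptor("lastState", classOf[Int], -1))
  }

  override def flatMap(in: (Long, Int, Long),
                       out: Collector[(Int, Int)]): Unit = {
    val prev = lastState.value()
    if (prev != in._2) {
      if (prev != -1) out.collect((prev, -1)) // old state loses one entity
      out.collect((in._2, 1))                 // new state gains one entity
      lastState.update(in._2)
    }
  }
}
```

Keying the emitted `(state, cntUpdate)` stream by state and summing the updates per window then yields the current count of entities in each state.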

How to Broadcast a very large model object (used in iterative scoring in recommendation system) in Flink

2016-09-30 Thread Anchit Jatana
Hi All, I'm building a recommendation system streaming application for which I need to broadcast a very large model object (used in iterative scoring) among all the task managers performing the operation in parallel. I'm doing this operation in map1 of a CoMapFunction. Please