Re: Iterations vs. combo source/sink

2016-09-30 Thread Ken Krugler
Hi Fabian, Thanks for responding. Comments and questions inline below. Regards, — Ken > On Sep 29, 2016, at 6:10am, Fabian Hueske wrote: > > Hi Ken, > > you can certainly have partitioned sources and sinks. You can control the > parallelism by calling .setParallelism()

Re: How can I prove ....

2016-09-30 Thread amir bahmanyari
Hi Stephan, This is from the dashboard. Total parallelism is set to 1024; 259 tasks per TM. All say Running, but I get the *.out log only on the beam4 server (bottom of the servers list). Does this mean that all nodes are engaged in processing the data? Why do these encircled columns have 0's for their

Re: Using Flink and Cassandra with Scala

2016-09-30 Thread Stephan Ewen
How hard would it be to add case class support? Internally, tuples and case classes are treated quite similar, so I think it could be a quite simple extension... On Fri, Sep 30, 2016 at 4:22 PM, Sanne de Roever wrote: > Thanks Chesnay. Have a good weekend. > > On

SVM classification problem.

2016-09-30 Thread Kürşat Kurt
Hi; I am trying to train and predict with the same set. I expect that accuracy should be 100%, am I wrong? If I try to predict with the same set, it fails; it also assigns the class "-1", which is not in the training set. What is wrong with this code? Code: def main(args:

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Shannon Carey
Implementing a custom serialization approach with Flink's CopyableValue (instead of relying on Flink to automatically use Kryo) solved the issue. As a side benefit, this also reduced the serialized size of my object by about half. From: Stephan Ewen >
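The approach described above can be sketched roughly as follows. This is a hedged illustration only (the field set and class name are invented, not from the original thread): a type implementing Flink's CopyableValue interface supplies its own binary serialization, so Flink uses these methods instead of falling back to Kryo.

```scala
import org.apache.flink.core.memory.{DataInputView, DataOutputView}
import org.apache.flink.types.CopyableValue

// Illustrative value type; Flink requires a public no-arg constructor.
class MyValue(var id: Long, var name: String) extends CopyableValue[MyValue] {
  def this() = this(0L, "")

  // Custom binary format: one long followed by a UTF string.
  override def write(out: DataOutputView): Unit = {
    out.writeLong(id)
    out.writeUTF(name)
  }

  override def read(in: DataInputView): Unit = {
    id = in.readLong()
    name = in.readUTF()
  }

  // -1 signals a variable-length binary representation.
  override def getBinaryLength: Int = -1

  override def copyTo(target: MyValue): Unit = {
    target.id = id
    target.name = name
  }

  override def copy(): MyValue = new MyValue(id, name)

  // Copy the serialized form directly, without deserializing.
  override def copy(source: DataInputView, target: DataOutputView): Unit = {
    target.writeLong(source.readLong())
    target.writeUTF(source.readUTF())
  }
}
```

Writing the fields directly (rather than letting Kryo serialize the whole object graph) is what yields the size reduction mentioned in the message.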

Re: Exceptions from collector.collect after cancelling job

2016-09-30 Thread Shannon Carey
My flat map function is catching & logging the exception. The try block happens to encompass the call to Collector#collect(). I will move the call to collect outside of the try. That should silence the log message. On 9/30/16, 3:51 AM, "Ufuk Celebi" wrote: >On Thu, Sep

Re: Controlling savepoints from inside an application

2016-09-30 Thread Astrac
Thanks for the answer, the changes in the FLIP are quite interesting, are they coming in 1.2? What I mean by "manually reading the savepoint" is that rather than providing the savepoint path via "the --fromSavepoint hdfs://some-path/to/savepoint" option I'd like to provide it in the code that

Re: Using Flink and Cassandra with Scala

2016-09-30 Thread Sanne de Roever
Thanks Chesnay. Have a good weekend. On Thu, Sep 29, 2016 at 5:03 PM, Chesnay Schepler wrote: > the cassandra sink only supports java tuples and POJO's. > > > On 29.09.2016 16:33, Sanne de Roever wrote: > >> Hi, >> >> Does the Cassandra sink support Scala and case classes?

Re: Controlling savepoints from inside an application

2016-09-30 Thread Ufuk Celebi
Hey Aldo, On Fri, Sep 30, 2016 at 3:17 PM, Astrac wrote: > * Configure the savepoint path while we build the StreamExecutionEnvironment > rather than in flink-conf.yml > * Manually read a savepoint rather than passing it via the CLI what you describe is not possible right

Controlling savepoints from inside an application

2016-09-30 Thread Astrac
In the project I am working on we are versioning all our flink operators in order to be able to re-build the state from external sources (i.e. Kafka) by bumping that version number; this works pretty nicely so far, except that we need to be aware of whether or not we need to load the savepoint

Re: Blobstorage Locally and on HDFS

2016-09-30 Thread Konstantin Knauf
Hi Ufuk, thanks for your quick answer. Setup: 2 Servers, each running a JM as well as TM 1) Removing all existing blobstores locally (/tmp) as well as on HDFS 2) Starting a flink streaming job Now there are the following BLOBs: Local: *Leader JM: 4.0K

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stephan Ewen
@Shannon Concerning the issue with long checkpoints even though the snapshot is very short: I found a critical issue with the Flink Kafka 0.9 Consumer - on low-throughput topics/partitions, it can lock up for a while, preventing checkpoints to be triggered (barriers injected). There is a fix

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
This works with event-time as well. You need to set the right TimeCharacteristics on the exec env and assign timestamps + watermarks. The only time depended operation is the window. YourWindowFunction assigns the timestamp of the window. WindowFunction.apply() has a TimeWindow parameter that gives
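The setup described above, switching the job to event time and assigning timestamps, can be sketched like this (a minimal illustration against the 1.x Scala API; the tuple layout `(id, state, ts)` follows the thread, everything else is assumed):

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Switch the time characteristic from the default (processing time).
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// (id, state, ts) as in the thread; source omitted.
val s: DataStream[(Long, Int, Long)] = ???

// Extract event timestamps from the third field; this also generates
// watermarks, assuming timestamps arrive in ascending order per partition.
val withTimestamps = s.assignAscendingTimestamps(_._3)
```

For out-of-order streams, a custom timestamp/watermark assigner would replace `assignAscendingTimestamps`. Inside the window function, `TimeWindow.getEnd` then provides the timestamp to attach to each emitted count.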

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Simone Robutti
I'm working with your suggestions, thank you very much. What I'm missing here is what YourWindowFunction should do. I have no notion of event time there, so I can't assign a timestamp. Also, this solution seems to work by processing time, while I care about event time. I couldn't make it

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stephan Ewen
Agree with Stefan, let's see if the fully async snapshot mode helps. It looks suspiciously RocksDB related... On Fri, Sep 30, 2016 at 10:30 AM, Stefan Richter < s.rich...@data-artisans.com> wrote: > Hi Shannon, > > from your new stack trace and the bogus class names, I agree with Stephan > that

Re: Exceptions from collector.collect after cancelling job

2016-09-30 Thread Ufuk Celebi
On Thu, Sep 29, 2016 at 9:29 PM, Shannon Carey wrote: > It looks like Flink is disabling the objects that the FlatMap collector > relies on before disabling the operator itself. Is that expected/normal? Is > there anything I should change in my FlatMap function or job code to

Re: Error while adding data to RocksDB: No more bytes left

2016-09-30 Thread Stefan Richter
Hi Shannon, from your new stack trace and the bogus class names, I agree with Stephan that either serialization or the database itself is corrupted in some way. Could you please check if this problem only happens if checkpointing is enabled? If yes, does switching to fully async snapshots

Re: Blobstorage Locally and on HDFS

2016-09-30 Thread Ufuk Celebi
On Fri, Sep 30, 2016 at 9:12 AM, Konstantin Knauf wrote: > we are running a Flink (1.1.2) Stand-Alone Cluster with JM HA, and HDFS > as checkpoint and recovery storage dir. What we see is that blobStores > are stored in HDFS as well as under the local Jobmanagers and

Re: How to Broadcast a very large model object (used in iterative scoring in recommendation system) in Flink

2016-09-30 Thread rimin515
Your message is very short, so I cannot tell much; the following is my guess. In Flink, the DataStream API is not meant for iterative computation; the DataSet API would be a better fit. Also, Flink suggests broadcasting small data, not large. You can load your model data (it can come from a file or a table) before the main
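One common way to follow this advice, loading the model up front rather than broadcasting it as a record, is to read it in the `open()` method of a rich function. A hedged sketch (the `Model`, `Event`, and `Score` types and the HDFS path are all illustrative, not from the thread):

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

class ScoringMapper extends RichMapFunction[Event, Score] {

  // Loaded once per parallel task instance, not shipped with the job graph.
  @transient private var model: Model = _

  override def open(conf: Configuration): Unit = {
    // Illustrative: read the model from a shared location (e.g. HDFS)
    // so each task manager fetches it locally at startup.
    model = Model.load("hdfs:///models/recommender.bin")
  }

  override def map(event: Event): Score = model.score(event)
}
```

This avoids pushing a very large object through Flink's broadcast mechanism; each task instead pulls the model from shared storage when it starts.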

Re: Counting latest state of stateful entities in streaming

2016-09-30 Thread Fabian Hueske
Hi Simone, I think I have a solution for your problem: val s: DataStream[(Long, Int, ts)] = ??? // (id, state, time) val stateChanges: DataStream[(Int, Int)] = s // (state, cntUpdate) .keyBy(_._1) // key by id .flatMap(new StateUpdater) // StateUpdater is a stateful FlatMapFunction. It has
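The StateUpdater outlined above can be sketched as a stateful FlatMapFunction that remembers the last seen state per id and, on a change, emits a decrement for the old state and an increment for the new one. A hedged illustration only (the default value `-1` as "no previous state" is an assumption; 1.x-era state API):

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Input: (id, state, ts); output: (state, cntUpdate)
class StateUpdater extends RichFlatMapFunction[(Long, Int, Long), (Int, Int)] {

  private var lastState: ValueState[Int] = _

  override def open(conf: Configuration): Unit = {
    // -1 marks "no state seen yet" for this key (assumes states are >= 0).
    lastState = getRuntimeContext.getState(
      new ValueStateDescriptor("lastState", classOf[Int], -1))
  }

  override def flatMap(in: (Long, Int, Long),
                       out: Collector[(Int, Int)]): Unit = {
    val prev = lastState.value()
    if (prev != in._2) {
      if (prev != -1) out.collect((prev, -1)) // old state loses one entity
      out.collect((in._2, 1))                 // new state gains one entity
      lastState.update(in._2)
    }
  }
}
```

Keying the emitted `(state, cntUpdate)` stream by state and summing the updates per window then yields the current count of entities in each state.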

How to Broadcast a very large model object (used in iterative scoring in recommendation system) in Flink

2016-09-30 Thread Anchit Jatana
Hi All, I'm building a recommendation system streaming application for which I need to broadcast a very large model object (used in iterative scoring) among all the task managers performing the operation in parallel. I'm doing this operation in map1 of a CoMapFunction. Please