Re: Performance tuning

2017-02-24 Thread Dmitry Golubets
> l just serialize an integer id. So the amount of data being transferred goes down drastically.
>
> The disableAutoTypeRegistration flag is ignored in the DataStream API at the moment.
>
> On Thu, Feb 23, 2017 at 7:00 PM, Dmitry Golubets <dgolub...@gmai
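The point about serializing only an integer id can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Kryo's or Flink's actual wire format; the `Click` class, the `registry` table and the id `10` are all hypothetical names introduced for illustration. It only compares the size of a type tag written as a full class name with one written as a registered id.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Toy model only: NOT Kryo's or Flink's real wire format.
object TypeTagSketch {
  // Hypothetical event type standing in for a user class.
  final case class Click(userId: Int)

  // Registration table: class -> small integer id (the id 10 is made up).
  val registry: Map[Class[_], Int] = Map[Class[_], Int](classOf[Click] -> 10)

  // Unregistered type: the tag is the full class name as UTF-8 bytes.
  def tagByName(value: AnyRef): Array[Byte] =
    value.getClass.getName.getBytes(UTF_8)

  // Registered type: the tag is just the id, stored here in a single byte.
  def tagById(value: AnyRef): Array[Byte] =
    Array(registry(value.getClass).toByte)
}
```

For every record, the name-based tag costs as many bytes as the class name is long, while the id-based tag stays at one byte in this model, which is why registration matters most for small records.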

Re: Performance tuning

2017-02-23 Thread Dmitry Golubets
> case classes? If so, what are you using for doing that?
>
> Regards,
> Robert
>
> On Fri, Feb 17, 2017 at 9:17 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> I've implemented a macro that generates message pack serializ

Re: Performance tuning

2017-02-17 Thread Dmitry Golubets
> DefaultKryoSerializer(..).
>
> I'm interested in knowing what you have done there for a boost of about 50%.
>
> Some small or simple example would be very nice.
>
> Thank you very much in advance.
>
> Kind Regards,
>
> Daniel Santos
>
> On 02/17/201

Re: How important is 'registerType'?

2017-02-17 Thread Dmitry Golubets
> th for you.
>
> Cheers,
> Till
>
> On Fri, Feb 17, 2017 at 12:38 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
>
>> Hi,
>>
>> I was using ```cs.knownDirectSubclasses``` recursively to find and register subclasses, which may have resulted

Akka 2.4

2017-02-16 Thread Dmitry Golubets
Hi, Can I force Flink to use Akka 2.4 (recompile if needed)? Is it going to misbehave in a subtle way? Best regards, Dmitry

Re: A way to control redistribution of operator state?

2017-02-14 Thread Dmitry Golubets
> w it is just hard coded to use a round-robin repartitioner implementation as default.
>
> However, I’m not sure of the plans in exposing this to the user and making it configurable.
> Looping in Stefan (in cc) who mostly worked on this part and see if he can provide more info.

A way to control redistribution of operator state?

2017-02-13 Thread Dmitry Golubets
Hi, It looks impossible to implement a keyed state with operator state now. I know it sounds like "just use a keyed state", but the latter requires updating it on every value change, as opposed to operator state, and thus can be expensive (especially if you have to deal with mutable structures inside

How important is 'registerType'?

2017-02-10 Thread Dmitry Golubets
The docs say that it may improve performance. How true is that when custom serializers are provided? There is also a 'disableAutoTypeRegistration' method in the config class, implying Flink registers types automatically. So, given that I have a hierarchy: trait A class B extends A class C extends

Where to put "pre-start" logic and how to detect recovery?

2017-02-09 Thread Dmitry Golubets
Hi, I need to re-create a Kafka topic when a job is started in "clean" mode. I can do it, but I'm not sure if I do it in the right place. Is it fine to put this kind of code in the "main"? Then it's called on every job submission. But how do I detect whether a job is being started from a savepoint? Or

Re: logback

2017-02-08 Thread Dmitry Golubets
Update: I've now used the 1.1.3 versions as in the example in the docs and it works! Looks like there is an incompatibility with the latest logback.
Best regards,
Dmitry
On Wed, Feb 8, 2017 at 3:20 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
> Hi Robert,
>
> After reading that

Re: logback

2017-02-08 Thread Dmitry Golubets
> i.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#use-logback-when-running-flink-on-a-cluster
>
> On Tue, Feb 7, 2017 at 1:07 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
>
>> Hi,
>>
>> documentation says: "Users willin

logback

2017-02-07 Thread Dmitry Golubets
Hi, the documentation says: "Users willing to use logback instead of log4j can just exclude log4j (or delete it from the lib/ folder)." But then Flink just doesn't start. I added logback-classic 1.10 to its lib folder, but still get NoClassDefFoundError: ch/qos/logback/core/joran/spi/JoranException

Re: Parallelism and max-parallelism

2017-02-06 Thread Dmitry Golubets
> thing is missing, feel free to report it here.
>
> The PRs will be merged later today.
>
> On Mon, Feb 6, 2017 at 4:41 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
> > Hi guys,
> >
> > I would appreciate if someone could explain to me what's the dif

Parallelism and max-parallelism

2017-02-06 Thread Dmitry Golubets
Hi guys, I would appreciate it if someone could explain to me what the difference between those two is. The current description refers to "dynamic scaling", and yet I can't find anything about it in Flink's docs. Best regards, Dmitry

Re: User configuration

2017-01-26 Thread Dmitry Golubets
> and-line-arguments-and-passing-them-around-in-your-flink-application
>
> On Thu, Jan 26, 2017 at 5:38 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there a place for user defined configuration settings?
>> How to read them?
>>
>> Best regards,
>> Dmitry

User configuration

2017-01-26 Thread Dmitry Golubets
Hi, Is there a place for user-defined configuration settings? How do I read them? Best regards, Dmitry

Re: Flink dependencies shading

2017-01-26 Thread Dmitry Golubets
> dependency coming from? Maybe you can resolve the issue on your side for now.
> I've filed a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-5661
>
> On Wed, Jan 25, 2017 at 8:24 PM, Dmitry Golubets <dgolub...@gmail.com> wrote:

Flink dependencies shading

2017-01-25 Thread Dmitry Golubets
I've built the latest Flink from source, and it seems that the httpclient dependency from flink-mesos is not shaded. This causes trouble with the latest AWS SDK. Am I building it wrong, or is this a known problem? Best regards, Dmitry

Why is IdentityObjectIntMap.get called so often?

2017-01-24 Thread Dmitry Golubets
Hi, I've just added my custom MsgPack serializers, hoping to see a performance increase. I covered all data types in between chains. However, this Kryo method still takes a lot of CPU: IdentityObjectIntMap.get Is there something else that should be configured? Or is there no way to get away from Kryo

Count window on partition

2017-01-23 Thread Dmitry Golubets
Hi, I'm looking for the right way to implement the following scheme:
1. Read data
2. Split it into partitions for parallel processing
3. In every partition, group data into batches of N elements
4. Process these batches
My first attempt was: *dataStream.keyBy(_.key).countWindow(..)* But countWindow groups
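The four steps of the scheme can be modelled outside Flink to pin down the intended result. What follows is a minimal sketch using plain Scala collections, not the Flink API; the `Event` type, the `batchPerPartition` helper and the hash-based partition assignment are assumptions made for illustration. It shows batches being formed per partition, so one batch may mix elements with different keys.

```scala
object PartitionBatchSketch {
  // Hypothetical record type; `key` mirrors the `_.key` used in the message.
  final case class Event(key: String, payload: Int)

  // Step 2: assign each element to one of `partitions` buckets by key hash.
  // Step 3: inside every bucket, cut the elements into batches of at most `n`,
  //         regardless of which key each element carries.
  def batchPerPartition(data: Seq[Event], partitions: Int, n: Int): Map[Int, List[List[Event]]] =
    data
      .groupBy(e => Math.floorMod(e.key.hashCode, partitions))
      .map { case (p, events) => p -> events.grouped(n).map(_.toList).toList }
}
```

Step 4 would then iterate over the batches of each partition. The last batch in a partition can hold fewer than `n` elements, so a streaming version needs some rule for flushing it, for example a timeout.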

Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
> for that.
> Overall, that seemed the more scalable design to us.
> Can your use case follow a similar approach?
>
> Stephan
>
> On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets <dgolub...@gmail.com> wrote:
>
>> Hi Timo,
>>
>> I don't have an

Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
> you have to implement your own operator. That depends on your use case though.
>
> You can maintain backpressure by using Flink's operator state. But did you also think about a Window Join instead?
>
> I hope that helps.
>
> Timo
>
> On 17/01/17 at 00

Three input stream operator and back pressure

2017-01-16 Thread Dmitry Golubets
Hi, there are only *two* interfaces defined at the moment: *OneInputStreamOperator* and *TwoInputStreamOperator*. Is there any way to define an operator with an arbitrary number of inputs? My other concern is how to maintain *backpressure* in the operator. Let's say I read events from two Kafka

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
> e memory consumption if data is serialized into a fixed number of buffers instead of being put on the JVM heap.
>
> Best, Fabian
>
> 2017-01-16 14:21 GMT+01:00 Dmitry Golubets <dgolub...@gmail.com>:
>
>> Hi Ufuk,
>>
>> Do you know what's the reason f

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
Hi Ufuk, Do you know what's the reason for serialization of data between different threads? Also, thanks for the link!
Best regards,
Dmitry
On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi wrote:
> Hey Dmitry,
>
> this is not possible if I'm understanding you correctly.
>
> A

Can serialization be disabled between chains?

2017-01-13 Thread Dmitry Golubets
Hi, Let's say we have multiple subtask chains and all of them are executing in the same task manager slot (i.e. in the same JVM). What's the point in serializing data between them? Can it be disabled? The reason I want to keep different chains is that some subtasks should be executed in parallel to