Re: Recursive directory traversal with TextInputFormat

2016-12-07 Thread Lukas Kircher
Hi Stefan, thanks for your answer. > I think there is a field in FileInputFormat (which TextInputFormat is > subclassing) that could serve your purpose if you override the default: That was my first guess as well. I use the basic setup from org.apache.flink.api.java.io.TextInputFormatTest.java

Re: Serializers and Schemas

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi Matt, 1. There’s some in-progress work on wrapper util classes for Kafka de/serializers here [1] that allows Kafka de/serializers to be used with the Flink Kafka Consumers/Producers with minimal user overhead. The PR also has some proposed additions to the documentation for the wrappers. 2. I fe
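[Editorial sketch, not from the original thread] Until those wrapper classes land, the usual route is to implement Flink's own DeserializationSchema and hand it to the Kafka consumer. A minimal sketch, assuming a hypothetical MyEvent POJO and Jackson on the classpath; package names are those of the 1.1.x connector:

import java.io.IOException;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

import com.fasterxml.jackson.databind.ObjectMapper;

// Turns the raw Kafka byte[] payload into MyEvent POJOs (MyEvent is a placeholder type).
public class MyEventSchema implements DeserializationSchema<MyEvent> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public MyEvent deserialize(byte[] message) throws IOException {
        return MAPPER.readValue(message, MyEvent.class);
    }

    @Override
    public boolean isEndOfStream(MyEvent nextElement) {
        return false; // the Kafka stream is unbounded
    }

    @Override
    public TypeInformation<MyEvent> getProducedType() {
        return TypeExtractor.getForClass(MyEvent.class);
    }
}

The schema is then passed to the consumer, e.g. new FlinkKafkaConsumer09<>("topic", new MyEventSchema(), properties).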

Re: Partitioning operator state

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi Dominik, Do you mean how Flink redistributes an operator’s state when the parallelism of the operator is changed? If so, you can take a look at [1] and [2]. Cheers, Gordon [1] https://issues.apache.org/jira/browse/FLINK-3755 [2]  https://docs.google.com/document/d/1G1OS1z3xEBOrYD4wSu-LuBCyPU
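[Editorial sketch, not from the original thread] For illustration, a rough sketch of the list-style operator state that the rescaling work in [1]/[2] introduces (the interface ships as ListCheckpointed in Flink 1.2+, so names may differ in snapshot versions): state is snapshotted as a list, and on a parallelism change Flink redistributes the list entries across the new subtasks.

import java.util.Collections;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;

// Operator state is exposed as a list of redistributable units; after a
// rescale, each subtask may receive zero, one, or several list entries.
public class CountingMapper extends RichMapFunction<String, String>
        implements ListCheckpointed<Long> {

    private long count;

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    @Override
    public List<Long> snapshotState(long checkpointId, long timestamp) {
        return Collections.singletonList(count);
    }

    @Override
    public void restoreState(List<Long> state) {
        count = 0L;
        for (Long c : state) {
            count += c; // merge whatever entries were assigned to this subtask
        }
    }
}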

Re: Multiple Window Operators

2016-12-07 Thread Aljoscha Krettek
Hi, the problem is that the elements emitted from the count window operation all have a timestamp of Long.MAX_VALUE. The reason for this is that "countWindow(int, int)" de-sugars to this: input.keyBy(...).window(GlobalWindows.create()).trigger(CountTrigger.of(1)).evictor(CountEvictor.of
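[Editorial sketch, not from the original thread] What that expansion looks like for a countWindow(2, 1), with placeholder types; the point is that the results inherit the global window's maximum timestamp, i.e. Long.MAX_VALUE:

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.evictors.CountEvictor;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

// Roughly what input.keyBy(...).countWindow(2, 1).reduce(...) de-sugars to.
DataStream<String> counted = input
        .keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) {
                return value; // placeholder key
            }
        })
        .window(GlobalWindows.create())   // one global window per key
        .trigger(CountTrigger.of(1))      // the slide: fire on every element
        .evictor(CountEvictor.of(2))      // the size: keep the last 2 elements
        .reduce(new ReduceFunction<String>() {
            @Override
            public String reduce(String a, String b) {
                return a + "," + b; // placeholder combine step
            }
        });
// Elements emitted here carry timestamp Long.MAX_VALUE, so timestamps must be
// re-assigned before applying a downstream event-time window.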

Re: onEventTime() is not called after setting ctx.registerEventTimeTimer(timstamp)

2016-12-07 Thread Aljoscha Krettek
Hi, onEventTime() will be called when the watermark passes w + 100_000, where w is the watermark at the time when you set the timer. Does the watermark advance that far? Cheers, Aljoscha On Thu, 8 Dec 2016 at 01:51 Sendoh wrote: > Hi Flink users, > > Can I ask whether my understanding of onEventTime()
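[Editorial sketch, not from the original thread] A minimal custom-trigger sketch for the Flink 1.1/1.2-era Trigger API (the exact method set varies slightly between versions): the timer is registered relative to the current watermark, and onEventTime() only runs once the watermark has actually advanced past that registered timestamp.

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Fires a window `delay` milliseconds (in event time) after the watermark
// observed when the element arrived.
public class DelayedFireTrigger extends Trigger<Object, TimeWindow> {

    private final long delay;

    public DelayedFireTrigger(long delay) {
        this.delay = delay;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        // onEventTime() fires only once the watermark passes (current watermark + delay).
        ctx.registerEventTimeTimer(ctx.getCurrentWatermark() + delay);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        // Registered timers are not tracked in this sketch, so nothing to delete here.
    }
}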

RE: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-07 Thread Radu Tudoran
Hi, I think the idea of having such a monthly thread is very good and it might even help to further attract new people to the community. At the same time, I do not think that one extra mail per month is necessarily spam ☺ We could also consider a jobs@flink mailing list. Dr. Radu

Re: Serializers and Schemas

2016-12-07 Thread milind parikh
Why not use a self-describing format (JSON), stream it as a String, and read it through a JSON reader to avoid top-level reflection? Github.com/milindparikh/streamingsi https://github.com/milindparikh/streamingsi/tree/master/epic-poc/sprint-2-simulated-data-no-cdc-advanced-eventing/2-dataprocessing ?
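[Editorial sketch, not from the original thread] A minimal sketch of that approach, assuming a Kafka 0.9 consumer, a hypothetical "events" topic, local broker settings, and Jackson for the parsing step:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
props.setProperty("group.id", "json-demo");               // assumption

// Read the payload as a plain String; no custom deserialization schema needed.
DataStream<String> raw = env.addSource(
        new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props));

// Parse the self-describing JSON downstream (reuse the mapper in real code,
// e.g. in a RichMapFunction's open()).
DataStream<JsonNode> parsed = raw.map(new MapFunction<String, JsonNode>() {
    @Override
    public JsonNode map(String value) throws Exception {
        return new ObjectMapper().readTree(value);
    }
});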

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-07 Thread Kanstantsin Kamkou
Is it possible to avoid such spam here? If I need a new job, I can search for it, the same way I might subscribe to a different thread, like jobs@flink. * The idea itself is great. On Tue, 6 Dec 2016 at 14:04, Kostas Tzoumas wrote: > yes, of course! > > On Tue, Dec 6, 2016 at 12:54 PM, M

Partitioning operator state

2016-12-07 Thread Dominik Safaric
Hi everyone, In the case of scaling out a Flink cluster, how does Flink handle operator state partitioning of a staged topology? Regards, Dominik

Re: Serializers and Schemas

2016-12-07 Thread Matt
I've read your example, but I've found the same problem. You're serializing your POJO as a string, where all fields are separated by "\t". This may work for you, but not in general. https://github.com/luigiselmi/pilot-sc4-fcd-producer/blob/master/src/main/java/eu/bde/sc4pilot/fcd/FcdTaxiEvent.java

Re: How to let 1.1.3 not drop late events as verion 1.0.3 does

2016-12-07 Thread Sendoh
Thanks, I followed your suggestion and it works as expected. Best, Sendoh

onEventTime() is not called after setting ctx.registerEventTimeTimer(timstamp)

2016-12-07 Thread Sendoh
Hi Flink users, Can I ask whether my understanding of onEventTime() is correct? In my custom trigger, I have something like the following: onElement(JSONObject element, long timestamp, W window, TriggerContext ctx){ if(count == 3) { ctx.registerEventTimeTimer(ctx.getWatermark+10); return Trigger

Multiple Window Operators

2016-12-07 Thread Nico
Hi, I am a little bit confused regarding windows in Flink. Is it possible to use multiple window operators in a single Flink job? In my example I receive events every 5s, which need to be edited before further investigation. For this I use a keyBy(ID) followed by a sliding count window (2,1)...

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-07 Thread LINZ, Arnaud
Hi, Any news? It may be caused by an oversized akka payload (many akka.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[akka.tcp://flink@172.21.125.20:39449/user/jobmanager#-1264474132]: max allowed size 10485760 bytes, actual size of encoded class org.apache.flink
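[Editorial sketch, not from the original thread] If that is indeed the cause, one possible workaround is to raise the akka frame size in flink-conf.yaml on all JobManager and TaskManager nodes; for very large results it is usually better to write them to a sink rather than collect() them.

# flink-conf.yaml: raise the maximum akka message size above the
# 10485760-byte default reported in the log; the value below is only
# an example and should be tuned to the actual payload size.
akka.framesize: 52428800b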

Re: Recursive directory traversal with TextInputFormat

2016-12-07 Thread Stefan Richter
Hi, I think there is a field in FileInputFormat (which TextInputFormat is subclassing) that could serve your purpose if you override the default: /** * The flag to specify whether recursive traversal of the input directory * structure is enabled. */ protected boolean enumerateNestedFiles = fa
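[Editorial sketch, not from the original thread] The rest of the message is cut off in the archive; for completeness, a sketch of how that flag is usually flipped without subclassing, via the "recursive.file.enumeration" parameter in the DataSet API, assuming a batch job reading from a root directory:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Tell the underlying FileInputFormat to descend into nested directories.
Configuration parameters = new Configuration();
parameters.setBoolean("recursive.file.enumeration", true);

DataSet<String> lines = env
        .readTextFile("file:///path/to/root")   // root directory with nested files
        .withParameters(parameters);

Alternatively, if the Flink version in use exposes setNestedFileEnumeration(true) on FileInputFormat, calling it on the TextInputFormat instance has the same effect.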

Re: Parallelism and stateful mapping with Flink

2016-12-07 Thread Andrew Roberts
Sure! (Aside: it turns out that the issue was using an `Array[Byte]` as a key - byte arrays don’t appear to have a stable hashCode. I’ll provide the skeleton for completeness, though.) val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(Config.callAggregator.parallelism)
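[Editorial sketch, not from the original thread] The underlying issue is that Java/Scala arrays use identity hashCode and equals, so two byte arrays with identical contents can be routed to different parallel subtasks by keyBy. A sketch of the usual workaround, keying on a stable representation of the bytes; Record and getKeyBytes() are placeholder names:

import javax.xml.bind.DatatypeConverter;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.KeyedStream;

// Key on a hex string derived from the byte[] instead of the array itself,
// so that equal contents always hash to the same key.
KeyedStream<Record, String> keyed = records.keyBy(new KeySelector<Record, String>() {
    @Override
    public String getKey(Record record) {
        return DatatypeConverter.printHexBinary(record.getKeyBytes());
    }
});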

Re: Parallelism and stateful mapping with Flink

2016-12-07 Thread Stefan Richter
Hi, could you maybe provide the (minimal) code for the problematic job? Also, are you sure that the keyBy is working on the correct key attribute? Best, Stefan > On 07.12.2016 at 15:57, Andrew Roberts wrote: > > Hello, > > I’m trying to perform a stateful mapping of some objects coming in f

Re: Serializers and Schemas

2016-12-07 Thread Luigi Selmi
Hi Matt, I had the same problem: trying to read some records in event time using a POJO, doing some transformations, and saving the result into Kafka for further processing. I am not done yet, but maybe the code I wrote starting from the Flink Forward 2016 training docs

Serializers and Schemas

2016-12-07 Thread Matt
Hello, I don't quite understand how to integrate Kafka and Flink, after a lot of thoughts and hours of reading I feel I'm still missing something important. So far I haven't found a non-trivial but simple example of a stream of a custom class (POJO). It would be good to have such an example in Fl

Parallelism and stateful mapping with Flink

2016-12-07 Thread Andrew Roberts
Hello, I’m trying to perform a stateful mapping of some objects coming in from Kafka in a parallelized Flink job (set on the job using env.setParallelism(3)). The data source is a Kafka topic, but the partitions aren’t meaningfully keyed for this operation (each Kafka message is flatMapped to b

Re: Replace Flink job while cluster is down

2016-12-07 Thread Stefan Richter
Hi, first a few quick questions: I assume you are running in HA mode, right? Also what version of Flink are you running? In case you are not running HA, nothing is automatically recovered. With HA, you would need to manually remove the corresponding entry from Zookeeper. If this is the problem

Replace Flink job while cluster is down

2016-12-07 Thread Al-Isawi Rami
Hi, I have a faulty Flink streaming program running on a cluster that is consuming from Kafka, so I brought the cluster down. Now I have a new version that has the fix. If I bring up the Flink cluster again, the old faulty program will be recovered and it will consume and stream faulty results

Re: YARN Reserved Memory

2016-12-07 Thread Michael Pisula
Hi Stefan, thanks for the fast feedback. Updating to a newer YARN version is most certainly something that would benefit us in many different areas (the issues with the HA mode being the most important of them); however, at the moment we are not able to update to a newer version. If that is another

Recursive directory traversal with TextInputFormat

2016-12-07 Thread Lukas Kircher
Hi all, I am trying to read nested .csv files from a directory and want to switch from a custom SourceFunction I implemented to the TextInputFormat. I have two questions: 1) Somehow only the file in the root directory is processed; nested files are skipped. What am I missing? See the attachmen

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Sanne de Roever
Hi Gordon, Yes, this has been addressed in 1.0.0; and in a very nice way. Thank you. Cheers, Sanne On Wed, Dec 7, 2016 at 11:11 AM, Sanne de Roever wrote: > Hi Gordon, > > Sounds very close, I will have look; thx. > > Cheers, > > Sanne > > On Wed, Dec 7, 2016 at 11:09 AM, Tzu-Li (Gordon) Ta

Re: YARN Reserved Memory

2016-12-07 Thread Stefan Richter
Hi, did you observe the problem only under YARN 2.4.0? IIRC this version of YARN has some problems that can also lead to issues with Flink’s HA mode, and I would encourage you to upgrade YARN to 2.5 or higher. On a different note, there have been several improvements that we will release in Fli

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Sanne de Roever
Hi Gordon, Sounds very close, I will have look; thx. Cheers, Sanne On Wed, Dec 7, 2016 at 11:09 AM, Tzu-Li (Gordon) Tai wrote: > Hi, > > The FlinkKafkaProducers currently support per record topic through the > user-provided serialization schema which has a "getTargetTopic(T > element)" method

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi, The FlinkKafkaProducers currently support a per-record topic through the user-provided serialization schema, which has a "getTargetTopic(T element)" method that is called for each record. The partition the record will be sent to can also be decided through a custom KafkaPartitioner, which is also provided 
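[Editorial sketch, not from the original thread] A minimal sketch of such a schema; MyEvent, toJsonBytes() and getCategory() are placeholders. The topic passed to the producer constructor (e.g. new FlinkKafkaProducer09<>("default-topic", new PerRecordTopicSchema(), props)) then only acts as a default.

import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

// Routes every record to a topic derived from the element itself.
public class PerRecordTopicSchema implements KeyedSerializationSchema<MyEvent> {

    @Override
    public byte[] serializeKey(MyEvent element) {
        return null; // no message key
    }

    @Override
    public byte[] serializeValue(MyEvent element) {
        return element.toJsonBytes(); // placeholder serialization
    }

    @Override
    public String getTargetTopic(MyEvent element) {
        return "events-" + element.getCategory(); // per-record topic
    }
}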

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Sanne de Roever
Having had a glass of water, the following option came up. Having more advanced Sink integrations is likely to be a more general concern. It would be better to have a smoother path from the cleaner abstraction to the advanced case. A more general proposal would be to alter the Sink interface

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Sanne de Roever
A first sketch: central to this functionality is Kafka's ProducerRecord, which was introduced in Kafka 0.8. This means that any such functionality could be introduced for all Flink-Kafka connectors, as per https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/connectors/ka

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Sanne de Roever
Good questions; I will follow up piecewise to address the different points. Could a Wiki section be an idea, before I spread the information across several posts? On Tue, Dec 6, 2016 at 4:50 PM, Stephan Ewen wrote: > You are right, it does not exist, and it would be a nice addition. > > Can

YARN Reserved Memory

2016-12-07 Thread Michael Pisula
Hi Guys, We are having a slight issue using Flink 1.1.3 (we also observed the problem with 1.0.2) in Yarn 2.4.0. Whenever a TaskManager restarts, YARN seems to reserve memory during the TaskManager restart, and not free the memory again. We are using a CapacityScheduler with 2 queues, where the qu