Re: Error joining with Python API

2016-08-16 Thread Ufuk Celebi
I think that this is actually a bug in Flink. I'm cc'ing Chesnay who originally contributed the Python API. He can probably tell whether this is a bug in the Python API or Flink ioperator side of things. ;) On Mon, Aug 15, 2016 at 10:14 PM, davis k wrote: > I've got an issue performing joins usin

Re: Updating stored window data

2016-08-16 Thread Ufuk Celebi
Hey Paul! I think the window content should not be updated. Still, from looking at some of the internal Flink code, it looks like the updates would affect the "data buffer" -- but I think that this is only true for some cases and does not hold in general. Cc'ing Aljoscha for a definite answer on th

Re: Updating stored window data

2016-08-16 Thread Aljoscha Krettek
Hi, the input elements to a window function should not be modified. Could you maybe achieve something using a Fold? Maybe if you went into a bit more details we could figure something out together. Cheers, Aljoscha On Tue, 16 Aug 2016 at 10:38 Ufuk Celebi wrote: > Hey Paul! I think the window c

flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-16 Thread Miroslav Gajdoš
Hi guys, i've run into some problems with flink/yarn. I try to deploy flink to our cluster using /usr/lib/flink-scala2.10/bin/yarn-session.sh, but the yarn application does not even start, it goes from accepted to finished/failed. Yarn info on resourcemanager looks like this: User:  wa-flink Nam

Re: Ordering expectations of data

2016-08-16 Thread Ufuk Celebi
Your approach of using a CoFlatMap with a slowly changing second input (the config source) is a common pattern and a good choice for this. The ordering of events between the sources and the CoFlatMap should be maintained if the parallelism matches. There is no repartitioning going on (the CoFlatMa

Re: Performance issues - is my topology not setup properly?

2016-08-16 Thread Ufuk Celebi
Hey Jon! Thanks for sharing this. The blog post refers to each record in the stream as an event. The YARN command you've shared looks good, you could give the machines more memory, but I would not expect this to be the problem here. I would rather think that the sources are the bottleneck (but of

Re: Error joining with Python API

2016-08-16 Thread Chesnay Schepler
looks like a bug, will look into it. :) On 16.08.2016 10:29, Ufuk Celebi wrote: I think that this is actually a bug in Flink. I'm cc'ing Chesnay who originally contributed the Python API. He can probably tell whether this is a bug in the Python API or Flink ioperator side of things. ;) On Mon,

Re: Enriching events with data from external http resources

2016-08-16 Thread Ufuk Celebi
On Mon, Aug 15, 2016 at 8:52 PM, Maciek Próchniak wrote: > I know it's not really desired way of using flink and that it would be > better to keep data as state inside stream and have it updated by some join > operator, but for us it's a bit of overkill - what's more, we have many (not > so large)

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-16 Thread Ufuk Celebi
This could be a bug in Flink. Can you share the complete logs of the run? CC'ing Max who worked on the YARN client recently who might have an idea in which cases Flink would not set the context. On Tue, Aug 16, 2016 at 11:00 AM, Miroslav Gajdoš wrote: > Hi guys, > > i've run into some problems wi

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-16 Thread Robert Metzger
Hi Yassine, In Flink 1.2 we've added a new feature to the Kafka consumer, allowing you to extract timestamps and emitting watermarks per partition. The consumers now have the following method: public FlinkKafkaConsumerBase assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks assigner)

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-16 Thread Miroslav Gajdoš
Log from yarn session runner is here: http://pastebin.com/xW1W4HNP Our hadoop distribution is from cloudera, resourcenanager version: 2.6.0-cdh5.4.5, it runs in HA mode (there could be some redirecting on accessing resourcemanager and/or namenode to active one). Ufuk Celebi píše v Út 16. 08. 2016

Azure Blob Storage Connector

2016-08-16 Thread MIkkel Islay
Hello, I would like to access data in Azure blob storage from Flink, via the Azure storage HDFS-compatibility interface. That is feasible from Apache Drill, and I am thinking something similar should be doable from Flink. A documentation page on eternal storage connectors for Flink exist, but it w

Re: Azure Blob Storage Connector

2016-08-16 Thread Ufuk Celebi
You should be able to follow this: http://mail-archives.apache.org/mod_mbox/drill-user/201512.mbox/%3CCAAL5oQJQRgqO5LjhG_=YFLyHuZUNqEvm3VX3C=2d9uxnbto...@mail.gmail.com%3E It's similar to the AWS S3 config (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html). Add the Azure JAR

TumblingEventTimeWindows with time characteristic set to 'ProcessingTime'

2016-08-16 Thread Al-Isawi Rami
Hi, Why this combination is not possible? even though I am setting "assignTimestampsAndWatermarks “ correctly on the DataStream. I would like Flink to be ticking on processing time, but also utilize the TumblingEventTimeWindows which is based on event time. It is not possible because of : java

Re: Azure Blob Storage Connector

2016-08-16 Thread MIkkel Islay
Hello Ufuk, Thanks for your swift reply. Those are essentially the steps I took for Drill. I am happy to report back with my success, or otherwise. Mikkel On Tue, Aug 16, 2016 at 12:40 PM, Ufuk Celebi wrote: > You should be able to follow this: > > http://mail-archives.apache.org/mod_mbox/dril

Cannot use WindowedStream.fold with EventTimeSessionWindows

2016-08-16 Thread Jack Huang
Hi all, I want to window a series of events using SessionWindow and use fold function to incrementally aggregate the result. events .keyBy(_.id) .window(EventTimeSessionWindows.withGap(Time.minutes(1))) .fold(new Session)(eventFolder) ​ However I get java.lang.UnsupportedOperationEx

Re: Enriching events with data from external http resources

2016-08-16 Thread Maciek Próchniak
Hi Ufuk, thanks for info - this is good news :) maciek On 16/08/2016 12:16, Ufuk Celebi wrote: On Mon, Aug 15, 2016 at 8:52 PM, Maciek Próchniak wrote: I know it's not really desired way of using flink and that it would be better to keep data as state inside stream and have it updated by so