Re: Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-08 Thread Shannon Carey
Correction: I cannot work around the problem. If I exclude hadoop1, I get the following exception, which appears to be due to flink-java-1.1.0's dependency on hadoop1: Failed to submit job 4b6366d101877d38ef33454acc6ca500 (com.expedia.www.flink.jobs.DestinationCountsHistoryJob$) org.apache.flink…

Classloader issue using AvroParquetInputFormat via HadoopInputFormat

2016-08-08 Thread Shannon Carey
Hi folks, congrats on 1.1.0! FYI, after updating to Flink 1.1.0 I get the exception at the bottom when attempting to run a job that uses AvroParquetInputFormat wrapped in a Flink HadoopInputFormat. ContextUtil.java:71 is trying to execute: Class.forName("org.apache.hadoop.mapreduce.task.JobCont…
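For context, a minimal sketch of how such a job is typically wired up: parquet-avro's AvroParquetInputFormat handed to Flink's mapreduce HadoopInputFormat wrapper. The input path and record type are illustrative assumptions, not details from Shannon's job, and the import assumes the org.apache.parquet 1.8 line, where AvroParquetInputFormat is generic.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class AvroParquetReadJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        Job job = Job.getInstance();
        // Illustrative input path; not taken from the original report.
        FileInputFormat.addInputPath(job, new Path("hdfs:///data/events"));

        // Wrap the Hadoop (mapreduce API) input format so Flink can run it.
        HadoopInputFormat<Void, GenericRecord> input = new HadoopInputFormat<>(
                new AvroParquetInputFormat<GenericRecord>(),
                Void.class, GenericRecord.class, job);

        DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(input);
        records.first(10).print();
    }
}
```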

Re: Kafka producer connector

2016-08-08 Thread Robert Metzger
Hi Janardhan, the fixed partitioner is the only one shipped with Flink. However, it should be fairly simple to implement one that uses the key to determine the partition. On Mon, Aug 8, 2016 at 7:16 PM, Janardhan Reddy wrote: > Hi, > The Flink Kafka producer uses the fixed partitioner by default an…
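A sketch of what such a key-based partitioner might look like, assuming the KafkaPartitioner base class shipped with the 1.1 Kafka connector; the hashing scheme is just one reasonable choice.

```java
import java.util.Arrays;
import org.apache.flink.streaming.connectors.kafka.partitioner.KafkaPartitioner;

// Routes records with equal serialized keys to the same Kafka partition.
public class KeyHashPartitioner<T> extends KafkaPartitioner<T> {

    private static final long serialVersionUID = 1L;

    @Override
    public int partition(T next, byte[] serializedKey, byte[] serializedValue, int numPartitions) {
        if (serializedKey == null) {
            return 0; // records without a key fall back to a fixed partition
        }
        // Mask the sign bit so the modulo result is always non-negative;
        // equal key bytes always yield the same partition.
        return (Arrays.hashCode(serializedKey) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

A partitioner like this can then be handed to the FlinkKafkaProducer09 constructor variant that accepts a custom KafkaPartitioner.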

Re: OutOfMemoryError

2016-08-08 Thread Paulo Cezar
Thanks Stephan, I had a MapFunction using Unirest and that was the origin of the leak. On Tue, Aug 2, 2016 at 7:36 AM, Stephan Ewen wrote: > My guess would be that you have a thread leak in the user code. > More memory will not solve the problem, only push it a bit further away. > > On Mon, Aug

Re: Flink Kafka Consumer Behaviour

2016-08-08 Thread Robert Metzger
Hi Prabhu, I'm pretty sure that the Kafka 09 consumer commits offsets to Kafka when checkpointing is turned on. In FlinkKafkaConsumerBase.notifyCheckpointComplete(), we call fetcher.commitSpecificOffsetsToKafka(checkpointOffsets), which calls this.consumer.commitSync(offsetsToCommit) in Kaf…
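A minimal sketch of the setup this implies: with checkpointing enabled, the 09 consumer commits offsets back to Kafka whenever a checkpoint completes. Topic, broker address, group id, and checkpoint interval are placeholders.

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class CheckpointedKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With checkpointing on, offsets are committed on checkpoint completion.
        env.enableCheckpointing(5000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-group");                // placeholder

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));
        stream.print();
        env.execute("checkpointed-kafka");
    }
}
```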

Re: Flink kafka group question

2016-08-08 Thread Robert Metzger
Hi, you can get the offsets (current and committed offsets) in Flink 1.1 using the Flink metrics. In Flink 1.0, we expose the Kafka internal metrics via the accumulator system (so you can access them from the web interface as well). IIRC, Kafka exposes a metric for the lag as well. On Mon, Aug 8,

Re: Flink Kafka Consumer Behaviour

2016-08-08 Thread vpra...@gmail.com
Hi Stephan, the Flink Kafka 09 connector does not commit offsets to Kafka when checkpointing is turned on. Is there a way to monitor the offset lag in this case? I am starting a Flink job that reads data from Kafka (it has about a week of data, around 7 TB); currently the approximate way that I…

Re: Flink kafka group question

2016-08-08 Thread vpra...@gmail.com
From the code in Kafka09Fetcher.java: // if checkpointing is enabled, we are not automatically committing to Kafka. kafkaProperties.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, Boolean.toString(!runtimeContext.isCheckpointingEnabled())); If Flink checkpointing…

Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-08 Thread Henry Saputra
Great work all. Great thanks to Ufuk as RE :) On Monday, August 8, 2016, Stephan Ewen wrote: > Great work indeed, and big thanks, Ufuk! > > On Mon, Aug 8, 2016 at 6:55 PM, Vasiliki Kalavri < > vasilikikala...@gmail.com > > wrote: > > > yoo-hoo finally announced 🎉 > > Thanks for managing the rele…

Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-08 Thread Stephan Ewen
Great work indeed, and big thanks, Ufuk! On Mon, Aug 8, 2016 at 6:55 PM, Vasiliki Kalavri wrote: > yoo-hoo finally announced 🎉 > Thanks for managing the release Ufuk! > > On 8 August 2016 at 18:36, Ufuk Celebi wrote: > > > The Flink PMC is pleased to announce the availability of Flink 1.1.0. >

Re: Flink 1.1 event-time windowing changes from 1.0.3

2016-08-08 Thread Adam Warski
Thanks! I’ll be watching that issue then. Adam > On 08 Aug 2016, at 05:01, Aljoscha Krettek wrote: > > Hi Adam, > sorry for the inconvenience. This is caused by a new file read operator, > specifically how it treats watermarks/timestamps. I opened an issue here that > describes the situation:…

Kafka producer connector

2016-08-08 Thread Janardhan Reddy
Hi, the Flink Kafka producer uses the fixed partitioner by default, and all our events are ending up in 1-2 partitions. Is there any built-in partitioner that distributes keys such that the same key maps to the same partition? Thanks

Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-08 Thread Vasiliki Kalavri
yoo-hoo finally announced 🎉 Thanks for managing the release Ufuk! On 8 August 2016 at 18:36, Ufuk Celebi wrote: > The Flink PMC is pleased to announce the availability of Flink 1.1.0. > > On behalf of the PMC, I would like to thank everybody who contributed > to the release. > > The release anno

[ANNOUNCE] Flink 1.1.0 Released

2016-08-08 Thread Ufuk Celebi
The Flink PMC is pleased to announce the availability of Flink 1.1.0. On behalf of the PMC, I would like to thank everybody who contributed to the release. The release announcement: http://flink.apache.org/news/2016/08/08/release-1.1.0.html Release binaries: http://apache.openmirror.de/flink/fli

Re: Having a single copy of an object read in a RichMapFunction

2016-08-08 Thread Theodore Vasiloudis
Thank you for the help Robert! Regarding the static field alternative you provided, I'm a bit confused about the difference between slots and instances. When you say that by using a static field it will be shared by all instances of the Map on the slot, does that mean that if the TM has multiple

Re: Using RabbitMQ Sinks

2016-08-08 Thread Robert Metzger
Hi Paul, the example in the code is outdated; StringToByteSerializer was probably removed quite a while ago. I'll update the documentation once we've figured out the other problem you reported. What's the exception you are getting? Regards, Robert On Mon, Aug 8, 2016 at 4:33 PM, Paul Joireman…

Using RabbitMQ Sinks

2016-08-08 Thread Paul Joireman
Hi all, the documentation describing the use of RabbitMQ as a sink gives the following example: RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder().setHost("localhost").setPort(5000).setUserName(..).setPassword(..).setVirtualHost("/").build(); stream.addSink(new RMQSink(…
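Since StringToByteSerializer is gone (see Robert's reply above), a sketch of what the sink wiring might look like with SimpleStringSchema in its place; host, port, credentials, and queue name are placeholders.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSink;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class RabbitSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c");

        RMQConnectionConfig connectionConfig = new RMQConnectionConfig.Builder()
                .setHost("localhost").setPort(5000)
                .setUserName("guest").setPassword("guest") // placeholder credentials
                .setVirtualHost("/")
                .build();

        // SimpleStringSchema turns each String into bytes for the queue.
        stream.addSink(new RMQSink<String>(connectionConfig, "myQueue", new SimpleStringSchema()));
        env.execute("rabbitmq-sink-example");
    }
}
```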

Re: Having a single copy of an object read in a RichMapFunction

2016-08-08 Thread Robert Metzger
Hi Theo, I think there are some variants you can try out for the problem. I think it depends a bit on the performance characteristics you expect: - The simplest variant is to run one TM per machine with one slot only. This is probably not feasible because you can't use all the CPU cores - ... to s
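One of the variants under discussion, sharing a single read-only object per task manager JVM through a static field, might look like the sketch below; the loader and the shared object are hypothetical stand-ins for whatever large object is being shared.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class SharedModelMapper extends RichMapFunction<String, String> {

    // Static: shared by every parallel subtask running in the same
    // task manager JVM, regardless of how many slots that TM has.
    private static volatile Object sharedModel;
    private static final Object INIT_LOCK = new Object();

    @Override
    public void open(Configuration parameters) {
        synchronized (INIT_LOCK) {
            if (sharedModel == null) {
                // Hypothetical: load the large read-only object once per JVM.
                sharedModel = loadModelFromDisk();
            }
        }
    }

    @Override
    public String map(String value) {
        // Read-only access to the shared object is safe across subtasks.
        return value + "/" + sharedModel.hashCode();
    }

    private static Object loadModelFromDisk() {
        return new Object(); // placeholder for the real loading logic
    }
}
```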

Re: max aggregator doesn't work as expected

2016-08-08 Thread Claudia Wegmann
So if I change the input data to the following (I added a uID value to identify the individual data sets): new Data(1, 123, new Date(116, 8,8,11,11,11), 5), new Data(2, 123, new Date(116, 8,8,12,10,11), 8), new Data(3, 123, new Date(116, 8,8,12,12,11), 10), new Data(4, 123, new…

Re: max aggregator doesn't work as expected

2016-08-08 Thread Robert Metzger
I have to admit that the difference between the two methods is subtle, and in my opinion it doesn't make much sense to have the two variants. - max() returns a tuple with the max value at the specified position; the other fields of the tuple/POJO are undefined. - maxBy() returns a tuple with the ma…
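To make the contrast concrete, a small runnable sketch over (id, timestamp, temperature) tuples; the field positions and values are illustrative.

```java
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MaxVsMaxBy {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple3<Integer, Long, Integer>> data = env.fromElements(
                Tuple3.of(123, 1L, 5), Tuple3.of(123, 2L, 10), Tuple3.of(123, 3L, 8));

        // max(2): field 2 carries the running maximum; fields 0 and 1 are
        // not guaranteed to come from the same record as that maximum.
        data.keyBy(0).max(2).print();

        // maxBy(2): emits the whole record containing the maximum of field 2.
        data.keyBy(0).maxBy(2).print();

        env.execute("max-vs-maxby");
    }
}
```

With maxBy(2) every printed record is an actual input record; with max(2) only field 2 is meaningful.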

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-08-08 Thread Konstantin Gregor
Hi Stephan, hi Ufuk, thank you very much for your insights, and sorry for the late reply, there was a lot going on recently. We finally figured out what the problem was: As you pointed out, the Flink job simply waited for new YARN resources. But when a new YARN session started, the Flink job did n

Re: max aggregator doesn't work as expected

2016-08-08 Thread Claudia Wegmann
OK, found my mistake regarding question 2.): I keyed by the id value and gave all the data sets different values there, so of course all 4 data sets are printed. Sorry :) But question 1.) still remains. From: Claudia Wegmann [mailto:c.wegm...@kasasi.de] Sent: Monday, 8 August 2016 14:27 To: u…

max aggregator doesn't work as expected

2016-08-08 Thread Claudia Wegmann
Hey, I have some questions about aggregate functions such as max or min. Take the following example: // create a stream with event time, where Data contains an ID, a timestamp and a temperature value DataStream oneStream = env.fromElements( new Data(123, new Date(116, 8,8,11,11,11), 5),…

Re: Flink 1.1 event-time windowing changes from 1.0.3

2016-08-08 Thread Aljoscha Krettek
Hi Adam, sorry for the inconvenience. This is caused by a new file read operator, specifically how it treats watermarks/timestamps. I opened an issue here that describes the situation: https://issues.apache.org/jira/browse/FLINK-4329. I think this should be fixed for an upcoming 1.1.1 bug fixing r

Re: Output of KeyBy->TimeWindow->Reduce?

2016-08-08 Thread Aljoscha Krettek
Hi, what does the reduce function do exactly? Something like this? (a: String, b: String) -> b.toUpperCase If yes, then I would expect a) to be the output you get. If it is this: (a: String, b: String) -> a + b.toUpperCase then I would expect this: a,b,cC,d,eE,f,gG,h Cheers, Aljoscha On Sun,…
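For reference, the concatenating variant written out as a runnable Java job (the lambdas in the thread are Scala-flavoured pseudocode); the single-key selector, window size, and input are made up.

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowReduceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> letters = env.fromElements("a", "b", "c", "d");

        letters
                .keyBy(new KeySelector<String, Integer>() {
                    @Override
                    public Integer getKey(String value) {
                        return 0; // single key, purely for illustration
                    }
                })
                .timeWindow(Time.seconds(1))
                // The concatenating variant from the thread: "a" + "B" -> "aB".
                .reduce(new ReduceFunction<String>() {
                    @Override
                    public String reduce(String a, String b) {
                        return a + b.toUpperCase();
                    }
                })
                .print();

        env.execute("window-reduce-example");
    }
}
```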

Re: Flink timestamps

2016-08-08 Thread Aljoscha Krettek
Hi Davood, right now, you can only inspect the timestamps by writing a custom operator that you would use with DataStream.transform(). Measuring latency this way has some pitfalls, though. The timestamp might be assigned on a different machine than the machine that will process the tuple at the sin
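A rough sketch of such a pass-through operator, assuming the OneInputStreamOperator API as of Flink 1.1; it only prints each record's timestamp and forwards elements and watermarks unchanged.

```java
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Pass-through operator that prints the timestamp attached to each record.
public class TimestampInspector<T> extends AbstractStreamOperator<T>
        implements OneInputStreamOperator<T, T> {

    @Override
    public void processElement(StreamRecord<T> element) throws Exception {
        System.out.println("element=" + element.getValue()
                + " timestamp=" + element.getTimestamp());
        output.collect(element);
    }

    @Override
    public void processWatermark(Watermark mark) throws Exception {
        output.emitWatermark(mark);
    }
}
```

It would be attached with something like stream.transform("inspect-timestamps", stream.getType(), new TimestampInspector<MyType>()), with MyType standing in for the stream's element type.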