Re: HBase write problem

2016-05-12 Thread Palle
Hi guys. Thanks for helping out. We downgraded to HBase 0.98 and resolved some classpath issues and then it worked. /Palle - Original meddelelse - > Fra: Stephan Ewen > Til: user@flink.apache.org > Dato: Ons, 11. maj 2016 17:19 > Emne: Re: HBase write problem > > Just to narrow down

Re: Flink + Avro GenericRecord - first field value overwrites all other fields

2016-05-12 Thread Tarandeep Singh
I think I found a workaround. Instead of reading Avro files as GenericRecords, if I read them as specific records and then use a map to convert (typecast) them as GenericRecord, the problem goes away. I ran some tests and so far this workaround seems to be working in my local setup. -Tarandeep O

[ANNOUNCE] Flink 1.0.3 Released

2016-05-12 Thread Ufuk Celebi
The Flink PMC is pleased to announce the availability of Flink 1.0.3. The official release announcement: http://flink.apache.org/news/2016/05/11/release-1.0.3.html Release binaries: http://apache.openmirror.de/flink/flink-1.0.3/ Please update your Maven dependencies to the new 1.0.3 version and

checkpoints not being removed from HDFS

2016-05-12 Thread Maciek Próchniak
Hi, we have stream job with quite large state (few GB), we're using FSStateBackend and we're storing checkpoints in hdfs. What we observe is that v. often old checkpoints are not discarded properly. In hadoop logs I can see: 2016-05-10 12:21:06,559 INFO BlockStateChange: BLOCK* addToInvalidat

normalize vertex values

2016-05-12 Thread Lydia Ickler
Hi all, If I have a Graph g: Graph g and I would like to normalize all vertex values by the absolute max of all vertex values -> what API function would I choose? Thanks in advance! Lydia

Re: HBase write problem

2016-05-12 Thread Flavio Pompermaier
Great :) On Thu, May 12, 2016 at 10:01 AM, Palle wrote: > Hi guys. > > Thanks for helping out. > > We downgraded to HBase 0.98 and resolved some classpath issues and then it > worked. > > /Palle > > - Original meddelelse - > > *Fra:* Stephan Ewen > *Til:* user@flink.apache.org > *Dato:*

Re: reading from latest kafka offset when flink starts

2016-05-12 Thread Balaji Rajagopalan
No I am using 0.8.0.2 kafka. I did some experiments with changing the parallelism from 4 to 16 now the lag has reduced to 20 min from 2 hours, the cpu utilization (load avg) has gone up from 20-30 % to 50-60 % , so parallelism does seem to play a role in reducing the processing lag in flink as I e

Re: Bug while using Table API

2016-05-12 Thread Simone Robutti
Ok, I tested it and it works on the same example. :) 2016-05-11 12:25 GMT+02:00 Vasiliki Kalavri : > Hi Simone, > > Fabian has pushed a fix for the streaming TableSources that removed the > Calcite Stream rules [1]. > The reported error does not appear anymore with the current master. Could > you

Unexpected behaviour in datastream.broadcast()

2016-05-12 Thread Biplob Biswas
Hi, I am running this following sample code to understand how iteration and broadcast works in streaming context. final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(4); long i = 5; Data

Re: get start and end time stamp from time window

2016-05-12 Thread Fabian Hueske
Hi Martin, You can use a FoldFunction and a WindowFunction to process the same! window. The FoldFunction is eagerly applied, so the window state is only one element. When the window is closed, the aggregated element is given to the WindowFunction where you can add start and end time. The iterator

Re: Flink + Avro GenericRecord - first field value overwrites all other fields

2016-05-12 Thread Fabian Hueske
Hi Tarandeep, the AvroInputFormat was recently extended to support GenericRecords. [1] You could also try to run the latest SNAPSHOT version and see if it works for you. Cheers, Fabian [1] https://issues.apache.org/jira/browse/FLINK-3691 2016-05-12 10:05 GMT+02:00 Tarandeep Singh : > I think I

Re: Force triggering events on watermark

2016-05-12 Thread Aljoscha Krettek
Yes, this should work. On Tue, 10 May 2016 at 19:01 Srikanth wrote: > Yes, will work. > I was trying another route of having a "finalize & purge trigger" that will >i) onElement - Register for event time watermark but not alter nested > trigger's TriggerResult > ii) OnEventTime - Always pu

Re: Bug while using Table API

2016-05-12 Thread Vasiliki Kalavri
Good to know :) On 12 May 2016 at 11:16, Simone Robutti wrote: > Ok, I tested it and it works on the same example. :) > > 2016-05-11 12:25 GMT+02:00 Vasiliki Kalavri : > >> Hi Simone, >> >> Fabian has pushed a fix for the streaming TableSources that removed the >> Calcite Stream rules [1]. >> Th

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
I cannot follow completely. TwoInputStreamOperators defines two methods to process watermarks for each stream. So you can sync both stream within your outer join operator you plan to implement. -Matthias On 05/11/2016 05:00 PM, Alexander Gryzlov wrote: > Hello, > > We're implementing a streamin

Re: synchronizing two streams

2016-05-12 Thread Alexander Gryzlov
Hmm, probably I don't really get how Flink's execution model works. As far as I understand, the preferred way to throttle down stream consumption is to simply have an operator with a conditional Thread.sleep() inside. Wouldn't calling sleep() in either of TwoInputStreamOperator's processWatermarkN(

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
That is correct. But there is no reason to throttle an input stream. If you implements an Outer-Join you will have two in-memory buffers holding the record of each stream of your "time window". Each time you receive a watermark, you can remove all "expired" records from the buffer of the other str

Re: synchronizing two streams

2016-05-12 Thread Alexander Gryzlov
Yes, this is generally a viable design, and is actually something we started off with. The problem in our case is, however, that either of the streams can occasionally (due to external producer's issues) get stuck for an arbitrary period of time, up to several hours. Buffering the other one during

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
I see. But even if you would have an operator (A,B)->(A,B), it would not be possible to block A if B does not deliver any data, because of Flink's internal design. You will need to use an custom solution: something like to a map (one for each steam) that use an side-communication channel (ie, exte

Re: checkpoints not being removed from HDFS

2016-05-12 Thread Ufuk Celebi
Hey Maciek, thanks for reporting this. Having files linger around looks like a bug to me. The idea behind having the recursive flag set to false in the AbstractFileStateHandle.discardState() call is that the FileStateHandle is actually just a single file and not a directory. The second call tryin

Re: get start and end time stamp from time window

2016-05-12 Thread Martin Neumann
Thanks for the help. I use a Fold and a WindowFunction in conjunction now and it works fine. Though I wish there would be a less complicated way to do this. cheers Martin On Thu, May 12, 2016 at 11:59 AM, Fabian Hueske wrote: > Hi Martin, > > You can use a FoldFunction and a WindowFunction to p

Re: normalize vertex values

2016-05-12 Thread Vasiliki Kalavri
Hi Lydia, there is no dedicated Gelly API method that performs normalization. If you know the max value, then a mapVertices() would suffice. Otherwise, you can get the Dataset of vertices with getVertices() and apply any kind of operation supported by the Dataset API on it. Best, -Vasia. On May 1

Re: checkpoints not being removed from HDFS

2016-05-12 Thread Ufuk Celebi
The issue is here: https://issues.apache.org/jira/browse/FLINK-3902 (My "explanation" before dosn't make sense actually and I don't see a reason why this should be related to having many state handles.) On Thu, May 12, 2016 at 3:54 PM, Ufuk Celebi wrote: > Hey Maciek, > > thanks for reporting th

Java heap space error

2016-05-12 Thread Flavio Pompermaier
Hi to all, running a job that writes parquet-thrift files I had this exception (in a Task Manager): io.netty.channel.nio.NioEventLoop - Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space 2016-05-12 18:49:11,302 WARN org.jboss.netty.ch

Re: How to measure Flink performance

2016-05-12 Thread prateekarora
Hi How can i measure throughput and latency of my application in flink 1.0.2 ? Regards Prateek -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-measure-Flink-performance-tp6741p6863.html Sent from the Apache Flink User Mailing List

Re: How to measure Flink performance

2016-05-12 Thread Konstantin Knauf
Hi Prateek, regarding throughput, what about simply filling the input Kafka topic with some (a lot) of messages and monitor (e.g. http://quantifind.github.io/KafkaOffsetMonitor/) how quickly Flink can work the lag off. The messages should be representative of your use case, of course. Latency is

Scatter-Gather Iteration aggregators

2016-05-12 Thread Lydia Ickler
Hi, I have a question regarding the Aggregators of a Scatter-Gather Iteration. Is it possible to have a global aggregator that is accessible in VertexUpdateFunction() and MessagingFunction() at the same time? Thanks in advance, Lydia

Barriers at work

2016-05-12 Thread Srikanth
Hello, I was reading about Flink's checkpoint and wanted to check if I correctly understood the usage of barriers for exactly once processing. 1) Operator does alignment by buffering records coming after a barrier until it receives barrier from all upstream operators instances. 2) Barrier is alw

Re: checkpoints not being removed from HDFS

2016-05-12 Thread Maciek Próchniak
thanks, I'll try to reproduce it in some test by myself... maciek On 12/05/2016 18:39, Ufuk Celebi wrote: The issue is here: https://issues.apache.org/jira/browse/FLINK-3902 (My "explanation" before dosn't make sense actually and I don't see a reason why this should be related to having many s

Re: Local Cluster have problem with connect to elasticsearch

2016-05-12 Thread rafal green
Hi Gordon, Thanks for advice - it's work perfect but only in elasticsearch case. This pom version works for elasticsearch 2.2.1. org.apache.flink flink-connector-elasticsearch2_${scala.version} 1.1-SNAPSHOT jar false ${project.build.directory}/classes org/apache/flink/**

Re: Does Kafka connector leverage Kafka message keys?

2016-05-12 Thread Krzysztof Zarzycki
If I can throw in my 2 cents, I agree with what Elias says. Without that feature (not partitioning already partitioned Kafka data), Flink is in bad position for common simpler processing, that don't involve shuffling at all, for example simple readKafka-enrich-writeKafka . The systems like the new

Interesting window behavior with savepoints

2016-05-12 Thread Andrew Whitaker
Hi, I was recently experimenting with savepoints and various situations in which they succeed or fail. I expected this example to fail: https://gist.github.com/AndrewWhitaker/fa46db04066ea673fe0eda232f0a5ce1 Basically, the first program runs with a count window. The second program is identical e

Re: Interesting window behavior with savepoints

2016-05-12 Thread Andrew Whitaker
"Flink can't successfully restore a checkpoint" should be "Flink can't successfully restore a savepoint". On Thu, May 12, 2016 at 3:44 PM, Andrew Whitaker < andrew.whita...@braintreepayments.com> wrote: > Hi, > > I was recently experimenting with savepoints and various situations in > which they

Re: Interesting window behavior with savepoints

2016-05-12 Thread Ufuk Celebi
On Thu, May 12, 2016 at 10:44 PM, Andrew Whitaker wrote: > From what I've observed, most of the time when Flink can't successfully > restore a checkpoint it throws an exception saying as much. I was expecting > to see that behavior here. Could someone explain why this "works" (as in, > flink accep

Re: Local Cluster have problem with connect to elasticsearch

2016-05-12 Thread rafal green
...btw I found this (in folder: "flink/flink-streaming-connectors/flink-connector-elasticsearch2.pom.xml") : 2.2.1 I change it to 2.3.2 version and of course rebuild with that command "mvn clean install -DskipTests" ...but nothing is changed. 2016-05-12 22:39 GMT+02:00 rafal green : > Sorr

Confusion about multiple use of one ValueState

2016-05-12 Thread Nirmalya Sengupta
Hello all, Let's say I want to hold some state value derived during one transformation, and then use that same state value in a subsequent transformation? For example: myStream .keyBy(fieldID) // Some field ID, may be 0 .map(new MyStatefulMapper()) .map(new MySubsequentMapper()) Now, I defi

Re: Confusion about multiple use of one ValueState

2016-05-12 Thread Balaji Rajagopalan
I don't think the valuestate defined in one map function is accessible in other map function this is my understanding, also you need to be aware there will be instance of map function created for each of your tuple in your stream, I had a similar use case where I had to pass in some state from one

Re: Scatter-Gather Iteration aggregators

2016-05-12 Thread Vasiliki Kalavri
Hi Lydia, registered aggregators through the ScatterGatherConfiguration are accessible both in the VertexUpdateFunction and in the MessageFunction. Cheers, -Vasia. On 12 May 2016 at 20:08, Lydia Ickler wrote: > Hi, > > I have a question regarding the Aggregators of a Scatter-Gather Iteration.

Re: How to measure Flink performance

2016-05-12 Thread Dhruv Gohil
Hi Prateek, https://github.com/dataArtisans/yahoo-streaming-benchmark/blob/master/flink-benchmarks/src/main/java/flink/benchmark/utils/ThroughputLogger.java https://github.com/dataArtisans/yahoo-streaming-benchmark/blob/master/flink-benchmarks/src/main/java/flink/benchmark/utils/AnalyzeTool.java

Re: Scatter-Gather Iteration aggregators

2016-05-12 Thread Lydia Ickler
Hi Vasia, yes, but only independently within each Function or not? If I set the aggregator in VertexUpdateFunction then the newly set value is not visible in the MessageFunction. Or am I doing something wrong? I would like to have a shared aggregator to normalize vertices. > Am 13.05.2016 um