Checkpoint and restore states

2016-04-19 Thread Jack Huang
Hi all, I am doing a simple word count example and want to checkpoint the accumulated word counts. I am not having any luck getting the counts saved and restored. Can someone help? env.enableCheckpointing(1000) env.setStateBackend(new MemoryStateBackend()) > ... inStream > .keyBy({s

Re: Flink + S3

2016-04-19 Thread Michael-Keith Bernard
Hey Till & Ufuk, We're running on self-managed EC2 instances (and we'll eventually have a mirror cluster in our colo). The provided documentation notes that for Hadoop 2.6, we'd need such-and-such version of hadoop-aws and guice on the CP. If I wanted to instead use Hadoop 2.7, which versions
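For reference, wiring up the s3a filesystem on self-managed instances typically also involves a Hadoop core-site.xml entry along these lines (a sketch — bucket and credentials are placeholders, and the required hadoop-aws version depends on your Hadoop distribution):

```xml
<!-- core-site.xml fragment (sketch): makes Hadoop's S3AFileSystem
     available to Flink. Credential values are placeholders. -->
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```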

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
If the data exceeds the main memory of your machine, then you should use the RocksDBStateBackend as a state backend. It allows you to store state (including windows) on disk. Thus, the size of state you can store is then limited by your hard disk capacity. If the expected data size can be kept in

Re: YARN session application attempts

2016-04-19 Thread Till Rohrmann
Hi Stefano, Hadoop supports this feature since version 2.6.0. You can define a time interval for the maximum number of application attempts. This means that this number of application failures has to be observed within the time interval before the application ultimately fails. Flink will activate
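As a sketch, the relevant knob on the Flink side (key name as of Flink 1.x; verify against your version) looks like the fragment below. Note that YARN also caps attempts globally via `yarn.resourcemanager.am.max-attempts` in yarn-site.xml, and the Flink value cannot exceed that cap.

```yaml
# flink-conf.yaml (sketch): number of ApplicationMaster restart attempts
# before YARN gives up on the Flink session. The value is bounded by
# yarn.resourcemanager.am.max-attempts on the cluster side.
yarn.application-attempts: 10
```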

Re: FoldFunction accumulator checkpointing

2016-04-19 Thread Ron Crocker
Aljoscha - I want to use a RichFoldFunction to get the open() hook. I cheat and use this structure instead with a (non-Rich) FoldFunction: public class InfinitResponseFilterFolder implements FoldFunction, String> { private BackingStore backingStore; @Override

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-19 Thread Till Rohrmann
Hi Norman, sorry for the late reply. I finally found time and could, thanks to you, reproduce the problem. The problem was that the window borders were treated differently in two parts of the code. Now the left border of a window is inclusive and the right border (late elements) is exclusive.

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-19 Thread Ufuk Celebi
Hey Stefano, Flink's resource management has been refactored for 1.1 recently. This could be a regression introduced by that refactoring. Max can probably help you with more details. Is this currently a blocker for you? – Ufuk On Tue, Apr 19, 2016 at 6:31 PM, Stefano Baghino

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Yifei Li
Hi Till and Aljoscha, Thank you so much for your suggestions; I'll try them out. I have another question: since S2 may be days delayed, there may be lots of windows and a large amount of data stored in memory waiting for computation. How does Flink deal with that? Thanks, Yifei On Tue,

Re: ClassNotFound when submitting job from command line

2016-04-19 Thread Balaji Rajagopalan
In your pom.xml, add the Maven plugins like this; you will also have to add all the dependent artifacts. This works for me: if you run mvn clean compile package, the created jar is a fat jar. org.apache.maven.plugins maven-dependency-plugin 2.9
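The reply mentions the maven-dependency-plugin; another common way to build a fat jar is the maven-shade-plugin. A minimal pom.xml sketch (the version number is illustrative):

```xml
<!-- pom.xml build section (sketch): bundles all dependencies into one fat jar
     so that classes like drivers are on the cluster classpath -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```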

RE: ClassNotFound when submitting job from command line

2016-04-19 Thread Radu Tudoran
Hi, In my case the root cause for this was mainly that I was using Eclipse to package the jar. Try using mvn instead. Additionally, you can copy the dependency jars into the lib directory of the task managers and restart them. Dr. Radu Tudoran Research Engineer - Big Data Expert IT R&D Division

ClassNotFound when submitting job from command line

2016-04-19 Thread Flavio Pompermaier
Hi to all, I just tried to submit my application to the Flink cluster (1.0.1) but I get ClassNotFound exceptions for classes inside my shaded jar (like oracle.jdbc.OracleDriver or org.apache.commons.pool2.PooledObjectFactory). Those classes are in the shaded jar but aren't found. If I put the jars

Re: logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-19 Thread Till Rohrmann
Have you made sure that Flink is using logback [1]? [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#using-logback-instead-of-log4j Cheers, Till On Tue, Apr 19, 2016 at 2:01 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > The are two files

Re: adding source not serializable exception in streaming implementation

2016-04-19 Thread Till Rohrmann
I assume that the provided FetchStock code is not complete. As the exception indicates, you somehow store a LocalStreamEnvironment in your source function. The StreamExecutionEnvironments are not serializable and cannot be part of the source function’s closure. Cheers, Till On Tue, Apr 19, 2016
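The closure-capture problem Till describes is plain Java serialization behavior and can be reproduced without Flink. A self-contained sketch (class names are illustrative, not from the thread): serializing a function object drags in everything it captures, so capturing a non-serializable object fails at serialization time.

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {
    // Stand-in for a non-serializable object such as a StreamExecutionEnvironment.
    static class Environment { }

    public static void main(String[] args) throws Exception {
        Environment env = new Environment();
        // The lambda captures 'env', so serializing the lambda also tries to
        // serialize 'env' -- exactly what happens when a source function
        // references the execution environment in its closure.
        Runnable fn = (Runnable & Serializable) () -> System.out.println(env);
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(fn);
            System.out.println("serialized OK");
        } catch (NotSerializableException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The fix is the same in both settings: create (or look up) the non-serializable resource inside the function at runtime, e.g. in an open() method, rather than capturing it from the enclosing scope.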

Re: Sink Parallelism

2016-04-19 Thread Chesnay Schepler
The picture you reference does not really show how dataflows are connected. For a better picture, visit this link: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows Let me know if this doesn't answer your question. On 19.04.2016 14:22,

Re: Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Maximilian Michels
Hi Theofilos, I'm not sure whether I understand correctly what you are trying to do. I'm assuming you don't want to use the command-line client. You can setup the Yarn cluster in your code manually using the FlinkYarnClient class. The deploy() method will give you a FlinkYarnCluster which you

Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Theofilos Kakantousis
Hi everyone, I'm using Flink 0.10.1 and hadoop 2.4.0 to implement a client that submits a flink application to Yarn. To keep it simple I use the ConnectedComponents app from flink examples. I set the required properties (Resources, AM ContainerLaunchContext etc.) on the YARN client

adding source not serializable exception in streaming implementation

2016-04-19 Thread subash basnet
Hello all, My requirement is to re-read the csv file from a file path at certain time intervals and process the csv data. The csv file gets updated at regular intervals. Below is my code: StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream dataStream

Sink Parallelism

2016-04-19 Thread Ravinder Kaur
Hello All, Considering the following streaming dataflow of the example WordCount, I want to understand how the Sink is parallelised. Source --> flatMap --> groupBy(), sum() --> Sink If I set the parallelism at runtime using -p, as shown here

Re: Leader not found

2016-04-19 Thread Robert Metzger
Can you provide me with the exact Flink and Kafka versions you are using and the steps to reproduce the issue? On Tue, Apr 19, 2016 at 2:06 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > It does not seem to fully work if there is no data in the kafka stream, > the flink

Re: Leader not found

2016-04-19 Thread Balaji Rajagopalan
It does not seem to fully work if there is no data in the Kafka stream; the Flink application emits this error and bails. Could this be a missed use case in the fix? On Tue, Apr 19, 2016 at 3:58 PM, Robert Metzger wrote: > Hi, > > I'm sorry, the documentation in the JIRA

logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-19 Thread Balaji Rajagopalan
There are two files in the /usr/share/flink/conf directory, and I was trying to set up rolling of the application logs, which go to the following directory on the task nodes: /var/log/hadoop-yarn/containers/application_*/container_*/taskmanager.{log,out,err}. Changing the logback.xml and logback-yarn.xml has
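For reference, a time-based rolling setup in logback.xml might look like the sketch below (paths and retention are placeholders; both logback.xml and logback-yarn.xml would need the change, and the YARN containers have to pick up the new configuration on restart):

```xml
<!-- logback.xml sketch: daily rolling with bounded history.
     ${log.file} is the property Flink passes to its logging config. -->
<appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${log.file}</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>${log.file}.%d{yyyy-MM-dd}</fileNamePattern>
    <maxHistory>7</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{60} - %msg%n</pattern>
  </encoder>
</appender>
<root level="INFO">
  <appender-ref ref="file"/>
</root>
```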

jobmanager.web.* properties for long running yarn session

2016-04-19 Thread Konstantin Knauf
Hi everyone, we are using a long running yarn session and changed jobmanager.web.checkpoints.history to 20. On the dashboard's job manager panel I can see the changed config, but the checkpoint history for the job still has only 10 entries. Are these properties only supported in stand-alone

Re: Leader not found

2016-04-19 Thread Robert Metzger
Hi, I'm sorry, the documentation in the JIRA issue is a bit incorrect. The issue has been fixed in all versions including and after 1.0.0. Earlier releases (0.10, 0.9) will fail when the leader changes. However, you don't necessarily need to upgrade to Flink 1.0.0 to resolve the issue: With

Leader not found

2016-04-19 Thread Balaji Rajagopalan
I am facing this exception repeatedly while trying to consume from kafka topic. It seems it was reported in 1.0.0 and fixed in 1.0.0, how can I be sure that is fixed in the version of flink that I am using, does it require me to install patch updates ? Caused by: java.lang.RuntimeException:

Re: class java.util.UUID is not a valid POJO type

2016-04-19 Thread Till Rohrmann
Hi Leonard, the UUID class cannot be treated as a POJO by Flink, because it is lacking the public getters and setters for mostSigBits and leastSigBits. However, it should be possible to treat it as a generic type. I think the difference is that you cannot use key expressions and key indices to

Re: Flink + Kafka + Scalabuff issue

2016-04-19 Thread Robert Metzger
Hi Alex, I suspect it's a GC issue with the code generated by ScalaBuff. Can you maybe try something like a standalone test where you use a while(true) loop to see how fast you can deserialize elements of your Foo type? Maybe you'll find that the JVM's heap is growing all the time. Then there's

Re: Turn off logging in Flink

2016-04-19 Thread Sendoh
Thank you! Totally works. Best, Sendoh

Re: Hash tables - joins, cogroup, deltaIteration

2016-04-19 Thread Fabian Hueske
Hi Ovidiu, Hash tables are currently used for joins (inner & outer) and the solution set of delta iterations. There is a pending PR that implements a hash table for partial aggregations (combiner) [1] which should be added soon. Joins (inner & outer) are already implemented as Hybrid Hash joins

Re: Turn off logging in Flink

2016-04-19 Thread Till Rohrmann
Hi Sendoh, you have to edit your log4j.properties file to set log4j.rootLogger=OFF in order to turn off the logger. Depending on how you run Flink and where you wanna turn off the logging, you either have to edit the log4j.properties file in the FLINK_HOME/conf directory or the in your project
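Concretely, the change Till describes is a one-line edit to the logging configuration (shown here as a sketch; the file lives in FLINK_HOME/conf for a cluster installation, or in your project's resources when running locally):

```properties
# conf/log4j.properties: disable all logging output
log4j.rootLogger=OFF
```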

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
Hi Yifei, if you don't wanna implement your own join operator, then you could also chain two join operations. I created a small example to demonstrate that: https://gist.github.com/tillrohrmann/c074b4eedb9deaf9c8ca2a5e124800f3. However, bear in mind that for this approach you will construct two

Turn off logging in Flink

2016-04-19 Thread Sendoh
Hi, Can I ask how to turn off Flink logging to avoid seeing INFO? I have tried StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.getConfig().disableSysoutLogging(); env.execute() and Configuration env_config = new Configuration();

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Aljoscha Krettek
Hi, right now, there is no built-in support for n-ary joins. I am working on this, however. For now you can simulate n-ary joins by using a tagged union and doing the join yourself in a WindowFunction. I created a small example that demonstrates this:
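The tagged-union idea can be sketched outside Flink with plain collections: records from two hypothetical streams are wrapped with a tag, grouped by (key, window), and then joined per group — which is what a WindowFunction over the unioned, keyed stream would do. All names and the windowing scheme below are illustrative, not Aljoscha's actual example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedUnionJoin {
    // One element of the unioned stream: 'tag' records which input it came from.
    static class Tagged {
        final int tag; final String key; final long ts; final String value;
        Tagged(int tag, String key, long ts, String value) {
            this.tag = tag; this.key = key; this.ts = ts; this.value = value;
        }
    }

    // Group the union by (key, tumbling window), then join the two sides per group.
    static List<String> join(List<Tagged> union, long windowSize) {
        Map<String, List<Tagged>> buckets = new TreeMap<>();
        for (Tagged t : union) {
            long windowStart = t.ts - (t.ts % windowSize);
            buckets.computeIfAbsent(t.key + "@" + windowStart,
                    k -> new ArrayList<>()).add(t);
        }
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : buckets.values()) {
            for (Tagged a : group) {
                if (a.tag != 0) continue;       // left side of the join
                for (Tagged b : group) {
                    if (b.tag == 1) out.add(a.value + "|" + b.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> union = Arrays.asList(
            new Tagged(0, "k1", 1, "a"),    // stream 1
            new Tagged(1, "k1", 2, "x"),    // stream 2, same window -> joins
            new Tagged(1, "k1", 15, "y"));  // later window -> no partner
        System.out.println(join(union, 10)); // [a|x]
    }
}
```

Chaining a third tag extends this to an n-ary join, at the cost of buffering all sides of each window.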

Re: Flink + Kafka + Scalabuff issue

2016-04-19 Thread Ufuk Celebi
Hey Alex, (1) Which Flink version are you using for this? (2) Can you also get a heap dump after the job slows down? Slow downs like this are often caused by some component leaking memory, maybe in Flink, maybe the Scalabuff deserializer. Can you also share the Foo code? – Ufuk On Mon, Apr

Re: Flink + S3

2016-04-19 Thread Ufuk Celebi
Hey Michael-Keith, are you running self-managed EC2 instances or EMR? In addition to what Till said: We tried to document this here as well: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#provide-s3-filesystem-dependency Does this help? You don't need to really install

Re: Flink + S3

2016-04-19 Thread Till Rohrmann
Hi Michael-Keith, you can use S3 as the checkpoint directory for the filesystem state backend. This means that whenever a checkpoint is performed the state data will be written to this directory. The same holds true for the zookeeper recovery storage directory. This directory will contain the
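In flink-conf.yaml terms, the setup Till describes might look like this sketch (bucket and paths are placeholders; the key names here match the Flink 1.0-era configuration and should be checked against your version):

```yaml
# flink-conf.yaml sketch: filesystem state backend checkpointing to S3,
# plus the ZooKeeper recovery storage directory for HA metadata
state.backend: filesystem
state.backend.fs.checkpointdir: s3://my-bucket/flink/checkpoints
recovery.zookeeper.storageDir: s3://my-bucket/flink/recovery
```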