Re: Standalone Cluster vs YARN

2015-11-25 Thread Ufuk Celebi
> On 25 Nov 2015, at 02:35, Welly Tambunan wrote: > > Hi All, > > I would like to know if there any feature differences between using > Standalone Cluster vs YARN ? > > Until now we are using Standalone cluster for our jobs. > Is there any added value for using YARN ? > > We don't have any

Re: Standalone Cluster vs YARN

2015-11-25 Thread Fabian Hueske
A strong argument for YARN mode can be the isolation of multiple users and jobs. You can easily start a new Flink cluster for each job or user. However, this comes at the price of resource (memory) fragmentation. YARN mode does not use memory as effectively as cluster mode. 2015-11-25 9:46 GMT+01:00

Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Ufuk >In failure cases I find YARN more convenient, because it takes care of restarting failed task manager processes/containers for you. So does this mean that we don't need ZooKeeper? Cheers On Wed, Nov 25, 2015 at 3:46 PM, Ufuk Celebi wrote: > > On 25 Nov 2015, at 02:35, Welly Tambunan wr

Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Fabian, Interesting! However, YARN is still tightly coupled to HDFS; doesn't it seem wasteful to use only YARN without Hadoop? Currently we are using Cassandra and CFS (Cassandra File System). Cheers On Wed, Nov 25, 2015 at 3:51 PM, Fabian Hueske wrote: > A strong argument for YARN mode can be

Re: Standalone Cluster vs YARN

2015-11-25 Thread Fabian Hueske
YARN is not a replacement for ZooKeeper. ZooKeeper is mandatory to run Flink in high-availability mode and takes care of leader (JobManager) election and metadata persistence. With YARN, Flink can automatically start new TaskManagers (and JobManagers) to compensate for failures. In cluster mode,

Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly, you will need ZooKeeper if you want to set up the standalone cluster in HA mode. http://spark.apache.org/docs/latest/spark-standalone.html#high-availability In the YARN case you probably already have ZooKeeper in place if you are running YARN in HA mode. Regards, Andreas On Wed, Nov 25

Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Fabian, This makes sense now. I would like to avoid ZooKeeper if possible. Is there any way to achieve HA without it? I see that DataStax Enterprise achieves this availability for Spark Master without using ZooKeeper. https://academy.datastax.com/demos/how-spark-master-high-availability-wor

Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Andreas, Yes, it seems I can't avoid ZooKeeper right now. It would be really nice if we could achieve HA via a gossip protocol, as Cassandra/Spark DSE does. Is this possible? Cheers On Wed, Nov 25, 2015 at 4:12 PM, Andreas Fritzler < andreas.fritz...@gmail.com> wrote: > Hi Welly, > > you will

Re: Standalone Cluster vs YARN

2015-11-25 Thread Maximilian Michels
Hi Welly, > However YARN is still tightly couple to HDFS, is that seems wasteful to use > only YARN without Hadoop ? I wouldn't say tightly coupled. You can use YARN without HDFS. To work with YARN properly, you would have to set up another distributed file system like XtreemFS. Or use the one pr

Re: Standalone Cluster vs YARN

2015-11-25 Thread Till Rohrmann
Hi Welly, at the moment Flink only supports HA via ZooKeeper. However, nothing prevents the use of another system. The only requirement is that this system allows you to find a consensus among multiple participants and to retrieve the community decision. If this is possible, then it can be integ

Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly, If you want to use Cassandra, you might want to look into having a Mesos cluster with frameworks for Cassandra and Spark. Regards, Andreas [1] http://spark.apache.org/docs/latest/running-on-mesos.html [2] https://github.com/mesosphere/cassandra-mesos On Wed, Nov 25, 2015 at 10:30 AM,

Re: Using Hadoop Input/Output formats

2015-11-25 Thread Stephan Ewen
For streaming, I am a bit torn on whether reading a file should have so many prominent functions. Most streaming programs work on message queues, or on monitored directories. Not saying no, but not sure DataSet/DataStream parity is the main goal - they are for different use cases after all.

graph problem to be solved

2015-11-25 Thread RahadianBayu Permadi
Greetings, I am a newbie in the Flink world. Thanks to Slim Baltagi for recommending this Flink community. I have a graph problem. I have some points and paths among those points. Each path has a value, such as a distance, that determines the distance between the two points it connects. So far it

store and retrieve Graph object

2015-11-25 Thread Stefanos Antaris
Hi to all, I am working on a project with Gelly and I need to create a graph with billions of nodes. Although I have the edge list, each node in the graph needs to be a POJO, whose construction takes a long time before the final graph can be created. Is it possible to store the

Working with State example /flink streaming

2015-11-25 Thread Lopez, Javier
Hi, We are trying to do a test using States but we have not been able to achieve our desired result. Basically we have a data stream with data as [{"id":"11","value":123}] and we want to calculate the sum of all values grouping by ID. We were able to achieve this using windows but not with states

Re: store and retrieve Graph object

2015-11-25 Thread Vasiliki Kalavri
Hi Stefane, let me know if I understand the problem correctly. The vertex values are POJOs that you're somehow inferring from the edge list and this value creation is what takes a lot of time? Since a graph is just a set of 2 datasets (vertices and edges), you could store the values to disk and ha

Re: Working with State example /flink streaming

2015-11-25 Thread Stephan Ewen
Hi Javier! You can solve this both using windows, or using manual state. What is better depends a bit on when you want to have the result (the sum). Do you want a result emitted after each update (or do some other operation with that value) or do you want only the final sum after a certain time?
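Editorial sketch of the "manual state" option mentioned above, for the case where a result is wanted after each update. It is plain Java, not Flink's state API: the class name RunningSumSketch and the method update are hypothetical, and a HashMap stands in for whatever per-key state the streaming runtime would manage.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; this is NOT Flink's state API.
class RunningSumSketch {
    // Hypothetical stand-in for per-key state: id -> sum seen so far.
    private final Map<String, Long> sums = new HashMap<>();

    // Adds value to the running sum for id and returns the updated sum,
    // i.e. one result is produced per incoming record.
    long update(String id, long value) {
        return sums.merge(id, value, Long::sum);
    }

    public static void main(String[] args) {
        RunningSumSketch state = new RunningSumSketch();
        // Records shaped like {"id":"11","value":123} from the question.
        System.out.println(state.update("11", 123));
        System.out.println(state.update("11", 7));
        System.out.println(state.update("12", 5));
    }
}
```

A window-based variant would instead collect the values for a key and emit a single sum when the window closes, which matches the "only the final sum after a certain time" case.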

Re: Working with State example /flink streaming

2015-11-25 Thread Maximilian Michels
Hi Javier, Thanks for your question. I've corrected the documentation (will be online soon). Cheers, Max On Wed, Nov 25, 2015 at 5:19 PM, Stephan Ewen wrote: > Hi Javier! > > You can solve this both using windows, or using manual state. > > What is better depends a bit on when you want to have

Re: Using Hadoop Input/Output formats

2015-11-25 Thread Robert Metzger
I agree with Stephan. Reading static files is quite uncommon with the DataStream API. Before we add such a method, we should add a convenience method for Kafka ;) But in general, I'm not a big fan of adding too many of these methods because they pull in so many external classes, which lead to brea

Re: store and retrieve Graph object

2015-11-25 Thread Stefanos Antaris
Hi Vasia, my graph object is the following: Graph graph = Graph.fromCollection(edgeList.collect(), env); The vertex is a POJO, not the value. So the problem is: how can I store and retrieve the vertex list? Thanks, Stefanos > On 25 Nov 2015, at 18:16, Vasiliki Kalavri wrote: > > Hi Stefa

[ANNOUNCE] CFP open for ApacheCon North America 2016

2015-11-25 Thread Rich Bowen
Community growth starts by talking with those interested in your project. ApacheCon North America is coming, are you? We are delighted to announce that the Call For Presentations (CFP) is now open for ApacheCon North America. You can submit your proposed sessions at http://events.linuxfoundation.o

Re: Running on a firewalled Yarn cluster?

2015-11-25 Thread Robert Metzger
Hi, I just wanted to let you know that I didn't forget about this! The BlobManager in 1.0-SNAPSHOT already has a configuration parameter to use a certain range of ports. I'm trying to add the same feature for YARN tomorrow. Sorry for the delay. On Tue, Nov 10, 2015 at 9:27 PM, Cory Monty wrote:

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-25 Thread Aljoscha Krettek
Hi Konstantin, I still haven't come up with an explanation for the behavior. Could you maybe send me example code (and example data, if it is necessary to reproduce the problem)? This would really help me pinpoint the problem. Cheers, Aljoscha > On 17 Nov 2015, at 21:42, Konstantin Knauf > wrot

Re: store and retrieve Graph object

2015-11-25 Thread Vasiliki Kalavri
Hey, you can preprocess your data, create the vertices and store them to a file, like you would store any other Flink DataSet, e.g. with writeAsText. Then, you can create the graph by reading 2 datasets, like this: DataSet vertices = env.readTextFile("/path/to/vertices/")... // or your custom re
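Editorial sketch of the store-then-reload idea described above: compute the vertices once, write them out as text, and parse them back later instead of rebuilding them. It is plain Java with java.nio rather than Flink's writeAsText/readTextFile, and the class VertexFileSketch, its methods, and the tab-separated "id, value" line layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; file I/O stands in for a DataSet sink/source.
class VertexFileSketch {
    // Assumed text layout: "<id>\t<value>", one vertex per line.
    static String format(long id, String value) {
        return id + "\t" + value;
    }

    // Inverse of format: splits on the first tab only.
    static Map.Entry<Long, String> parse(String line) {
        String[] parts = line.split("\t", 2);
        return new SimpleEntry<>(Long.parseLong(parts[0]), parts[1]);
    }

    static void save(Path file, List<Map.Entry<Long, String>> vertices) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, String> v : vertices) {
            lines.add(format(v.getKey(), v.getValue()));
        }
        Files.write(file, lines);
    }

    static List<Map.Entry<Long, String>> load(Path file) throws IOException {
        List<Map.Entry<Long, String>> vertices = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            vertices.add(parse(line));
        }
        return vertices;
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<Long, String>> vertices = new ArrayList<>();
        vertices.add(new SimpleEntry<>(1L, "expensive value A"));
        vertices.add(new SimpleEntry<>(2L, "expensive value B"));
        Path file = Files.createTempFile("vertices", ".tsv");
        save(file, vertices);            // done once, after the costly construction
        System.out.println(load(file));  // later runs only parse the stored lines
        Files.delete(file);
    }
}
```

In the Flink setting the parsed vertices and the edge list would be the two datasets from which the graph is created; the custom parsing step corresponds to the "custom" reading mentioned in the message.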

Re: [VOTE] Release Apache Flink 0.10.1 (release-0.10.0-rc1)

2015-11-25 Thread Henry Saputra
+1 LICENSE file looks good in source artifact NOTICE file looks good in source artifact Signature file looks good in source artifact Hash files look good in source artifact No 3rd party executables in source artifact Source compiled All tests passed Ran standalone mode test app - Henry On M

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-25 Thread Konstantin Knauf
Hi Aljoscha, sure, will do. I haven't found a solution either. I won't have time to put a minimal example together before the weekend though. Cheers, Konstantin On 25.11.2015 19:10, Aljoscha Krettek wrote: > Hi Konstantin, > I still didn’t come up with an explanation for the behavior. Could you m

Re: store and retrieve Graph object

2015-11-25 Thread Stefanos Antaris
Hi, It works fine using this approach. Thanks, Stefanos > On 25 Nov 2015, at 20:32, Vasiliki Kalavri wrote: > > Hey, > > you can preprocess your data, create the vertices and store them to a file, > like you would store any other Flink DataSet, e.g. with writeAsText. > > Then, you can crea

Re: store and retrieve Graph object

2015-11-25 Thread Vasiliki Kalavri
Good to know :) On 25 November 2015 at 21:44, Stefanos Antaris wrote: > Hi, > > It works fine using this approach. > > Thanks, > Stefanos > > On 25 Nov 2015, at 20:32, Vasiliki Kalavri > wrote: > > Hey, > > you can preprocess your data, create the vertices and store them to a > file, like you w