> On 25 Nov 2015, at 02:35, Welly Tambunan wrote:
>
> Hi All,
>
> I would like to know if there are any feature differences between using
> a Standalone Cluster vs YARN?
>
> Until now we have been using a standalone cluster for our jobs.
> Is there any added value in using YARN?
>
> We don't have any
A strong argument for YARN mode is the isolation of multiple users and
jobs: you can easily start a new Flink cluster for each job or user.
However, this comes at the price of resource (memory) fragmentation. YARN
mode does not use memory as effectively as standalone cluster mode.
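Fabian's per-job isolation point maps to the Flink CLI roughly like this; a sketch based on the YARN-mode docs of that era (flag names change between releases, so verify against your version):

```shell
# Long-running YARN session: 4 TaskManagers, 1024 MB JobManager heap,
# 4096 MB per TaskManager. Jobs submitted afterwards share this session.
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096

# Or spin up an isolated, single-job YARN cluster that is torn down
# when the job finishes (the jar path is a placeholder):
./bin/flink run -m yarn-cluster -yn 4 ./path/to/your-job.jar
```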
2015-11-25 9:46 GMT+01:00
Hi Ufuk
>In failure cases I find YARN more convenient, because it takes care of
restarting failed task manager processes/containers for you.
So does this mean that we don't need ZooKeeper?
Cheers
On Wed, Nov 25, 2015 at 3:46 PM, Ufuk Celebi wrote:
> > On 25 Nov 2015, at 02:35, Welly Tambunan wrote:
Hi Fabian,
Interesting !
However, YARN is still tightly coupled to HDFS; doesn't it seem wasteful to
use YARN without the rest of Hadoop?
Currently we are using Cassandra and CFS (the Cassandra File System).
Cheers
On Wed, Nov 25, 2015 at 3:51 PM, Fabian Hueske wrote:
> A strong argument for YARN mode can be
YARN is not a replacement for ZooKeeper. ZooKeeper is mandatory to run
Flink in high-availability mode and takes care of leader (JobManager)
election and metadata persistence.
With YARN, Flink can automatically start new TaskManagers (and JobManagers)
to compensate for failures. In cluster mode,
Hi Welly,
you will need ZooKeeper if you want to set up the standalone cluster in HA
mode.
http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
In the YARN case you probably already have ZooKeeper in place if you are
running YARN in HA mode.
Regards,
Andreas
On Wed, Nov 25
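To make Andreas's point concrete, standalone HA with ZooKeeper is driven by a few entries in flink-conf.yaml; this is a sketch using the 1.x key names (they differ in older releases, so check the docs for your version):

```yaml
# Enable ZooKeeper-based HA for the standalone cluster:
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Shared storage for JobManager metadata; ZooKeeper only keeps pointers.
# Any shared file system works, it does not have to be HDFS:
high-availability.storageDir: file:///shared/flink/recovery
```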
Hi Fabian,
This makes sense now.
I would like to avoid ZooKeeper if possible. Is there any way to achieve HA
without it?
I see that DataStax Enterprise achieves this availability for the Spark
Master without using ZooKeeper.
https://academy.datastax.com/demos/how-spark-master-high-availability-wor
Hi Andreas,
Yes, it seems I can't avoid ZooKeeper right now. It would be really nice if
we could achieve HA via a gossip protocol, like Cassandra/Spark DSE does.
Is this possible?
Cheers
On Wed, Nov 25, 2015 at 4:12 PM, Andreas Fritzler <
andreas.fritz...@gmail.com> wrote:
> Hi Welly,
>
> you will
Hi Welly,
> However, YARN is still tightly coupled to HDFS; doesn't it seem wasteful to
> use YARN without the rest of Hadoop?
I wouldn't say tightly coupled. You can use YARN without HDFS. To work
with YARN properly, you would have to set up another distributed file
system, like XtreemFS. Or use the one pr
Hi Welly,
at the moment Flink only supports HA via ZooKeeper. However, there is no
limitation on using another system. The only requirement is that the system
allows you to find a consensus among multiple participants and to retrieve
the common decision. If this is possible, then it can be integrated
Hi Welly,
If you want to use Cassandra, you might want to look into having a Mesos
cluster with frameworks for Cassandra and Spark.
Regards,
Andreas
[1] http://spark.apache.org/docs/latest/running-on-mesos.html
[2] https://github.com/mesosphere/cassandra-mesos
On Wed, Nov 25, 2015 at 10:30 AM,
For streaming, I am a bit torn on whether reading a file should have such
prominent functions. Most streaming programs work on message
queues, or on monitored directories.
Not saying no, but I'm not sure DataSet/DataStream parity is the main goal;
they are for different use cases, after all.
Greetings,
I am a newbie in this Flink world. Thanks to Slim Baltagi for recommending
this Flink community.
I have a graph problem. I have some points and paths among those points.
Each path has a value, such as distance, that determines the distance
between the two points it connects.
So far it
Hi to all,
I am working on a project with Gelly and I need to create a graph with
billions of nodes. Although I have the edge list, each node in the graph
needs to be a POJO, the construction of which takes a long time before the
final graph can be created. Is it possible to store the
Hi,
We are trying to do a test using state, but we have not been able to
achieve our desired result. Basically we have a data stream with data as
[{"id":"11","value":123}] and we want to calculate the sum of all values
grouping by ID. We were able to achieve this using windows but not with
states
Hi Stefanos,
let me know if I understand the problem correctly: the vertex values are
POJOs that you're somehow inferring from the edge list, and this value
creation is what takes a lot of time? Since a graph is just 2
datasets (vertices and edges), you could store the values to disk and ha
Hi Javier!
You can solve this using either windows or manual state.
What is better depends a bit on when you want to have the result (the sum).
Do you want a result emitted after each update (or do some other operation
with that value) or do you want only the final sum after a certain time?
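Stripped of Flink's APIs, the manual-state variant Stephan mentions is just one accumulator per key, which is what Flink's keyed state maintains for you. A minimal plain-Java sketch of the idea (class and method names are made up for illustration; this is not Flink code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedSum {

    // Accumulate one running sum per id, like Flink's per-key ValueState.
    static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> stream) {
        Map<String, Long> state = new TreeMap<>();
        for (Map.Entry<String, Long> record : stream) {
            // merge() adds the value to the existing sum, or inserts it.
            state.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return state;
    }

    public static void main(String[] args) {
        // Records shaped like [{"id":"11","value":123}] from the question:
        List<Map.Entry<String, Long>> stream = Arrays.asList(
                new SimpleEntry<>("11", 123L),
                new SimpleEntry<>("11", 7L),
                new SimpleEntry<>("42", 1L));
        System.out.println(sumByKey(stream)); // prints {11=130, 42=1}
    }
}
```

With windows you would instead emit the per-key sum once per window; with keyed state you can emit an updated sum after every record.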
Hi Javier,
Thanks for your question. I've corrected the documentation (will be
online soon).
Cheers,
Max
On Wed, Nov 25, 2015 at 5:19 PM, Stephan Ewen wrote:
> Hi Javier!
>
> You can solve this using either windows or manual state.
>
> What is better depends a bit on when you want to have
I agree with Stephan.
Reading static files is quite uncommon with the DataStream API. Before we
add such a method, we should add a convenience method for Kafka ;)
But in general, I'm not a big fan of adding too many of these methods
because they pull in so many external classes, which lead to brea
Hi Vasia,
my graph object is the following:
Graph graph =
Graph.fromCollection(edgeList.collect(), env);
The vertex is a POJO, not the value. So the problem is: how could I store
and retrieve the vertex list?
Thanks,
Stefanos
> On 25 Nov 2015, at 18:16, Vasiliki Kalavri wrote:
>
> Hi Stefa
Community growth starts by talking with those interested in your
project. ApacheCon North America is coming, are you?
We are delighted to announce that the Call For Presentations (CFP) is
now open for ApacheCon North America. You can submit your proposed
sessions at
http://events.linuxfoundation.o
Hi,
I just wanted to let you know that I didn't forget about this!
The BlobManager in 1.0-SNAPSHOT already has a configuration parameter to
use a certain range of ports.
I'm trying to add the same feature for YARN tomorrow.
Sorry for the delay.
On Tue, Nov 10, 2015 at 9:27 PM, Cory Monty
wrote:
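The port-range parameter Robert refers to shows up in flink-conf.yaml; a sketch based on the 1.0-era configuration docs (verify the key against your release):

```yaml
# Let the BlobServer bind to a port from a fixed range instead of an
# ephemeral one, e.g. to satisfy firewall rules:
blob.server.port: 50100-50200
```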
Hi Konstantin,
I still haven't come up with an explanation for the behavior. Could you
maybe send me example code (and example data, if it is necessary to
reproduce the problem)? This would really help me pinpoint the problem.
Cheers,
Aljoscha
> On 17 Nov 2015, at 21:42, Konstantin Knauf
> wrote:
Hey,
you can preprocess your data, create the vertices and store them to a file,
like you would store any other Flink DataSet, e.g. with writeAsText.
Then, you can create the graph by reading 2 datasets, like this:
DataSet vertices = env.readTextFile("/path/to/vertices/")... // or
your custom re
+1
LICENSE file looks good in source artifact
NOTICE file looks good in source artifact
Signature file looks good in source artifact
Hash files look good in source artifact
No 3rd-party executables in source artifact
Source compiled
All tests passed
Ran standalone-mode test app
- Henry
On M
Hi Aljoscha,
sure, will do. I haven't found a solution either. I won't have time to put
a minimal example together before the weekend, though.
Cheers,
Konstantin
On 25.11.2015 19:10, Aljoscha Krettek wrote:
> Hi Konstantin,
> I still haven't come up with an explanation for the behavior. Could you m
Hi,
It works fine using this approach.
Thanks,
Stefanos
> On 25 Nov 2015, at 20:32, Vasiliki Kalavri wrote:
>
> Hey,
>
> you can preprocess your data, create the vertices and store them to a file,
> like you would store any other Flink DataSet, e.g. with writeAsText.
>
> Then, you can crea
Good to know :)
On 25 November 2015 at 21:44, Stefanos Antaris
wrote:
> Hi,
>
> It works fine using this approach.
>
> Thanks,
> Stefanos
>
> On 25 Nov 2015, at 20:32, Vasiliki Kalavri
> wrote:
>
> Hey,
>
> you can preprocess your data, create the vertices and store them to a
> file, like you w