Re: Flink application does not scale as expected, please help!

2018-06-18 Thread Ovidiu-Cristian MARCU
Hi all, Allow me to add some comments/questions on this very interesting issue. According to the documentation [1], the pipeline example assumes the source runs with the same parallelism as the subsequent map operator, and the workflow is optimized to co-locate source and map tasks if

Re: parallelism for window operations

2017-01-27 Thread Ovidiu-Cristian MARCU
10:43, Ovidiu-Cristian MARCU > <ovidiu-cristian.ma...@inria.fr> wrote: > > Thank you, Fabian! > > It works, what I did and results, as an example for other users: > Total slots occupied are 7 (not sure how to check that Source + Flat Map are > in the same slot, assumed

Re: Monitoring REST API

2016-12-21 Thread Ovidiu-Cristian MARCU
Hi Lydia, I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the average over multiple nodes. 1) So for each node you can collect the sar output and obtain, for example: Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr) 2016-01-27 _x86_64_(16 CPU) 12:54:09

Re: Parameters to Control Intra-node Parallelism

2016-07-13 Thread Ovidiu-Cristian MARCU
(twice in this case) > than what's suggested in Flink (#slots-per-TM^2 * #TMs * 4, which would be > 12*12*32*4 = 18432). Otherwise, it would throw me the not enough buffers > error. > > Thank you, > Saliya > > > > On Tue, Jul 12, 2016 at 7:39 AM, Ovidiu-Cris
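The buffer estimate quoted in the snippet above (#slots-per-TM^2 * #TMs * 4) can be checked with a short calculation. This is only a minimal sketch of that arithmetic: the function name and the `multiplier` parameter are hypothetical, and the message itself reports that roughly twice the suggested value was needed in that particular setup.

```python
def suggested_network_buffers(slots_per_tm: int, num_tms: int, multiplier: int = 1) -> int:
    """Rule of thumb quoted in the thread: #slots-per-TM^2 * #TMs * 4.

    `multiplier` is a hypothetical knob for scaling the suggestion
    (the thread mentions needing about twice the suggested value).
    """
    if slots_per_tm <= 0 or num_tms <= 0 or multiplier <= 0:
        raise ValueError("all arguments must be positive")
    return slots_per_tm ** 2 * num_tms * 4 * multiplier


# Values from the thread: 12 slots per task manager, 32 task managers.
suggested = suggested_network_buffers(12, 32)
doubled = suggested_network_buffers(12, 32, multiplier=2)
print(suggested, doubled)
```

With the thread's numbers the suggestion comes out at 18432, matching the figure in the message; doubling it is only one possible reading of "twice in this case".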

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Ovidiu-Cristian MARCU
Hi, Can you post your configuration parameters (exclude default settings) and cluster description? Best, Ovidiu > On 11 Jul 2016, at 17:49, Saliya Ekanayake wrote: > > Thank you Greg, I'll check if this was the cause for my TMs to disappear. > > On Mon, Jul 11, 2016 at

Re: Optimizations not performed - please confirm

2016-06-29 Thread Ovidiu-Cristian MARCU
hts into what optimizations are done in the > Table API/SQL that will be released in an updated version in 1.1. > > Cheers, > Aljoscha > > +Timo, Explicitly adding Timo > > On Tue, 28 Jun 2016 at 21:41 Ovidiu-Cristian MARCU > <ovidiu-cristian.ma...@inria.f

Optimizations not performed - please confirm

2016-06-28 Thread Ovidiu-Cristian MARCU
Hi, The optimizer internals described in this document [1] are probably not up-to-date. Can you please confirm whether this is still valid: “The following optimizations are not performed: Join reordering (or operator reordering in general): Joins / Filters / Reducers are not re-ordered in Flink.

Re: Flink Version 1.1

2016-05-18 Thread Ovidiu-Cristian MARCU
Hi, We are also very interested in future SQL (SQL on Streaming) support in the next release (even if it is partial work that works :) ) Thank you! Best, Ovidiu > On 18 May 2016, at 14:42, Stephan Ewen wrote: > > Hi! > > That question is coming up more and more. > I

What / Where / When / How questions in Spark 2.0 ?

2016-05-16 Thread Ovidiu-Cristian MARCU
Hi, We can see in [2] many interesting (and expected!) improvements (promises) like extended SQL support, unified API (DataFrames, DataSets), improved engine (Tungsten relates to ideas from modern compilers and MPP databases - similar to Flink [3]), structured streaming etc. It seems we

Hash tables - joins, cogroup, deltaIteration

2016-04-18 Thread Ovidiu-Cristian MARCU
Hi, Can you please confirm if there is any update regarding the hash tables use cases, as in [1] it is specified that Hash tables are used in Joins and for the Solution set in iterations (pending work to use them for grouping/aggregations)? I am interested in the pending work progress and also

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Ovidiu-Cristian MARCU
Hi, Your assumption about the TeraSort use case for eastcirclek's implementation may be incorrect. How many times did you run your program? It would be helpful to give more details about your experiment, such as the configuration and dataset size. Best, Ovidiu > On 14 Apr 2016, at 17:14,

Re: off-heap size feature request

2016-03-19 Thread Ovidiu-Cristian MARCU
> > Hi Ovidiu, > > the parameters to configure the amount of managed memory > (taskmanager.memory.size, taskmanager.memory.fraction) are valid for on and > off-heap memory. > > Have you tried these parameters and didn't they work as expected? > > Best, Fabian >

Re: off-heap size feature request

2016-03-18 Thread Ovidiu-Cristian MARCU
ry. Hence, the overall process size will be roughly > 4GB. The parameter name "taskmanager.heap.mb" is a bit confusing in case of > off-heap memory usage, because it does not define this size of the heap but > of the overall process. > > Hope this helps, >

Not enough free slots to run the job

2016-03-18 Thread Ovidiu-Cristian MARCU
Hi, When a program specifies a maximum parallelism (so it is supposed to use all available task slots), it is possible that one of the task managers is not registered, for various reasons. In this case the job will fail because there are not enough free slots to run it. For
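The failure condition described in the snippet above can be illustrated with a small check. This is a minimal sketch, not Flink's scheduler logic: it assumes every registered task manager contributes the same number of slots and that the job needs as many slots as its parallelism. All names and the cluster sizes are hypothetical.

```python
def available_slots(registered_task_managers: int, slots_per_task_manager: int) -> int:
    """Total slots offered by the task managers that actually registered."""
    return registered_task_managers * slots_per_task_manager


def has_enough_slots(parallelism: int, registered_task_managers: int,
                     slots_per_task_manager: int) -> bool:
    """True if the requested parallelism fits into the registered slots."""
    return parallelism <= available_slots(registered_task_managers, slots_per_task_manager)


# Hypothetical cluster: 4 task managers with 8 slots each, job parallelism 32.
print(has_enough_slots(32, 4, 8))  # all task managers registered
print(has_enough_slots(32, 3, 8))  # one task manager failed to register
```

In the second call only 24 slots are available for a parallelism of 32, which corresponds to the "not enough free slots" situation the message describes.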

off-heap size feature request

2016-03-16 Thread Ovidiu-Cristian MARCU
Hi, Is it possible to add a parameter off-heap.size for the task manager off-heap memory [1]? It is not possible to limit the off-heap memory size, at least I found nothing in the documentation. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#managed-memory

Re: Memory ran out PageRank

2016-03-16 Thread Ovidiu-Cristian MARCU
synthetic data set respectively. Can you confirm this? > > The solution set for delta iterations is currently implemented as an > in-memory hash table that works on managed memory segments, but is not > spillable. > > – Ufuk > > On Mon, Mar 14, 2016 at 6:30 PM, Ovidiu-Cristi

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Correction: the CC I am running successfully is on top of your friend, Spark :) Best, Ovidiu > On 14 Mar 2016, at 20:38, Ovidiu-Cristian MARCU > <ovidiu-cristian.ma...@inria.fr> wrote: > > Yes, largely different. I was expecting the solution set to be spillable. > This

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
d as an > in-memory hash table that works on managed memory segments, but is not > spillable. > > – Ufuk > > On Mon, Mar 14, 2016 at 6:30 PM, Ovidiu-Cristian MARCU > <ovidiu-cristian.ma...@inria.fr> wrote: >> >> This problem is surprising a

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
y further > opinions on that? > > Cheers, > Martin > > > On 14.03.2016 17:55, Ovidiu-Cristian MARCU wrote: >> Thank you for this alternative. >> I don’t understand how the workaround will fix this on systems with limited >> memory and maybe larger gr

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
://mail-archives.apache.org/mod_mbox/flink-user/201508.mbox/%3CCAELUF_ByPAB%2BPXWLemPzRH%3D-awATeSz4sGz4v9TmnvFku3%3Dx3A%40mail.gmail.com%3E > > On 14.03.2016 16:55, Ovidiu-Cristian MARCU wrote: >> Hi, >> >> While running PageRank on a synthetic graph I run into this problem:

Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Hi, While running PageRank on a synthetic graph I ran into this problem: Any advice on how I should proceed to overcome this memory issue? IterationHead(Vertex-centric iteration (org.apache.flink.graph.library.PageRank$VertexRankUpdater@7712cae0 |

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
ady > implemented or exist as a PR [1]. So we hope to complete the partial > backtracking soon. > > [1] https://github.com/apache/flink/pull/640 > > Cheers, > Till > > On Mon, Feb 22, 2016 at 6:00 PM, Ovidiu-C

Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Hi, In case of a node failure, what does ‘Fault tolerance for programs in the DataSet API works by retrying failed executions’ [1] mean? - work already done by the rest of the nodes is not lost, only the work of the lost node is recomputed, and job execution continues, or - entire job execution

Apache Flink Web Dashboard - Completed Job history

2015-12-16 Thread Ovidiu-Cristian MARCU
Hi, If I restart Flink I no longer see the history of completed jobs. Is this a missing feature, or what should I do to see the completed job history? Best regards, Ovidiu

Re: flink connectors

2015-11-27 Thread Ovidiu-Cristian MARCU
Hi, The main question here is why the distribution release doesn’t contain the connector dependencies. It is fair to say that it does not have to (which connectors to include, or all of them?). So, just like Spark does, Flink offers a binary distribution for Hadoop only, without considering other

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
o the resource > elasticity you mentioned. > > Yes, resource elasticity in Flink will mitigate such issues. We would be able > to respond to YARN's preemption requests if jobs with higher priorities are > requesting additional resources. > > On Fri, Nov 20, 2015 at 2:07 PM

Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Hi, I am currently interested in experimenting with Flink on Hadoop YARN. I am following the documentation we have here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/yarn_setup.html

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
gured in YARN. > > In general, we recommend to start a YARN session per program. You can also > directly submit a Flink program to YARN. > > Where did you find the link to the FAQ? The link on the front page is > working: http://flink.apache.org/faq.html <http://flink.apach