I have a couple more questions regarding Flink's JVM memory.
In a streaming application, what is managed memory used for? I read in a
blog that all objects created inside the user function go into unmanaged
memory. Where does the managed keyed/operator state reside?
Also when does the
Hi there,
I'm using a custom writer with an hourly rolling bucket sink. I'm seeing two
issues.
First: if I write the same file to S3, all the files get committed; however,
when I write the same to HDFS, I see it remains in .pending state, which could
be related to the second problem below.
Second issue: My
Hi Ted,
I believe HDFS-6584 is more of an HDFS feature supporting the archive use case
through some policy configurations.
My ask is that I have two distinct HCFS file systems which are independent, but
the Flink job will decide which one to use for the sink, while the Flink
infrastructure is by default
Hi Team,
I am using Apache Flink with Java for the below problem statement:
1. read a CSV file with the field delimiter character ';'
2. transform the fields
3. write the data back to CSV
My questions are as below:
1. if I need to read a CSV file of size above 50 GB, what would be the
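Not an answer from this thread, but one common way to handle a file far larger than memory in plain Java is to stream it line by line instead of loading it whole. This is a hedged sketch: the trim/upper-case transform and the class name are purely illustrative assumptions, not from the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Illustrative helper: streams a ';'-delimited CSV record by record, so the
// whole file (e.g. 50 GB) never has to fit in memory at once.
public class CsvStreamer {
    public static void transform(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // -1 keeps empty trailing fields instead of dropping them
            String[] fields = line.split(";", -1);
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim().toUpperCase(); // example transform
            }
            out.write(String.join(";", fields));
            out.write("\n");
        }
        out.flush();
    }
}
```

Because only one line is held at a time, memory use stays constant regardless of file size; Flink's CsvInputFormat follows the same record-at-a-time principle.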
Would HDFS-6584 help with your use case ?
On Wed, Aug 23, 2017 at 11:00 AM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com.invalid> wrote:
> Hello,
> Is it possible for a Flink cluster to use multiple HDFS repositories (HDFS-1
> for managing Flink state backend, HDFS-2 for syncing results from
Hello,
Is it possible for a Flink cluster to use multiple HDFS repositories (HDFS-1
for managing the Flink state backend, HDFS-2 for syncing results from the user
job)?
The scenario can be viewed in the context of running some jobs that are meant
to push the results to an archive repository (cold
Hi Steven,
Yes, GC is a big overhead; it may cause your CPU utilization to reach
100% and every process to stop working. We ran into this a while ago too.
How much memory did you assign to the TaskManager? What was your CPU
utilization when your TaskManager was considered 'killed'?
Bowen
Till,
Once our job is restarted for some reason (e.g. a TaskManager container got
killed), it can get stuck in a continuous restart loop for hours. Right now, I
suspect it is caused by GC pauses during restart; our job has very high
memory allocation in steady state. High GC pause then caused akka
Hi!
The sink is merely a union of the result of the co-group and the one data
source.
Can't you just make two distinct pipelines out of that? One with a co-group ->
data sink pipeline and one with the source -> sink pipeline?
They could even be part of the same job...
Best,
Stephan
On Wed, Aug 23,
Hi,
The reason is that there are two (or more) different threads doing the reading.
As an illustration, consider this case first:
DataSet input = ...
input.map(new MapA()).map(new MapB())
Here, MapB is technically "wrapped" by MapA, and when MapA emits data it goes
directly to the map()
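The "wrapped" chained case described above can be simulated in plain Java with no Flink dependency (all names here are illustrative): the first map's output goes straight into the second map's function on the same call stack, with no buffer or second reading thread in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of operator chaining: when MapA and MapB run in one
// chained task, MapA "emits" by calling MapB's function directly, so both
// transformations happen on a single thread per element.
public class ChainedMaps {
    public static List<String> run(List<Integer> input,
                                   Function<Integer, Integer> mapA,
                                   Function<Integer, String> mapB) {
        List<String> out = new ArrayList<>();
        for (Integer value : input) {
            // one call stack, one thread: mapA's result flows straight into mapB
            out.add(mapB.apply(mapA.apply(value)));
        }
        return out;
    }
}
```

When operators are *not* chained, each runs in its own task with its own reading thread, which is where the "two (or more) different threads" above come from.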
You could enable object reuse [0] if your application allows that. Also,
adjusting the managed memory size [1] can help.
Are you using Flink's graph library Gelly?
[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#object-reuse-enabled
[1]
Hi Anugrah,
you can track the progress at the accompanying jira issue:
https://issues.apache.org/jira/browse/FLINK-6916
Currently, roughly half of the tasks are done, with a few remaining in the PR
review stage. Note that the actual implementation differs a bit from what was
proposed in FLIP-19
Thanks Aljoscha for the prompt response.
Can you explain the technical reason for the single-predecessor rule? This
makes what we are trying to do much more expensive. Really, what we're doing is
reading a Parquet file, doing several maps/filters on the records, and writing
back to Parquet. There
Does someone have a current performance test based on PageRank, or an idea why
Flink lost the comparison?
> On 18.08.2017 at 19:51, Kaepke, Marc wrote:
>
> Hi everyone,
>
> I compared Flink and Spark by using PageRank. I guessed Flink would beat Spark
> or have the
Hey,
I was reading through the Flink documentation and found that there were some
plans around a standalone BLOB server. I found the details here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture.
Just wanted to know whether this was done.
regards,
Hi Tony,
Thank you for this thorough explanation. Helps a lot!
Kind Regards,
Tomasz
On 23 August 2017 at 11:30, Tony Wei wrote:
> Hi Tomasz,
>
> Actually, window() is not a real operator shared by the operators created
> by your reduce() and apply() functions.
> Flink
Hi Tomasz,
Actually, window() is not a real operator shared by the operators created by
your reduce() and apply() functions.
Flink implements WindowOperator by binding window(), trigger() and evictor()
together with the WindowFunction.
It is more like the prior operator sent elements to two following
Hi Tony,
Won't that increase the amount of processing Flink has to do? It would
have to window twice, right?
Thanks,
Tomasz
On 23 August 2017 at 11:02, Tony Wei wrote:
> Hi Tomasz,
>
> In my opinion, I would move .window() function down to these two DataStream.
>
Hi Tomasz,
In my opinion, I would move the .window() call down to these two
DataStreams (rawEvent.window().reduce().map(), and the same for metrics).
It makes sure that they won't share the same constructor.
Regards,
Tony Wei
2017-08-23 17:51 GMT+08:00 Tomasz Dobrzycki :
Hi Tomasz,
I think this is because .window() is a lazy operation.
It just creates a WindowedStream object but does not create a corresponding
operator.
The operator will be created after you call .reduce() or .apply().
rawEvents and metrics actually shared the same object to create their own
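The laziness described above can be sketched in plain Java with no Flink classes (all names here are illustrative): the "windowed" object is only a description, and each terminal call builds its own independent operator from it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a lazy builder: WindowedStream holds a description but
// no operator; each terminal call (reduce/apply) builds a fresh, independent
// operator, so two branches that share the builder share nothing at runtime.
public class LazyWindow {
    static class WindowedStream {
        final String description;
        WindowedStream(String description) { this.description = description; }
        String reduce() { return description + " -> reduce-operator"; }
        String apply()  { return description + " -> apply-operator"; }
    }

    public static List<String> build() {
        // one shared builder object, as with rawEvents and metrics above
        WindowedStream windowed = new WindowedStream("window(10s)");
        List<String> operators = new ArrayList<>();
        operators.add(windowed.reduce()); // first branch gets its own operator
        operators.add(windowed.apply());  // second branch gets another
        return operators;
    }
}
```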
Thanks James for sharing your experience. I find it very interesting :-)
On Tue, Aug 22, 2017 at 9:50 PM, Hao Sun wrote:
> Great suggestions, the etcd operator is very interesting, thanks James.
>
>
> On Tue, Aug 22, 2017, 12:42 James Bucher wrote:
>>
>>
Hi Jerry,
You can learn about Flink's windowing mechanics in this blog (
https://flink.apache.org/news/2015/12/04/Introducing-windows.html).
To my understanding, window() defines how Flink uses the WindowAssigner to
assign an element to the right windows, trigger() defines when to fire a
window, and
Never mind, it was a silly mistake: I used "=" instead of ":" while setting
akka.ask.timeout.
Now it works fine!
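For reference, entries in flink-conf.yaml use YAML-style "key: value" pairs, so the working line looks like the following (the timeout value itself is only an example, not the one from this thread):

```yaml
# flink-conf.yaml -- note the colon, not '=' (value is illustrative)
akka.ask.timeout: 100 s
```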
On Tue, Aug 22, 2017 at 5:10 PM, Vishnu Viswanath <
vishnu.viswanat...@gmail.com> wrote:
> Hi,
>
> After I submit the job, the client times out after 10 seconds (guess the Job
> manager is