Congrats!
On Fri, Feb 10, 2017 at 9:11 PM, Matthias J. Sax wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Congrats!
>
> On 2/10/17 2:00 AM, Ufuk Celebi wrote:
> > Hey everyone,
> >
> > I'm very happy to announce that the Flink PMC has accepted Stefan
> > Richter to become a committer of the Apache Flink project.
I don't have it any more, unfortunately. To be clear, I don't think it was
Flink related, but a collision between a Hadoop security library calling
into a Google Guava library, where a method was missing on CacheBuilder in
the latter. Also, to add to the irritation, it only happened in my OSX
environment.
Dean:
Can you pastebin the stack trace around the MethodMissing error ?
If there was no stack trace, please tell us what the log said.
Thanks
On Fri, Feb 10, 2017 at 2:26 PM, Dean Wampler
wrote:
> This is completely unrelated, but I just debugged a MethodMissing error in
> an application stack, where it doesn't occur with Hadoop 2.7.2, but does
This is completely unrelated, but I just debugged a MethodMissing error in
an application stack, where it doesn't occur with Hadoop 2.7.2, but does
occur with 2.7.3 (yeah!). I would dig into the appropriate logs to see if
an underlying exception is being thrown and you're not seeing enough detail.
Does Flink support Hadoop 2.7.3? I installed Flink for Hadoop 2.7.0 but I'm
seeing this error:
2017-02-10 18:59:52,661 INFO
org.apache.flink.yarn.YarnClusterDescriptor - Deployment
took more than 60 seconds. Please check if the requested resources are
available in the YARN cluster
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Congrats!
On 2/10/17 2:00 AM, Ufuk Celebi wrote:
> Hey everyone,
>
> I'm very happy to announce that the Flink PMC has accepted Stefan
> Richter to become a committer of the Apache Flink project.
>
> Stefan is part of the community for almost a year now.
Hi Kurt,
Thanks for the reply.
Does this mean that if my job has 3 operators (not chained), it will use at
least 3 slots? I thought parallelism was task-based. You can define it at
an operator level, but that only means that the tasks for that operator are
distributed across that many slots.
The docs say that it may improve performance.
How true is that when custom serializers are provided?
There is also a 'disableAutoTypeRegistration' method in the config class,
implying Flink registers types automatically.
So, given that I have an hierarchy:
trait A
class B extends A
class C extends A
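Setting the custom-serializer question aside, the mechanism that registration buys is easy to sketch in plain Java. This is only an illustration of the idea behind Kryo-style registration, not Flink's or Kryo's actual API: a registered class is written as a compact integer tag, an unregistered one as its full class name, and each concrete subtype (B and C above) would need its own entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch (not Kryo itself) of why type registration can shrink serialized
// records: a registered class is tagged with a small integer id, while an
// unregistered class must be tagged with its full name.
public class RegistrationSketch {
    private final Map<Class<?>, Integer> registry = new HashMap<>();

    void register(Class<?> clazz) {
        // Ids are handed out in registration order, like Kryo does.
        registry.put(clazz, registry.size());
    }

    // Bytes needed to identify one instance's type on the wire.
    int typeTagSize(Class<?> clazz) {
        return registry.containsKey(clazz)
                ? Integer.BYTES // compact id for registered types
                : clazz.getName().getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Registering only the trait A would not cover the concrete classes: serialization always sees the runtime type, so B and C each need to be registered.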
Hi,
Flink operators will not always (in fact, almost never) each run in their
own slot. Usually the whole parallel sub-slice of a pipeline runs in one
slot, so in your case you get three parallel instances of every operator
in your topology, and one instance of each operator will sit in each slot.
Cheers,
Hi,
I think there are two (somewhat) orthogonal problems here:
1) Determining when a stream of input data switches from the "reading old
data" to the "reading current data" phase.
2) Blocking/buffering one input of an operator depending on some condition
on the other input.
I think 1. can only
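Problem 2 can be sketched with plain Java collections (illustrative only, not Flink's actual two-input operator API; element and method names are made up): one input is buffered until a condition derived from the other input flips, then the buffer is drained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of buffering one input of a two-input operator until a condition
// on the other input is met (problem 2 above).
public class BufferingTwoInput {
    private final Queue<String> bufferedB = new ArrayDeque<>();
    private final List<String> emitted = new ArrayList<>();
    private boolean caughtUp = false; // condition derived from input A

    // Input A: a (hypothetical) marker element flips the condition
    // and drains everything that was held back.
    void processA(String element) {
        if ("CAUGHT_UP".equals(element)) {
            caughtUp = true;
            while (!bufferedB.isEmpty()) {
                emitted.add(bufferedB.poll());
            }
        }
    }

    // Input B: held back until input A has caught up.
    void processB(String element) {
        if (caughtUp) {
            emitted.add(element);
        } else {
            bufferedB.add(element);
        }
    }

    List<String> emitted() {
        return emitted;
    }
}
```

In a real distributed setting the buffer would have to be part of operator state so it survives failures, which is where this gets harder than the sketch suggests.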
Hi,
so you mean data flowing in arbitrary directions? That's certainly
interesting but I guess it can lead to all sorts of problems in a
distributed system.
Cheers,
Aljoscha
On Wed, 8 Feb 2017 at 01:02 Chen Qin wrote:
> Hi there,
>
> I don't think this would be an urgent topic but definitely see
Hi,
I think distributed stream clustering is still a somewhat open field. I'm
not aware of popular open source systems that have implementations for that
(except maybe Apache SAMOA). Maybe you will have some luck if you try to
search for "distributed stream clustering" papers.
Cheers,
Aljoscha
On
Hi Xingcan,
FLINK-1885 looked into adding a bulk mode to Gelly's iterative models.
As an alternative you could implement your algorithm with Flink operators
and a bulk iteration. Most of the Gelly library is written with native
operators.
Greg
On Fri, Feb 10, 2017 at 5:02 AM, Xingcan Cui wrote
Async snapshotting is the default.
> Am 10.02.2017 um 14:03 schrieb vinay patil :
>
> Hi Stephan,
>
> Thank you for the clarification.
> Yes with RocksDB I don't see Full GC happening, also I am using Flink 1.2.0
> version and I have set the statebackend in flink-conf.yaml file to rocksdb,
>
I'm not sure what exactly is the problem, but could you check this FAQ item?
http://flink.apache.org/faq.html#why-am-i-getting-a-nonserializableexception-
Best,
Gábor
2017-02-10 14:16 GMT+01:00 Sebastian Neef :
> Hi,
>
> thanks! That's exactly what I needed.
>
> I'm now using: DataSetA.leftOuterJoin(DataSetB).where(new
Hi,
thanks! That's exactly what I needed.
I'm now using: DataSetA.leftOuterJoin(DataSetB).where(new
KeySelector()).equalTo(new KeySelector()).with(new JoinFunction(...)).
Now I get the following error:
> Caused by: org.apache.flink.optimizer.CompilerException: Error translating
> node 'Map "Ke
Hi Stephan,
Thank you for the clarification.
Yes, with RocksDB I don't see Full GC happening. Also, I am using Flink
1.2.0 and have set the state backend in the flink-conf.yaml file to rocksdb.
Does this do asynchronous checkpointing by default, or do I have to specify
it at the job level?
Regards,
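For reference, the flink-conf.yaml entries in question might look like the following (key names as documented for Flink 1.2; verify against the docs for your version):

```yaml
# Select RocksDB as the state backend for all jobs on this cluster.
state.backend: rocksdb

# Directory where the backend writes checkpoint data.
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
```

The same choice can instead be made per job via StreamExecutionEnvironment.setStateBackend(...), which overrides the cluster-wide setting.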
Hello Sebastian,
You can use DataSet.leftOuterJoin for this.
Best,
Gábor
2017-02-10 12:58 GMT+01:00 Sebastian Neef :
> Hi,
>
> is it possible to assign a "default" value to elements that didn't match?
>
> For example I have the following two datasets:
>
> |DataSetA | DataSetB|
> -
Hi,
is it possible to assign a "default" value to elements that didn't match?
For example I have the following two datasets:
| DataSetA | DataSetB |
-----------------------
| id=1     | id=1     |
| id=2     | id=3     |
| id=5     | id=4     |
| id=6     | id=6     |
When doing a join with:
A.join(B).where( KeySelector(A.id
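The semantics being asked for can be sketched in plain Java over lists (illustrative only; in Flink, DataSetA.leftOuterJoin(DataSetB) delivers unmatched left elements to the JoinFunction with null on the right side, where a default can be substituted):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a left outer join with a default value for unmatched elements,
// mirroring the id table above.
public class LeftOuterJoinDefault {
    static Map<Integer, String> join(List<Integer> a, List<Integer> b, String dflt) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Integer idA : a) {
            // Every left element is kept; unmatched ones get the default.
            result.put(idA, b.contains(idA) ? "matched" : dflt);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 5, 6);
        List<Integer> b = List.of(1, 3, 4, 6);
        System.out.println(join(a, b, "default"));
        // {1=matched, 2=default, 5=default, 6=matched}
    }
}
```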
Hi,
FSStateBackend operates completely on-heap and only snapshots for checkpoints
go against the file system. This is why the backend is typically faster for
small states, but can become problematic for larger states. If your state
exceeds a certain size, you should strongly consider using RocksDB.
Hi,
I am doing a performance test for my pipeline with the FSStateBackend, and
I have observed frequent Full GCs after processing 20M records.
When I did memory analysis using MAT, it showed that many objects
maintained by Flink state are live.
Flink keeps the state in memory even after checkpoint
Hi Vasia,
b) As I said, when some vertices have finished their work in the current
phase, they have nothing to do (no value updates, no messages received,
just as if asleep) but wait for other vertices that have not yet finished
the current phase. After that, in the next phase, all the vertices should go
Hey everyone,
I'm very happy to announce that the Flink PMC has accepted Stefan
Richter to become a committer of the Apache Flink project.
Stefan is part of the community for almost a year now and worked on
major features of the latest 1.2 release, most notably rescaling and
backwards compatibility.
Hi,
Parallelism is actually set at the operator level, and each parallel
instance of an operator will occupy one slot. In some cases, Flink uses
chaining to chain multiple operators together so that they share a single
slot, but sometimes this cannot be done. If your job contains multiple
operators and some of them cannot be chained,
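The slot accounting described above can be summarized with a small sketch (plain Java, illustrative names): with the default slot sharing group, one parallel sub-slice of the whole pipeline fits into a single slot, so a job needs roughly as many slots as its highest operator parallelism, not the sum over operators.

```java
import java.util.Collections;
import java.util.Map;

// Sketch (not Flink API): with default slot sharing, required slots equal
// the maximum parallelism across all operators in the job, because one
// parallel sub-slice of the whole pipeline shares a single slot.
public class SlotEstimate {
    static int requiredSlots(Map<String, Integer> operatorParallelism) {
        return Collections.max(operatorParallelism.values());
    }

    public static void main(String[] args) {
        // Hypothetical job: three operators, all at parallelism 3.
        Map<String, Integer> ops = Map.of("source", 3, "map", 3, "sink", 3);
        System.out.println(requiredSlots(ops)); // 3 slots, not 9
    }
}
```

Operators placed in separate slot sharing groups would add up instead, which is one way a job can end up needing more slots than its highest parallelism.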
Hi Xingcan,
On 9 February 2017 at 18:16, Xingcan Cui wrote:
> Hi Vasia,
>
> thanks for your reply. It helped a lot and I got some new ideas.
>
> a) As you said, I did use the getPreviousIterationAggregate() method in
> preSuperstep() of the next superstep.
> However, if the (only?) global (aggre
Tzu-Li (Gordon) Tai wrote
> Stream A has a rate of 100k tuples/s. After processing the whole Kafka
> queue, the rate drops to 10 tuples/s.
Absolutely correct.
Tzu-Li (Gordon) Tai wrote
> So what you are looking for is that flatMap2 for stream B only doing work
> after the job reaches the latest re