Flinkheads,
I'm processing from a Kafka source, using event time with watermarks based
on a threshold, and using tumbling time windows to perform some rollup. My
sink is idempotent, and I want to ensure exactly-once processing end-to-end.
I am trying to figure out if I can stick with memory check
Hi Gna,
Thanks for reporting the problem. Because the level 1 operations in the
FlinkML BLAS library don't support SparseVector, SparseVector is not
supported currently. I've filed this in JIRA [1].
I can probably send a patch to fix this in a few days.
[1]: https://issues.apache.org/jira/browse/FLINK-3
Hello Aljoscha,
I have checked again with the (fantastic) blog here:
https://flink.apache.org/news/2015/12/04/Introducing-windows.html
and I have come to understand that the contents of a window buffer must be
disposed of *only* after the user-defined evaluation function has
seen and used them all
Hello Aljoscha,
Many thanks for the explanation. Referring to the flow from your response:
---
1. Trigger fires
2. Evictor is called if it exists
3. Elements are evicted from the window buffer if the evictor returned a number > 0
4. User-provided window function is called to emit the window result
Hi Anastasiia,
this is difficult because the input is usually read in parallel, i.e., an
input file is split into several blocks which are independently read and
processed by different threads (possibly on different machines). So it is
difficult to have a sequential row number.
If all rows have th
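One common workaround, sketched below in plain Java, is to derive the global row number from byte offsets instead of read order. This only works under an assumption not guaranteed by the original question: that every row has the same fixed byte width. The variable names and numbers are illustrative, not from the thread.

```java
// Sketch: if every row occupies a fixed number of bytes (assumption),
// the global row number of a record is recoverable from the byte offset
// at which its input split starts, regardless of which thread reads it.
public class RowFromOffset {
    public static void main(String[] args) {
        long rowLengthBytes = 16;      // assumed fixed record width, newline included
        long splitStartOffset = 4096;  // byte offset where this input split begins
        long localRecordIndex = 3;     // 4th record read within this split (0-based)

        // Rows before this split: splitStartOffset / rowLengthBytes.
        long globalRow = splitStartOffset / rowLengthBytes + localRecordIndex;
        System.out.println(globalRow);
    }
}
```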
Is there a way to get the current line number (or generally the number of
element currently being processed) inside a mapper?
The example is a matrix you read line by line from the file and need both
the row and the column numbers. The column number is easy to get, but how
to know the row number?
Hello,
Thank you very much. This was indeed the problem. The firewall was blocking
ports 6123 and 43008, and the user did not have permissions to change the
firewall rules.
I retried the following command with root privileges: ufw allow port
and this made the job run.
Kind Regards,
Ravinder Kaur
On Wed,
Can the machines connect to port 6123? The firewall may block that port but
permit SSH.
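A quick way to test exactly this, without Flink involved, is a plain TCP connect probe (a self-contained sketch; the class and method names are made up for illustration). Run it from a worker node against the master host and port 6123: if SSH works but this prints false, a firewall rule is the usual suspect.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal TCP reachability probe for a host:port, e.g. the JobManager RPC port.
public class PortProbe {
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, filtered, or timed out
        }
    }

    public static void main(String[] args) throws IOException {
        // Self-contained demo: probe a port we are listening on, then the
        // same port after the listener is gone.
        ServerSocket server = new ServerSocket(0);
        int port = server.getLocalPort();
        System.out.println(canConnect("localhost", port, 2000));
        server.close();
        System.out.println(canConnect("localhost", port, 2000));
    }
}
```

In the real scenario you would call `canConnect("hostname-of-master", 6123, 2000)` from each worker.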
On Wed, Feb 3, 2016 at 9:52 PM, Ravinder Kaur wrote:
> Hello,
>
> Here is the log file of the JobManager. I did not see anything suspicious,
> and it suggests the ports are also listening.
>
> 20:58:46,906 INFO
Hello,
Here is the log file of the JobManager. I did not see anything suspicious,
and it suggests the ports are also listening.
20:58:46,906 INFO org.apache.flink.runtime.jobmanager.JobManager
- Starting JobManager on IP-of-master:6123 with execution mode CLUSTER
and streaming mode BATCH_ON
Hello,
I also feel like it has something to do with the network configuration, but
I have checked all the prerequisites that you mentioned.
1. The hostnames are right. ("master-IP" or "hostname-of-master" is just
my edit to make things clear; it's not the real value.)
2. the mach
All:
I'm trying to use SparseVectors with FlinkML 0.10.1. It does not seem to
be working. Here is a UnitTest that I created to recreate the problem:
package com.aol.ds.arc.ml.poc.flink
import org.junit.After
import org.junit.Before
import org.slf4j.LoggerFactory
import org.
There still seems to be something wrong with your network configuration.
This does not look like a Flink problem and needs work on your end; we
cannot debug that for you.
Please go through your network setup and check for example
- if the hostnames are right (is "master-IP" really the name of the
network
Hi,
the TaskManager is starting up, but it's not able to register with the
JobManager. Did you check the JobManager log? Do you see anything suspicious
there? Are the ports matching?
On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur wrote:
> Hello,
>
> Thank you for pointing it out. I had a little t
Hello,
Thank you for pointing it out. I had a little typo when I edited the
hostname in flink-conf.yaml. I've fixed it and the TaskManager started up.
But I still can't run the WordCount example; it throws the same
NoResourceAvailableException.
Caused by:
org.apache.flink.runtime.jobmanager.s
This looks like the reason:
java.net.UnknownHostException: Cannot resolve the JobManager hostname
'hostname-of-master' specified in the configuration
On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur wrote:
> Hello,
>
> The log file of the Taskmanager now shows the following
>
> 18:27:10,082 WARN
Hello,
The log file of the Taskmanager now shows the following
18:27:10,082 WARN org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManag
What do the TaskManager logs say?
On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur wrote:
> Hello,
>
> Thanks for the quick reply. I tried to set jobmanager.rpc.address in
> flink-conf.yaml to the hostname of master node on both the nodes.
>
> Now it does not start the Taskmanager at the worker node
Hello,
Thanks for the quick reply. I tried to set jobmanager.rpc.address in
flink-conf.yaml to the hostname of master node on both the nodes.
Now it does not start the Taskmanager at the worker node at all. When I
start the cluster using ./bin/start-cluster.sh on master it shows the
normal output
Hello,
I am facing an error for which I cannot figure out the cause. Any idea what
could cause such an error?
java.lang.Exception: The slot in which the task was executed has been released.
Probably loss of TaskManager a8b69bd9449ee6792e869a9ff9e843e2 @ cloudr6-admin -
4 slots - URL: akka
Hi Gwenhäel,
if you set the number of slots for each TaskManager to 4, then all of your
mappers will be evenly spread out. The sources should also be evenly spread
out. However, since the sinks depend on all mappers, it is most likely
random where they are deployed. So you might end u
Looks like the network configuration is not correct.
I would try setting the full host name (like "master.abc.xyz.com") as
jobmanager.rpc.address.
Greetings,
Stephan
On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur wrote:
>
> Hello Community,
>
> I'm a student and new to Apache Flink. I'm trying
Hello Community,
I'm a student and new to Apache Flink. I'm trying to learn and have set up a
2-node standalone Flink (0.10.1) cluster (one master and one worker). I'm
facing the following issue.
Cluster: consists of 2 vms (one master and one worker)
The configurations are done as per
https://ci.
It is one type of mapper with a parallelism of 16
It's the same for the sinks and sources (parallelism of 4)
The settings are
Env.setParallelism(4)
Mapper.setParallelism(env.getParallelism() * 4)
We mean to have X mapper tasks per source / sink
The mapper is doing some heavy computation and we h
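The intended layout can be sanity-checked with simple arithmetic (a sketch assuming tasks spread evenly over the 4 nodes, which, as noted elsewhere in the thread, Flink does not strictly guarantee for the sinks):

```java
// Back-of-the-envelope check of the described deployment:
// 4 sources => 16 mappers => 4 sinks on 4 nodes with 24 slots.
public class SlotMath {
    public static void main(String[] args) {
        int nodes = 4, sources = 4, mappers = 16, sinks = 4, totalSlots = 24;

        // With perfectly even spreading: 1 source + 4 mappers + 1 sink per node.
        int tasksPerNode = sources / nodes + mappers / nodes + sinks / nodes;
        System.out.println("tasks per node: " + tasksPerNode);

        // 6 tasks per node fit into the 6 slots available per node.
        System.out.println(tasksPerNode <= totalSlots / nodes);
    }
}
```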
Hi Gwenhäel,
when you say 16 maps, are we talking about one mapper with parallelism 16 or 16
unique map operators?
Regards,
Aljoscha
> On 03 Feb 2016, at 15:48, Gwenhael Pasquiers
> wrote:
>
> Hi,
>
> We try to deploy an application with the following “architecture” :
>
> 4 kafka sources =
Hi,
We try to deploy an application with the following “architecture” :
4 kafka sources => 16 maps => 4 kafka sinks, on 4 nodes, with 24 slots (we
disabled operator chaining).
So we’d like on each node :
1x source => 4x map => 1x sink
That way there are no exchanges between different instances
I've fixed it by changing the copy method in the TupleSerializer as follows:

@Override
public T copy(T from, T reuse) {
    for (int i = 0; i < arity; i++) {
        Object copy = fieldSerializers[i].copy(from.getField(i));
        reuse.setField(copy, i);
    }
    return reuse;
}

And commenting out line 50 in CollectionExecuti
Exactly, I have more than 4 keys because of the "negative modulo". After
changing this line from
.keyBy(mappedPayload => mappedPayload._1.id.hashCode % parallelism)
to
.keyBy(mappedPayload => Math.abs(mappedPayload._1.id.hashCode % parallelism))
or just use Flink's dataStream.partitio
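The underlying behavior can be shown in a few lines of plain Java (the hash value is a made-up stand-in for a negative id.hashCode()): the % operator keeps the sign of the dividend, so negative hash codes produce extra negative key values unless they are normalized.

```java
public class NegativeModulo {
    public static void main(String[] args) {
        int parallelism = 4;
        int hash = -7; // stand-in for a negative hashCode()

        System.out.println(hash % parallelism);               // % keeps the dividend's sign
        System.out.println(Math.abs(hash % parallelism));     // the fix used in the thread
        System.out.println(Math.floorMod(hash, parallelism)); // always non-negative, no abs needed
    }
}
```

Note that Math.abs(h % n) and Math.floorMod(h, n) can map the same hash to different buckets, but either one collapses the key space back to at most `parallelism` values.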
I've checked the compiled classes with javap -verbose and indeed they had a
major.version=51 (Java 7).
So I've changed the source and target to 1.8 in the main pom.xml and now the
generated .class files have major.version=52.
Unfortunately now I get this error:
[ERROR]
/opt/flink-src/flink-java/src/main/ja
Glad to hear that!
We will release Flink 0.10.2 (based on the release-0.10 branch) soon.
Best, Fabian
2016-02-03 14:49 GMT+01:00 LINZ, Arnaud :
> Hi,
>
> Yes, I’m always a bit reluctant before installing a snapshot version "for
> everyone", and I was hoping it would suffice…
>
> However, I’ve
Hi to all,
I was trying to make my Java 8 application run on a Flink 0.10.1 cluster.
I've compiled both the Flink sources and my app with the same Java version
(1.8.0_72) and I've set env.java.home to point to my Java 8 JVM in every
flink-conf.yaml of the cluster.
I always get the following Excep
How long did you run the job? Could it be an artifact of the timing that
hasn’t yet averaged out?
> On 03 Feb 2016, at 14:32, Aljoscha Krettek wrote:
>
> There should be 4 windows because there are only 4 distinct keys, if I
> understand this line correctly:
>
> .keyBy(mappedPayload => mappe
There should be 4 windows because there are only 4 distinct keys, if I
understand this line correctly:
.keyBy(mappedPayload => mappedPayload._1.id.hashcode % parallelism)
> On 02 Feb 2016, at 19:31, yutao sun wrote:
>
> .keyBy(mappedPayload => mappedPayload._1.id.hashcode % parallelism)
Do you have 7 distinct keys? You get as many result tuples as you have
keys, because the window is per key.
On Wed, Feb 3, 2016 at 12:12 PM, yutao sun wrote:
> Thanks for your help. I retested with object reuse disabled and got the
> same result (please see the picture attached).
Hi Nirmalya,
the result of Evictor.evict() is used internally by the window operator. The
flow is as follows:
1. Trigger fires
2. Evictor is called if it exists
3. Elements are evicted from the window buffer if the evictor returned a number > 0
4. User-provided window function is called to emit the window result
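The four steps above can be sketched as a toy model in plain Java (this is not the Flink API; the buffer, trigger count, and evictor policy are invented for illustration): a count trigger firing every 3 elements, with an evictor that keeps only the last 2 before the window function runs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the trigger -> evictor -> window-function flow.
public class EvictorFlow {
    public static void main(String[] args) {
        Deque<Integer> buffer = new ArrayDeque<>();
        int fireAt = 3; // trigger fires once 3 elements are buffered
        int keep = 2;   // evictor keeps only the last 2 elements

        for (int e = 1; e <= 6; e++) {
            buffer.addLast(e);
            if (buffer.size() >= fireAt) {              // 1. trigger fires
                int toEvict = buffer.size() - keep;     // 2. evictor is called
                for (int i = 0; i < toEvict; i++) {
                    buffer.removeFirst();               // 3. elements are evicted
                }
                int sum = buffer.stream().mapToInt(Integer::intValue).sum();
                System.out.println("window result: " + sum); // 4. window function emits
            }
        }
    }
}
```

Note that, as in the thread's explanation, the buffer is not purged after step 4: the surviving elements participate in the next firing.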
Hello Till,
From your prompt reply:
'... the CountTrigger *always* works together with the CountEvictor which
will make sure that only .. ' - that explains it. Thanks. I missed it.
A related question I have is this:
Between the PURGE facility of Trigger and REMOVAL facility of Evictor, is
th
You can have timestamps that are very much out-of-order (in the future,
specifically). The window operator assigns them to the corresponding
windows, and it can hold many windows that are in progress at the same
time.
Windows are then flushed once the triggers fire (after a
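Tumbling-window assignment makes this concrete: which window an element belongs to is computed purely from its timestamp, so a "future" timestamp simply opens a window pane that fires later. The arithmetic below is a simplified sketch of that assignment (it ignores window offsets and negative timestamps, which the real Flink assigners also handle); the timestamps are invented.

```java
public class WindowAssignment {
    // Start of the tumbling window of the given size containing ts
    // (simplified sketch of TimeWindow-style assignment).
    static long windowStart(long ts, long sizeMs) {
        return ts - (ts % sizeMs);
    }

    public static void main(String[] args) {
        long size = 60_000L;                      // 1-minute tumbling windows
        long now = 1_454_512_800_000L;            // some "current" event time
        long future = now + 24L * 3600 * 1000;    // an event stamped 24h ahead

        // Both elements get a deterministic window; they just live in
        // different panes held concurrently by the operator.
        System.out.println(windowStart(now, size));
        System.out.println(windowStart(future, size));
    }
}
```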
Thanks for your help. I retested with object reuse disabled and got the
same result (please see the picture attached).
2016-02-03 10:51 GMT+01:00 Stephan Ewen :
> The definition looks correct.
> Because the windows are by-key, you should get one window result per key
> per second.
>
> Can y
Hi,
with TPS do you mean tuples per second? I have an open pull request that
changes the WindowOperator to work on a partitioned state abstraction. In
the pull request I also add a state backend that uses RocksDB, so it is
possible.
The size of the windows you can keep also depends on the window fu
Allow me to jump into this very interesting discussion.
The 2nd point is actually an interesting question.
I understand that we can set the timestamp of an event in Flink. What if we
set the timestamp to somewhere in the future, for example 24 hours from now?
Can Flink handle this case?
Also, I'm s
The definition looks correct.
Because the windows are by-key, you should get one window result per key
per second.
Can you turn off object reuse? That is a pretty experimental feature and
works quite well with the batch operations, but not so much with the
streaming windows yet.
I would only enable
Hi Arnaud,
in a previous mail you said:
"Note that I did not rebuild & reinstall flink, I just used a 0.10-SNAPSHOT
compiled jar submitted as a batch job using the "0.10.0" flink installation"
This will not fix the Netty version error. You need to install a new Flink
version or submit the Flink
Hi Nirmalya,
the CountTrigger always works together with the CountEvictor, which will
make sure that only 'count' elements are kept in the window. Evictors can
evict elements from the window after the trigger event. That is the reason
why the CountTrigger does not have to purge the window explicitly.
Hi!
I think the closed channel is actually an effect of the process kill.
Before the exception, you can see "15:22:47,592 ERROR org.apache.flink.yarn.
YarnTaskManagerRunner - RECEIVED SIGNAL 15: SIGTERM" in the log, which
means that UNIX is killing the process.
I assume that the first thing that h
Hi Flavio,
we use tags to identify releases. The "release-0.10.1" tag, refers to the
code that has been released as Flink 0.10.1.
The "release-0.10" branch is used to develop 0.10 releases. Currently, it
contains Flink 0.10.1 and additionally a few more bug fix commits. We will
fork off this branc
Is it sufficient to do git checkout tags/release-0.10.0 and then compile
it?
However, I think it's worth mentioning this in the build section of the
documentation.
Best,
Flavio
On Wed, Feb 3, 2016 at 9:57 AM, Flavio Pompermaier
wrote:
> Hi to all,
>
> I wanted to update my Flink cluster insta
Hi,
1) At the moment, state is kept on the JVM heap in a regular HashMap.
However, we added an interface for pluggable state backends. State backends
store the operator state (Flink's built-in window operators are based on
operator state as well). A pull request to add a RocksDB backend (going to
Hi to all,
I wanted to update my Flink cluster installation to Flink 0.10.1 but I
can't find the respective branch.
In the past, I used to go into the apache-flink git folder, run a git pull
and a git branch -a in order to check out the proper release branch.
However I saw that there's on