Hi all,
I'm trying to run Flink using a local environment, but on an Ignite node to
achieve collocation (as mentioned in my previous message on this list).
Have a look at the code in [1]. It's pretty simple, but I'm getting a
"cannot load user class" error as shown in [2].
If you check line #29
It seems to me the bottleneck will be the network if I don't collocate the
Flink jobs; after all, Ignite caches are in-memory, not on disk (much faster
than the network).
Achieving collocation would be more difficult in Kafka, but it should be
relatively easy in Ignite due to its out-of-the-box collocated c
Hi Giacomo90,
I'm not aware of a detailed description of the execution plan.
The plan can be used to identify the execution strategies (shipping and
local) chosen by the optimizer and some properties of the data
(partitioning, order).
Common shipping strategies can be FORWARD (locally forwarding,
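For reference, one way to inspect those strategies is to print the job's JSON
plan before executing it (the plan visualizer can render the same JSON). A
minimal sketch; the input path is just a placeholder:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;

public class PlanDump {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> mapped = env.readTextFile("hdfs:///tmp/input") // placeholder
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String s) {
                    return s.toUpperCase();
                }
            });
        mapped.output(new DiscardingOutputFormat<String>());

        // Prints the optimizer's plan as JSON, including the chosen ship
        // strategies (e.g. FORWARD, HASH_PARTITION) on each connection.
        System.out.println(env.getExecutionPlan());
    }
}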
Hi everyone,
Are there any examples of how to implement a reliable (zero data loss)
Flink source reading from a system that is not replay-able but supports
acknowledging messages?
Or any pointers of how one can achieve this and how Flink can help?
I imagine it should involve a write ahead log bu
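One common pattern, which is what Flink's RabbitMQ source builds on via
MessageAcknowledgingSourceBase, is to track the IDs of emitted messages per
checkpoint and acknowledge them only once that checkpoint completes. A rough
sketch with a hypothetical queue client (it glosses over redelivery handling):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class AckingSource extends RichSourceFunction<String>
        implements ListCheckpointed<String>, CheckpointListener {

    /** Hypothetical client for the non-replayable, ack-based system. */
    public interface Queue {
        Message next() throws Exception; // blocks for the next message
        void ack(String id);
        class Message {
            public String id;
            public String payload;
        }
    }

    private transient Queue queue;
    private final Map<Long, List<String>> pendingAcks = new HashMap<>();
    private List<String> sinceLastCheckpoint = new ArrayList<>();
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        queue = connect();
        while (running) {
            Queue.Message msg = queue.next();
            // Emit and record the id atomically with respect to snapshots.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(msg.payload);
                sinceLastCheckpoint.add(msg.id);
            }
        }
    }

    @Override
    public List<String> snapshotState(long checkpointId, long timestamp) {
        // Ids emitted since the last snapshot now wait for this checkpoint.
        pendingAcks.put(checkpointId, sinceLastCheckpoint);
        sinceLastCheckpoint = new ArrayList<>();
        List<String> unacked = new ArrayList<>();
        for (List<String> ids : pendingAcks.values()) {
            unacked.addAll(ids);
        }
        return unacked; // persisted so nothing is lost on failure
    }

    @Override
    public void restoreState(List<String> state) {
        // These ids were covered by a completed checkpoint but possibly never
        // acked, so the system will redeliver them. A full implementation (see
        // Flink's MessageAcknowledgingSourceBase) keeps them around to
        // ack-and-skip the redeliveries; omitted here for brevity.
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // The checkpoint is durable everywhere, so acking is now safe.
        List<String> ids = pendingAcks.remove(checkpointId);
        if (ids != null) {
            for (String id : ids) {
                queue.ack(id);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private Queue connect() {
        throw new UnsupportedOperationException("hypothetical client");
    }
}

The write-ahead idea applies mostly on the sink side; on the source side the
checkpointed id state plays that role.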
I started from this guide:
https://github.com/FelixNeutatz/parquet-flinktacular
Best,
Flavio
On 24 Apr 2017 6:36 pm, "Jörn Franke" wrote:
> Why not use a Parquet-only format? Not sure why you need an
> Avro-Parquet format.
>
> On 24. Apr 2017, at 18:19, Lukas Kircher
> wrote:
>
> Hello,
>
> I a
Hi,
As far as I am aware, Jamie used the example JSON datasource for Grafana:
https://github.com/grafana/simple-json-datasource .
At least I used it when I recreated his example for some introductory
purposes. You can browse my example here:
https://github.com/dawidwys/flink-intro.
Best,
Dawid
Why not use a Parquet-only format? Not sure why you need an Avro-Parquet format.
> On 24. Apr 2017, at 18:19, Lukas Kircher
> wrote:
>
> Hello,
>
> I am trying to read Parquet files from HDFS and am running into problems. I
> use Avro for the schema. Here is a basic example:
>
> public static void main(Str
Hello,
I am trying to read Parquet files from HDFS and am running into problems. I use
Avro for the schema. Here is a basic example:

public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    Job job = Job.getInstance();
    H
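A runnable sketch of this read path, assuming the flink-java Hadoop
compatibility wrapper and parquet-avro on the classpath (the input path is a
placeholder), in the spirit of the parquet-flinktacular guide linked elsewhere
in this thread:

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ReadParquet {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Job job = Job.getInstance();

        // Wrap the Hadoop mapreduce input format; records arrive as
        // (key, value) = (Void, GenericRecord) tuples.
        HadoopInputFormat<Void, GenericRecord> input = new HadoopInputFormat<>(
                new AvroParquetInputFormat<GenericRecord>(),
                Void.class, GenericRecord.class, job);
        FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/parquet")); // placeholder

        DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(input);
        records.first(10).print();
    }
}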
Hi,
In his presentation “The Stream Processor as a Database”, Jamie Grier mentions
a Grafana plugin for Flink. Does anyone know where I can find it? Unfortunately,
a Google search did not yield anything. I’m not sure if he “open-sourced” it.
https://image.slidesharecdn.com/jamiegrier-thestreamproc
Hi,
I am using 1.2-SNAPSHOT of Apache Flink and 1.3-SNAPSHOT
of flink-connector-kafka-0.9_2.11.
It was running without errors before, but now it gives the following
exception:
java.lang.NoSuchMethodError:
org.apache.flink.util.PropertiesUtil.getBoolean(Ljava/util/Properties;L
Loop-invariant data should be kept in Flink's managed memory in
serialized form (in a custom hash table). That means it is not
read back from the CSV file on each iteration, but it is kept in
serialized form and needs to be deserialized again on access.
CC'ing Fabian to double check...
On Mon, Apr 24,
Ufuk - thank you for your help. My flink-conf.yaml is now configured
cluster-wide (with restart) as:

# my flink-conf.yaml, on all flink nodes:
jobmanager.rpc.address: x.x.x.x
jobmanager.rpc.port: 6123
query.server.port: 6123
query.server.enable: true

When I try to issue my query now with the above sett
Hello,
I have a question regarding the loop-awareness of Flink w.r.t. invariant
datasets.
Does Flink serialize the DataSet 'points' in line 85 of
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ?
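For the archive: 'points' is the loop-invariant input of a bulk iteration
there. A stripped-down Java analogue of that pattern, where the invariant
dataset is consumed in every iteration while only the broadcast state changes:

import java.util.List;

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;
import org.apache.flink.configuration.Configuration;

public class LoopInvariantExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Loop-invariant input: consumed in every iteration but never
        // changed, analogous to 'points' in the KMeans example.
        DataSet<Long> points = env.generateSequence(1, 1000);

        // The iterative part, analogous to 'centroids'.
        IterativeDataSet<Long> loop = env.fromElements(0L).iterate(10);

        DataSet<Long> nextState = points
            .map(new RichMapFunction<Long, Long>() {
                private long state;

                @Override
                public void open(Configuration cfg) {
                    List<Long> bc = getRuntimeContext().getBroadcastVariable("state");
                    state = bc.get(0);
                }

                @Override
                public Long map(Long p) {
                    return (p + state) % 9973;
                }
            })
            .withBroadcastSet(loop, "state")
            .reduce(new ReduceFunction<Long>() {
                @Override
                public Long reduce(Long a, Long b) {
                    return a + b;
                }
            });

        loop.closeWith(nextState).print();
    }
}

In this shape the optimizer can keep the invariant input cached in managed
memory across iterations instead of re-reading it from the source.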
I moved up from running queryable state on a standalone Flink instance to a
several-node cluster. My queries don't seem to be responding when I execute
them on the cluster. A few questions:
1. The error I am getting:
WARN [ReliableDeliverySupervisor] Association with remote system [akka.tcp://flink@
Hey Matt,
in general, Flink doesn't put too much work in co-locating sources
(doesn't happen for Kafka, etc. either). I think the only local
assignments happen in the DataSet API for files in HDFS.
Often this is of limited help anyways. Your approach sounds like it
could work, but I would general
Hey Robert,
for batch that should cover the relevant spilling code. If the records
are >= 5 MB, the SpillingAdaptiveSpanningRecordDeserializer will spill
incoming records as well. But that should be covered by the
FileChannel instrumentation as well?
– Ufuk
On Tue, Apr 18, 2017 at 3:57 PM, Robe
Hi Ufuk,
The ProcessFunction receives elements and buffers them into a MapState, and
periodically (for example, every x seconds) registers processing-time timers
(according to some rules which it gets from a connected rule stream). When
a timer fires, I pop the next element from state and request an extern
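For reference, a stripped-down sketch of that pattern (String elements, a
fixed 5-second delay, and one element per timestamp are simplifying
placeholders; MapState requires Flink 1.3):

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class DelayedRelease extends ProcessFunction<String, String> {

    private transient MapState<Long, String> buffer;

    @Override
    public void open(Configuration cfg) {
        buffer = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("buffer", Long.class, String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out)
            throws Exception {
        // Buffer the element and schedule its release (placeholder: 5s;
        // the real delay would come from the connected rule stream).
        long releaseAt = ctx.timerService().currentProcessingTime() + 5000L;
        buffer.put(releaseAt, value); // simplification: one element per timestamp
        ctx.timerService().registerProcessingTimeTimer(releaseAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        String value = buffer.get(timestamp);
        if (value != null) {
            out.collect(value); // here: emit; in your case, the external request
            buffer.remove(timestamp);
        }
    }
}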
You should be able to use queryable state w/o any changes to the
default config. The `query.server.port` option defines the port of the
queryable state server that serves the state on the task managers and
it is enabled by default.
The important thing is to configure the client to discover the
Job
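From memory, the client side then looks something along these lines (1.2-era
API, so treat the exact signatures as an assumption):

import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.query.QueryableStateClient;

public class ClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Point the client at the JobManager; it looks up the task managers
        // that serve the queryable state from there.
        config.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, "x.x.x.x");
        config.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, 6123);

        QueryableStateClient client = new QueryableStateClient(config);
        // ... then query, roughly:
        // client.getKvState(jobId, "query-name", keyHash, serializedKeyAndNamespace)
    }
}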
@Yassine: no, there is no way to disable the backpressure mechanism. Do
you have more details about the last two operators? What do you mean by
"the process function is slow on purpose"?
@Rune: with 1.3, Flink will configure the internal buffers in a way that not
too much data is buffered in the i
Hi all,
I've been playing around with Apache Ignite and I want to run Flink on top
of it but there's something I'm not getting.
Ignite has its own support for clustering, and data is distributed on
different nodes using a partitioned key. Then, we are able to run a closure
and do some computation
The latest Flink has consumers for Kafka 0.8, 0.9, 0.10 - which one are you
using?
I would assume you are using Flink with Kafka 0.8.x because, as far as I know,
starting from Kafka 0.9, offsets are no longer handled by ZooKeeper...
On Mon, Apr 24, 2017 at 12:18 AM, Meghashyam Sandeep V <
vr1meghash
Here are the thoughts on why I advocated not exposing the "delete" initially:
(1) The original heap timer structure was not very sophisticated and
could not support efficient timer deletion (as Gyula indicated). As a core
rule in large-scale systems: never expose an operation you cannot do
efficie
Sorry I can't help you, but we're also experiencing slow checkpointing
when having backpressure from the sink.
I tried the HDFS, S3, and RocksDB state backends, but to no avail -
checkpointing always times out under backpressure.
Can we somehow reduce Flink's internal buffer sizes, so checkpointing
w
Another option might be Flink Spector [1].
Cheers, Fabian
[1] https://github.com/ottogroup/flink-spector
2017-04-23 20:01 GMT+02:00 Georg Heiler :
> Thanks
> Konstantin Knauf wrote on Sun, 23 Apr 2017 at 19:39:
>
>> + user@flink (somehow I didn't reply to the list)
>>
>> Hi Georg,
>>
>> t
Thanks
Konstantin Knauf wrote on Sun, 23 Apr 2017 at 19:39:
> + user@flink (somehow I didn't reply to the list)
>
> Hi Georg,
>
> there are two classes "MultipleProgramsTestBase" and
> "StreamingMultipleProgramsTestBase", which your integration tests can
> extend.
>
> This will spin up a l
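A minimal example of such a test (class name and pipeline are placeholders):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
import org.junit.Test;

public class MyPipelineIT extends StreamingMultipleProgramsTestBase {

    @Test
    public void pipelineRuns() throws Exception {
        // getExecutionEnvironment() is wired to the mini cluster that the
        // test base starts for the test class.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
           .map(new MapFunction<Integer, Integer>() {
               @Override
               public Integer map(Integer i) {
                   return i * 2;
               }
           })
           .print();

        env.execute("integration test");
    }
}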
Wouldn't adding Flume -> Kafka -> Flink also introduce additional latency?
Georg Heiler wrote on Sun, 23 Apr 2017 at 20:23:
> So you would suggest Flume over a custom Akka source from Bahir?
>
> Jörn Franke wrote on Sun, 23 Apr 2017 at 18:59:
>
>> I would use Flume to import t
Hi All,
Sorry for the miscommunication. I'm not using 0.8. I'm using the latest
available flink-kafka client. I don't see my app registered as a consumer
group. I wanted to know if there is a way to monitor Kafka offsets.
Thanks,
Sandeep
On Apr 23, 2017 9:38 AM, "Stephan Ewen" wrote:
> Since it is
Hi all,
I have a streaming pipeline as follows:
1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x->x) and keeping a state per key
indicating whether it has been seen)
3 - schedule some future actions on the stream using ProcessFunction and
processing-time timers (elements a
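For what it's worth, step 2 in isolation looks roughly like this (String
records as a placeholder):

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;

public class Dedup {
    /** Keeps the first occurrence of each record, drops the rest. */
    public static DataStream<String> dedup(DataStream<String> input) {
        return input
            .keyBy(new KeySelector<String, String>() {
                @Override
                public String getKey(String value) {
                    return value; // keyBy(x -> x): the record is its own key
                }
            })
            .filter(new RichFilterFunction<String>() {
                private transient ValueState<Boolean> seen;

                @Override
                public void open(Configuration cfg) {
                    seen = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("seen", Boolean.class));
                }

                @Override
                public boolean filter(String value) throws Exception {
                    if (seen.value() == null) {
                        seen.update(true);
                        return true; // first time we see this record
                    }
                    return false;    // duplicate
                }
            });
    }
}

Note that the "seen" state grows with the number of distinct records; nothing
cleans it up in this sketch.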
So you would suggest Flume over a custom Akka source from Bahir?
Jörn Franke wrote on Sun, 23 Apr 2017 at 18:59:
> I would use Flume to import these sources into HDFS and then use Flink or
> Hadoop or whatever to process them. While it is possible to do it in Flink,
> you do not want that
-- Forwarded message --
From: "Yassine MARZOUGUI"
Date: Apr 23, 2017 20:53
Subject: Checkpoints very slow with high backpressure
To:
Cc:
Hi all,
I have a Streaming pipeline as follows:
1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x->x) and keeping
I'm sorry guys if you received multiple instances of this mail; I kept
trying to send it yesterday, but it looks like the mailing list was stuck and
didn't dispatch it until now. Sorry for the disturbance.
On Apr 23, 2017 20:53, "Yassine MARZOUGUI"
wrote:
> Hi all,
>
> I have a Streaming pipeline as fol
Hi all,
I am experiencing a similar problem but with HDFS as a source instead of
Kafka. I have a streaming pipeline as follows:
1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x->x) and keeping a state per key
indicating whether it has been seen)
3 - schedule some future ac
Re-sending as it looks like the previous mail wasn't correctly sent
---
Hi all,
I have a streaming pipeline as follows:
1 - read a folder continuously from HDFS
2 - filter duplicates (using keyBy(x->x) and keeping a state per key
indicating whether it has been seen)
3