[BUG?] Cannot Load User Class on Local Environment

2017-04-24 Thread Matt
Hi all, I'm trying to run Flink using a local environment, but on an Ignite node to achieve collocation (as mentioned in my previous message on this list). Have a look at the code in [1]. It's pretty simple, but I'm getting a "cannot load user class" error as shown in [2]. If you check line #29

Re: Flink on Ignite - Collocation?

2017-04-24 Thread Matt
It seems to me the bottleneck will be the network if I don't collocate Flink jobs; after all, Ignite caches are in memory, not on disk (much faster than the network). Achieving collocation would be more difficult in Kafka, but it should be relatively easy in Ignite due to its out of the box collocated c

Re: Benchmarking Apache Flink via Query Plan

2017-04-24 Thread Fabian Hueske
Hi Giacomo90, I'm not aware of a detailed description of the execution plan. The plan can be used to identify the execution strategies (shipping and local) chosen by the optimizer and some properties of the data (partitioning, order). Common shipping strategies can be FORWARD (locally forwarding,

Writing a reliable Flink source for a NON-replay-able queue/protocol that supports message ACKs

2017-04-24 Thread Martin Eden
Hi everyone, Are there any examples of how to implement a reliable (zero data loss) Flink source reading from a system that is not replay-able but supports acknowledging messages? Or any pointers of how one can achieve this and how Flink can help? I imagine it should involve a write ahead log bu
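The ACK-after-checkpoint pattern Martin asks about can be sketched in plain Java. This is an illustrative sketch, not Flink API: the class and interface names (`AckingSource`, `UpstreamQueue`) are hypothetical, and the snapshot/complete callbacks only mirror the contract of Flink's `CheckpointedFunction` and `CheckpointListener#notifyCheckpointComplete`. The idea is to buffer message IDs, snapshot them with each checkpoint, and only acknowledge them upstream once that checkpoint is durable, so a failure before the checkpoint replays nothing that was already acked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stage messages from a non-replayable queue and ACK
// them upstream only after a checkpoint covering them has completed.
class AckingSource {
    interface UpstreamQueue {
        void ack(String messageId); // acknowledge so the broker may discard it
    }

    private final UpstreamQueue queue;
    // messages received but not yet covered by a completed checkpoint
    private final List<String> pending = new ArrayList<>();
    // checkpointId -> message ids included in that checkpoint
    private final Map<Long, List<String>> inFlight = new HashMap<>();

    AckingSource(UpstreamQueue queue) { this.queue = queue; }

    void onMessage(String messageId) {
        pending.add(messageId); // a real source would also persist this in state
    }

    // analogous to snapshotting state when Flink takes a checkpoint
    void snapshotState(long checkpointId) {
        inFlight.put(checkpointId, new ArrayList<>(pending));
        pending.clear();
    }

    // analogous to notifyCheckpointComplete: now it is safe to ACK upstream
    void notifyCheckpointComplete(long checkpointId) {
        List<String> ids = inFlight.remove(checkpointId);
        if (ids != null) {
            for (String id : ids) queue.ack(id);
        }
    }

    public static void main(String[] args) {
        List<String> acked = new ArrayList<>();
        AckingSource source = new AckingSource(acked::add);
        source.onMessage("m1");
        source.onMessage("m2");
        source.snapshotState(1L);
        source.onMessage("m3"); // not covered by checkpoint 1
        source.notifyCheckpointComplete(1L);
        System.out.println(acked); // m1 and m2 are acked; m3 waits for the next checkpoint
    }
}
```

Messages that arrive between checkpoints stay pending, so a crash before `notifyCheckpointComplete` leaves them unacked and redeliverable by the broker.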

Re: Problems reading Parquet input from HDFS

2017-04-24 Thread Flavio Pompermaier
I started from this guide: https://github.com/FelixNeutatz/parquet-flinktacular Best, Flavio On 24 Apr 2017 6:36 pm, "Jörn Franke" wrote: > Why not use a Parquet-only format? Not sure why you need an > AvroParquetFormat. > > On 24. Apr 2017, at 18:19, Lukas Kircher > wrote: > > Hello, > > I a

Re: Grafana plug-in for Flink

2017-04-24 Thread Dawid Wysakowicz
Hi, As far as I am aware Jamie used the example json datasource for grafana https://github.com/grafana/simple-json-datasource . At least I used it when I recreated his example for some introductory purposes. You can browse my example here: https://github.com/dawidwys/flink-intro. Best, Dawid 201

Re: Problems reading Parquet input from HDFS

2017-04-24 Thread Jörn Franke
Why not use a Parquet-only format? Not sure why you need an AvroParquetFormat. > On 24. Apr 2017, at 18:19, Lukas Kircher > wrote: > > Hello, > > I am trying to read Parquet files from HDFS and having problems. I use Avro > for schema. Here is a basic example: > > public static void main(Str

Problems reading Parquet input from HDFS

2017-04-24 Thread Lukas Kircher
Hello, I am trying to read Parquet files from HDFS and having problems. I use Avro for schema. Here is a basic example: public static void main(String[] args) throws Exception { ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); Job job = Job.getInstance(); H

Grafana plug-in for Flink

2017-04-24 Thread Andy Gibraltar
Hi, In his presentation “The Stream Processor as a Database”, Jamie Grier mentions a Grafana plugin for Flink. Does anyone know where I can find it? Unfortunately a Google search did not yield anything. I’m not sure if he “open-sourced” it. https://image.slidesharecdn.com/jamiegrier-thestreamproc

Regarding exception relating to FlinkKafkaConsumer09

2017-04-24 Thread Abdul Salam Shaikh
Hi, I am using 1.2-SNAPSHOT of Apache Flink and 1.3-SNAPSHOT of flink-connector-kafka-0.9_2.11. It was executing without any errors before but it is giving the following exception at the moment: ​java.lang.NoSuchMethodError: org.apache.flink.util.PropertiesUtil.getBoolean(Ljava/util/Properties;L

Re: in-memory optimization

2017-04-24 Thread Ufuk Celebi
Loop-invariant data should be kept in Flink's managed memory in serialized form (in a custom hash table). That means it is not read back again from the CSV file, but it is kept in serialized form and needs to be deserialized again on access. CC'ing Fabian to double check... On Mon, Apr 24,
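The trade-off Ufuk describes (keep the invariant dataset serialized once in memory, pay a deserialization cost on each access instead of re-reading the file) can be illustrated with stdlib Java serialization. This is a toy stand-in, not Flink's managed-memory internals; the class name `SerializedCache` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Flink internals): serialize a loop-invariant
// dataset once, keep only the bytes, and deserialize on every access.
class SerializedCache {
    private final byte[] bytes; // stand-in for the "managed memory" copy

    SerializedCache(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value); // serialized once, before the loop starts
        }
        this.bytes = bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    <T> T access() throws IOException, ClassNotFoundException {
        // deserialized again on each access, as Ufuk describes
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> points = new ArrayList<>(List.of("p1", "p2"));
        SerializedCache cache = new SerializedCache(points);
        List<String> copy1 = cache.access();
        List<String> copy2 = cache.access();
        System.out.println(copy1.equals(points)); // true: same contents every time
        System.out.println(copy1 == copy2);       // false: a fresh object per access
    }
}
```

The point is that each access pays CPU for deserialization but never goes back to the CSV source, and the serialized bytes are compact and GC-friendly.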

Re: Queryable State

2017-04-24 Thread Chet Masterson
Ufuk - thank you for your help. My flink-conf.yaml is now configured cluster-wide (with restart) as: # my flink-conf.yaml, on all flink nodes: jobmanager.rpc.address: x.x.x.x jobmanager.rpc.port: 6123 query.server.port: 6123 query.server.enable: true When I try to issue my query now with the above sett

in-memory optimization

2017-04-24 Thread Robert Schwarzenberg
Hello, I have a question regarding the loop-awareness of Flink wrt invariant datasets. Does Flink serialize the DataSet 'points' in line 85 https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala

Queryable State

2017-04-24 Thread Chet Masterson
I moved up from running queryable state on a standalone Flink instance to a several node cluster. My queries don't seem to be responding when I execute them on the cluster. A few questions: 1. The error I am getting: WARN [ReliableDeliverySupervisor] Association with remote system [akka.tcp://flink@

Re: Flink on Ignite - Collocation?

2017-04-24 Thread Ufuk Celebi
Hey Matt, in general, Flink doesn't put too much work in co-locating sources (doesn't happen for Kafka, etc. either). I think the only local assignments happen in the DataSet API for files in HDFS. Often this is of limited help anyways. Your approach sounds like it could work, but I would general

Re: Disk I/O in Flink

2017-04-24 Thread Ufuk Celebi
Hey Robert, for batch that should cover the relevant spilling code. If the records are >= 5 MB, the SpillingAdaptiveSpanningRecordDeserializer will spill incoming records as well. But that should be covered by the FileChannel instrumentation as well? – Ufuk On Tue, Apr 18, 2017 at 3:57 PM, Robe

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Hi Ufuk, The ProcessFunction receives elements and buffers them into a MapState, and periodically (for example every x seconds) registers processing time timers (according to some rules which it gets from a connected rule stream). When a timer fires, I pop the next element from state, request an extern

Re: Queryable State

2017-04-24 Thread Ufuk Celebi
You should be able to use queryable state w/o any changes to the default config. The `query.server.port` option defines the port of the queryable state server that serves the state on the task managers and it is enabled by default. The important thing is to configure the client to discover the Job

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Ufuk Celebi
@Yassine: no, there is no way to disable the back pressure mechanism. Do you have more details about the two last operators? What do you mean with the process function is slow on purpose? @Rune: with 1.3 Flink will configure the internal buffers in a way that not too much data is buffered in the i

Flink on Ignite - Collocation?

2017-04-24 Thread Matt
Hi all, I've been playing around with Apache Ignite and I want to run Flink on top of it but there's something I'm not getting. Ignite has its own support for clustering, and data is distributed on different nodes using a partitioned key. Then, we are able to run a closure and do some computation

Re: Flink Kafka Consumer Behaviour

2017-04-24 Thread Stephan Ewen
The latest Flink has consumers for Kafka 0.8, 0.9, 0.10 - which one are you using? I would assume you use Flink with Kafka 0.8.x, because as far as I know, starting from Kafka 0.9, offsets are not handled by ZooKeeper any more... On Mon, Apr 24, 2017 at 12:18 AM, Meghashyam Sandeep V < vr1meghash

Re: Why TimerService interface in ProcessFunction doesn't have deleteEventTimeTimer

2017-04-24 Thread Stephan Ewen
Here are the thoughts why I advocated to not expose the "delete" initially: (1) The original heap timer structure was not very sophisticated and could not support efficient timer deletion (as Gyula indicated). As a core rule in large scale systems: Never expose an operation you cannot do efficie
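Stephan's first point can be made concrete with a generic sketch (this is not Flink's actual timer service): on a plain binary heap, removing an arbitrary timer is O(n) (`PriorityQueue#remove(Object)` scans the heap), which is why deletion was not exposed initially. A common workaround is "lazy deletion": record cancelled timestamps in a set and silently drop them when they reach the top of the heap.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Illustrative sketch of the lazy-deletion pattern for heap-based timers.
// Not Flink's implementation; the class name LazyTimerQueue is hypothetical.
class LazyTimerQueue {
    private final PriorityQueue<Long> heap = new PriorityQueue<>();
    private final Set<Long> cancelled = new HashSet<>();

    void register(long timestamp) { heap.add(timestamp); }

    // O(1) on average, instead of PriorityQueue#remove's O(n) heap scan
    void delete(long timestamp) { cancelled.add(timestamp); }

    // returns the next live timer due at or before `now`, or null if none
    Long pollDue(long now) {
        while (!heap.isEmpty() && heap.peek() <= now) {
            long t = heap.poll();
            if (!cancelled.remove(t)) {
                return t; // a live timer fires
            }
            // a cancelled timer is dropped when it surfaces, never fired
        }
        return null;
    }

    public static void main(String[] args) {
        LazyTimerQueue timers = new LazyTimerQueue();
        timers.register(5L);
        timers.register(10L);
        timers.delete(5L);
        System.out.println(timers.pollDue(20L)); // 10: the timer at 5 was cancelled
        System.out.println(timers.pollDue(20L)); // null: nothing else is due
    }
}
```

The cost of lazy deletion is that cancelled entries occupy heap space until they would have fired, which is one reason a more sophisticated timer structure is needed before exposing delete.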

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Rune Skou Larsen
Sorry I can't help you, but we're also experiencing slow checkpointing when there is backpressure from the sink. I tried HDFS, S3, and RocksDB state backends, but to no avail - checkpointing always times out with backpressure. Can we somehow reduce Flink's internal buffer sizes, so checkpointing w

Re: flink testing

2017-04-24 Thread Fabian Hueske
Another option might be Flink Spector [1]. Cheers, Fabian [1] https://github.com/ottogroup/flink-spector 2017-04-23 20:01 GMT+02:00 Georg Heiler : > Thanks > Konstantin Knauf schrieb am So. 23. Apr. > 2017 um 19:39: > >> + user@flink (somehow I didn't reply to the list) >> >> Hi Georg, >> >> t

Re: flink testing

2017-04-24 Thread Georg Heiler
Thanks Konstantin Knauf schrieb am So. 23. Apr. 2017 um 19:39: > + user@flink (somehow I didn't reply to the list) > > Hi Georg, > > there are two classes "MultipleProgramsTestBase" and > "StreamingMultipleProgramsTestBase", which your integration tests can > extend from. > > This will spin up a l

Re: Flink first project

2017-04-24 Thread Georg Heiler
Wouldn't adding flume -> Kafka -> flink also introduce additional latency? Georg Heiler schrieb am So., 23. Apr. 2017 um 20:23 Uhr: > So you would suggest flume over a custom akka-source from bahir? > > Jörn Franke schrieb am So., 23. Apr. 2017 um > 18:59 Uhr: > >> I would use flume to import t

Re: Flink Kafka Consumer Behaviour

2017-04-24 Thread Meghashyam Sandeep V
Hi All, Sorry for the miscommunication. I'm not using 0.8. I'm using latest available flink-kafka client. I don't see my app registered as a consumer group. I wanted to know if there is a way to monitor Kafka offsets. Thanks, Sandeep On Apr 23, 2017 9:38 AM, "Stephan Ewen" wrote: > Since it is

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine Marzougui
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyby(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements a
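Step 2 of Yassine's pipeline (filter duplicates by keying on the element itself and keeping a per-key "seen" flag) can be sketched in plain Java. In Flink this would be `keyBy(x -> x)` plus a keyed `ValueState<Boolean>`; here a `HashSet` stands in for the keyed state to show the logic, so the class name `DuplicateFilter` is purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of dedup-by-key: the set plays the role of per-key "seen" state.
class DuplicateFilter {
    private final Set<String> seen = new HashSet<>();

    List<String> filter(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String element : input) {
            // Set#add returns false when the key is already present,
            // i.e. the "seen" state for this element is already true
            if (seen.add(element)) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        System.out.println(filter.filter(List.of("a", "b", "a", "c"))); // [a, b, c]
    }
}
```

Note that in a real streaming job this state grows with the number of distinct keys, which interacts with checkpoint size; that is relevant to the slow-checkpoint symptoms discussed in this thread.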

Re: Flink first project

2017-04-24 Thread Georg Heiler
So you would suggest flume over a custom akka-source from bahir? Jörn Franke schrieb am So., 23. Apr. 2017 um 18:59 Uhr: > I would use flume to import these sources to HDFS and then use flink or > Hadoop or whatever to process them. While it is possible to do it in flink, > you do not want that

Fwd: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
-- Forwarded message -- From: "Yassine MARZOUGUI" Date: Apr 23, 2017 20:53 Subject: Checkpoints very slow with high backpressure To: Cc: Hi all, I have a Streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyby(x->x) and keeping

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
I'm sorry guys if you received multiple instances of this mail; I kept trying to send it yesterday, but it looks like the mailing list was stuck and didn't dispatch it until now. Sorry for the disturbance. On Apr 23, 2017 20:53, "Yassine MARZOUGUI" wrote: > Hi all, > > I have a Streaming pipeline as fol

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyby(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements a

Re: Flink Checkpoint runs slow for low load stream

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I am experiencing a similar problem but with HDFS as a source instead of Kafka. I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyby(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future ac

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Re-sending as it looks like the previous mail wasn't correctly sent --- Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyby(x->x) and keeping a state per key indicating whether it is seen) 3

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI