Re: Who's hiring, December 2016

2016-12-16 Thread Dominik Bruhn
Relayr, a Berlin based IoT company is using Apache Flink for processing its sensor data. We are still in the learning phase, but are commited to using Flink. Check out our public job ads for Berlin and Munich: https://relayr.io/jobs/ If you are interested, even if there is no clearly matching

Re: How do I use values from a data stream to create a new streaming data source

2016-12-16 Thread Meghashyam Sandeep V
Runtime error is because you have non-serializable code in your 'map' operator. DataStream stockPrices = streamExecEnv.addSource(new LookupStockPrice(stockSymbol)); stockPrices.print(); The approach that you took will create infinite stockprice sources inside the

How do I use values from a data stream to create a new streaming data source

2016-12-16 Thread hnadathur
I'm trying to build a sample application using Flink that does the following: 1. Reads a stream of stock symbols (e.g. 'CSCO', 'FB') from a Kafka queue 2. For each symbol performs a real-time lookup of current prices and streams the values The program compiles fine but I get the following

state size in relation to cluster size and processing speed

2016-12-16 Thread Seth Wiesman
Hi, I’ve noticed something peculiar about the relationship between state size and cluster size and was wondering if anyone here knows of the reason. I am running a job with 1 hour tumbling event time windows which have an allowed lateness of 7 days. When I run on a 20-node cluster with FsState

Re: Who's hiring, December 2016

2016-12-16 Thread dan bress
The team I work on at Twitter is hiring a Software Engineer: https://careers.twitter.com/en/work-for-twitter/software-engineer-data-products.html Dan On Fri, Dec 16, 2016 at 5:46 AM Kostas Tzoumas wrote: > Hi folks, > > As promised, here is the first thread for

Re: High virtual memory usage

2016-12-16 Thread Stephan Ewen
Also, can you tell us what OS you are running on? On Fri, Dec 16, 2016 at 6:23 PM, Stephan Ewen wrote: > Hi! > > To diagnose this a little better, can you help us with the following info: > > - Are you using RocksDB? > - What is your flink configuration, especially around

Re: High virtual memory usage

2016-12-16 Thread Stephan Ewen
Hi! To diagnose this a little better, can you help us with the following info: - Are you using RocksDB? - What is your flink configuration, especially around memory settings? - What do you use for TaskManager heap size? Any manual value, or do you let Flink/Yarn set it automatically based

Re: How to analyze space usage of Flink algorithms

2016-12-16 Thread Fabian Hueske
The system metrics [1] are only available on a system level, i.e. not for an individual job. The reason is that multiple job might run concurrently on the same task manager JVM process. So it would not be possible to separate their heap usage. The same would be true for the approach that monitors

Flink rolling upgrade support

2016-12-16 Thread Andrew Hoblitzell
Hi. Does Apache Flink currently have support for zero down time or the = ability to do rolling upgrades? If so, what are concerns to watch for and what best practices might = exist? Are there version management and data inconsistency issues to = watch for?=

High virtual memory usage

2016-12-16 Thread Paulo Cezar
Hi Folks, I'm running Flink (1.2-SNAPSHOT nightly) on YARN (Hadoop 2.7.2). A few hours after I start a streaming job (built using kafka connect 0.10_2.11) it gets killed seemingly for no reason. After inspecting the logs my best guess is that YARN is killing containers due to high virtual memory

Re: benchmarking flink streaming

2016-12-16 Thread Meghashyam Sandeep V
Hi Stephan, Thanks for your answer. Is there a way to get the metrics such as latency of each message in the stream? For eg. I have a Kafka source, Cassandra sink and I do some processing in between. I would like to know how long does it take for each message from the beginning(entering flink

Re: benchmarking flink streaming

2016-12-16 Thread Stephan Ewen
Hi! I am not sure there exists a recommended benchmarking tool. Performance comparisons depend heavily on the scenarios you are looking at: Simple event processing, shuffles (grouping aggregation), joins, small state, large state, etc... As fas as I know, most people try to write a "mock"

Re: Question about expired checkpoints

2016-12-16 Thread Stephan Ewen
Hi Nick! In general, checkpoints cannot overtake each other. It can happen (in the presence of failure/recovery) that a checkpoint is "half complete" and subsumed by a newer complete checkpoint. The message "Checkpoint 17 expired before completing" might be correct - you could check the start

Question about expired checkpoints

2016-12-16 Thread Nick Tinnemeier
Hi all, I am currently playing around with checkpoints to better understand how they work. I have some questions I hope you can answer. I am running a simple topology with a source, a map and a sink that writes the events it receives to a HBase table. The parallelism of the environment is set

Re: Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Looks like this is not specific to the continuous file monitoring, I'm having the same issue (files in nested directories are not read) when using: env.readFile(fileInputFormat, "hdfs:///shared/mydir", FileProcessingMode.PROCESS_ONCE, -1L) 2016-12-16 11:12 GMT+01:00 Yassine MARZOUGUI

Who's hiring, December 2016

2016-12-16 Thread Kostas Tzoumas
Hi folks, As promised, here is the first thread for Flink-related job positions. If your organization is hiring people on Flink-related positions do reply to this thread with a link for applications. data Artisans is hiring on multiple technical positions. Help us build Flink, and help our

Re: Updating a Tumbling Window every second?

2016-12-16 Thread Matt
I have reduced the problem to a simple image [1]. Those shown on the image are the streams I have, and the problem now is how to create a custom window assigner such that objects in B that *don't share* elements in A, are put together in the same window. Why? Because in order to create elements

Re: Flink 1.1.3 web UI is loading very slowly

2016-12-16 Thread Ufuk Celebi
If it is not what Stephan said, you can check requests directly against the REST API and see if you experience the same slow down there. https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/rest_api.html – Ufuk On 16 December 2016 at 10:52:18, Yury Ruchin

Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Hi all, I'm using the following code to continuously process files from a directory "mydir". final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path("hdfs:///shared/mydir"));

Re: Flink 1.1.3 web UI is loading very slowly

2016-12-16 Thread Yury Ruchin
Thanks for the clue, Stephan! I will check. 2016-12-16 12:44 GMT+03:00 Stephan Ewen : > Not sure if that is the problem, but upon first access to a resource, The > web server extracts the resource from the JAR file and puts it into the > temp directpry. Maybe that is slow for

Re: Flink 1.1.3 web UI is loading very slowly

2016-12-16 Thread Stephan Ewen
Not sure if that is the problem, but upon first access to a resource, The web server extracts the resource from the JAR file and puts it into the temp directpry. Maybe that is slow for whatever reason? On Thu, Dec 15, 2016 at 8:04 PM, Yury Ruchin wrote: > Hi, > > I'm

Re: How to analyze space usage of Flink algorithms

2016-12-16 Thread otherwise777
Hey Fabian, Thanks for the quick reply, I was looking through the flink metrics [1] but i couldn't find anything in there how to analyze the environment from start to finish, only for functions that extend the richmapfunction [1]