Return of Flink shading problems in 1.2.0

2017-03-16 Thread Foster, Craig
Hi: A few months ago, I was building Flink and ran into shading issues for flink-dist as described in your docs. We resolved this in BigTop by adding the correct way to build flink-dist in the do-component-build script and everything was fine after that. Now, I’m running into issues doing the

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
Hi Tarandeep, Thanks for clarifying. For the next step, I would recommend taking a look at  https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_event_time.html  and try to find out what exactly is wrong with the watermark progression. Flink 1.2 exposes watermarks as

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tarandeep Singh
Anyone? Any suggestions what could be going wrong or what I am doing wrong? Thanks, Tarandeep On Thu, Mar 16, 2017 at 7:34 AM, Tarandeep Singh wrote: > Data is read from Kafka and yes I use different group id every time I run > the code. I have put break points and print

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-16 Thread Yassine MARZOUGUI
Hi Xiaogang, Indeed, the MapState is what I was looking for in order to have efficient sorted state, as it would facilitate many use cases like this one, or joining streams, etc. I searched a bit and found your contribution of MapState for the next 1.3

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
Due to the slightly out-of-sequence log timestamps, I tried switching to a “BoundedOutOfOrdernessTimestampExtractor” and used a minute as the threshold, but I still couldn’t get the watermarks to fire. Setting break points and trying to follow the code, I can’t see where the

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
Actually, I think the program you shared in the first mail of this thread looks fine for the purpose you describe. Timestamp and watermark assignment works as follows: - For each record, a long timestamp is extracted (UNIX/epoch timestamp). - A watermark is a timestamp which says that no more
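Fabian's two bullet points can be sketched outside of Flink as plain Java. This is an illustrative model only (class and method names here are invented, not Flink API): each record contributes an extracted epoch timestamp, and a bounded-out-of-orderness watermark trails the maximum timestamp seen by a fixed bound.

```java
// Illustrative sketch of bounded-out-of-orderness watermarking, not Flink API.
// A watermark of t promises that no record with timestamp < t should still arrive,
// so it is derived as (max timestamp seen) - (allowed out-of-orderness).
public class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Called once per record with its extracted epoch timestamp (millis). */
    public void onRecord(long timestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMs);
    }

    /** Late records within the bound do not violate this watermark. */
    public long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    public static void main(String[] args) {
        WatermarkSketch w = new WatermarkSketch(60_000); // one-minute bound
        w.onRecord(1_000_000);
        w.onRecord(995_000);   // slightly out of order: does not move the max
        w.onRecord(1_120_000);
        System.out.println(w.currentWatermark()); // 1060000
    }
}
```

The key point for the thread above: the watermark only advances when the *maximum* observed timestamp advances, so a mildly out-of-order record never moves it backwards.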

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
I have managed to discover that my presumption that the log4j log file is in a _guaranteed_ sequential time order is incorrect (race conditions). So some logs are out of sequence, and I was getting _some_ Monotonic Timestamp violations. I did not discover this because somehow my local flink was

Re: Flink 1.2 and Cassandra Connector

2017-03-16 Thread Robert Metzger
I've created a pull request for the fix: https://github.com/apache/flink/pull/3556 It would be nice if one of the issue reporters could validate that the cassandra connector works after the fix. If it is a valid fix, I would like to include it into the upcoming 1.2.1 release. On Thu, Mar 16,

Re: Flink 1.2 and Cassandra Connector

2017-03-16 Thread Robert Metzger
Yep, this is definitely a bug / misconfiguration in the build system. The cassandra client defines metrics-core as a dependency, but the shading is dropping the dependency when building the dependency reduced pom. To resolve the issue, we need to add the following line into the shading config
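The actual line Robert refers to is not quoted in the archive. Purely as an illustration of where such a fix would live (the metrics-core coordinates below are assumed, not taken from the PR), an explicit include in a maven-shade-plugin configuration has this shape:

```xml
<!-- Illustrative only: the real line from the fix is not shown in this thread,
     and the metrics-core groupId/artifactId here are assumed. -->
<artifactSet>
  <includes>
    <include>com.codahale.metrics:metrics-core</include>
  </includes>
</artifactSet>
```

Keeping the artifact in the shaded jar (and out of the dependency-reduced pom's dropped set) is what prevents the NoClassDefFoundError-style failures reported earlier in the thread.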

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tarandeep Singh
Data is read from Kafka and yes I use different group id every time I run the code. I have put break points and print statements to verify that. Also, if I don't connect with control stream the window function works. - Tarandeep > On Mar 16, 2017, at 1:12 AM, Tzu-Li (Gordon) Tai

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread vinay patil
@Stephan, I am not using explicit Evictor in my code. I will try using the Fold function if it does not break my existing functionality :) @Robert: Thank you for your answer, yes I have already tried to set G1GC this morning using env.java.opts, it works. Which is the recommended GC for

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread Robert Metzger
Yes, you can change the GC using the env.java.opts parameter. We are not setting any GC on YARN. On Thu, Mar 16, 2017 at 1:50 PM, Stephan Ewen wrote: > The only immediate workaround is to use windows with "reduce" or "fold" or > "aggregate" and not "apply". And to not use an
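Concretely, the `env.java.opts` setting Robert mentions goes into `flink-conf.yaml`. A minimal sketch (the G1 flag here is the one discussed in the thread; any additional JVM options are up to the deployment):

```yaml
# flink-conf.yaml: JVM options passed to Flink processes (JobManager and TaskManagers).
# Flink sets no GC by default on YARN, so the JVM default applies unless overridden.
env.java.opts: -XX:+UseG1GC
```

After changing this, the Flink processes (or the YARN session) need to be restarted for the option to take effect.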

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread Stephan Ewen
The only immediate workaround is to use windows with "reduce" or "fold" or "aggregate" and not "apply". And to not use an evictor. The good news is that I think we have a good way of fixing this soon, making an adjustment in RocksDB. For the Yarn / g1gc question: Not 100% sure about that - you

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
What kind of timestamp and watermark extractor are you using? Can you share your implementation? You can have a look at the example programs (for example [1]). These can be started and debugged inside an IDE by executing the main method. If you run it in an external process, you should be able to

Re: Batch stream Sink delay ?

2017-03-16 Thread Paul Smith
Thanks again for your reply. I've tried with both Parallel=1 through to 3. Same behavior. The log file is monotonically increasing timestamps generated through an application using log4j. Each log line has a distinctly incrementing timestamp; it is an 8GB file I'm using as a test case and has

Re: SQL + flatten (or .*) quality docs location?

2017-03-16 Thread Fabian Hueske
Hi Stu, there is only one page of documentation for the Table API and SQL [1]. I agree the structure could be improved and split into multiple pages. Regarding the flattening of a Pojo have a look at the "Built-In Functions" section [2]. If you select "SQL" and head to the "Value access

Re: Questions regarding queryable state

2017-03-16 Thread Ufuk Celebi
On Thu, Mar 16, 2017 at 10:00 AM, Kathleen Sharp wrote: > Hi, > > I have some questions regarding the Queryable State feature: > > Is it possible to use the QueryClient to get a list of keys for a given State? No, this is not possible at the moment. You would have to

Re: Batch stream Sink delay ?

2017-03-16 Thread Fabian Hueske
Hi Paul, since each operator uses the minimum watermark of all its inputs, you must ensure that each parallel task is producing data. If a source does not produce data, it will not increase the timestamps of its watermarks. Another challenge that you might run into is that you need to make sure
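The "minimum watermark of all inputs" rule Fabian describes explains why one idle source stalls event time for the whole operator. A plain-Java sketch of that combination logic (names are illustrative, not Flink internals):

```java
// Illustrative sketch: an operator's watermark is the minimum across its inputs,
// so a single input stuck at Long.MIN_VALUE (an idle source) pins event time.
public class MinWatermarkSketch {
    private final long[] inputWatermarks;

    public MinWatermarkSketch(int numInputs) {
        inputWatermarks = new long[numInputs];
        java.util.Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no watermark yet
    }

    /** Watermarks only ever advance per input. */
    public void onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
    }

    /** The operator may only act on event time up to the slowest input. */
    public long combinedWatermark() {
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }

    public static void main(String[] args) {
        MinWatermarkSketch op = new MinWatermarkSketch(2);
        op.onWatermark(0, 10_000);
        // Input 1 is idle, so the combined watermark has not advanced at all:
        System.out.println(op.combinedWatermark() == Long.MIN_VALUE); // true
        op.onWatermark(1, 8_000);
        System.out.println(op.combinedWatermark()); // 8000
    }
}
```

This is why, in the thread above, a parallel Kafka source with an empty partition can silently prevent windows from ever firing.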

Questions regarding queryable state

2017-03-16 Thread Kathleen Sharp
Hi, I have some questions regarding the Queryable State feature: Is it possible to use the QueryClient to get a list of keys for a given State? At the moment it is not possible to use ListState - will this ever be introduced? My first impression is that I would need one of these 2 to be able to

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
Hi Tarandeep, I haven’t looked at the rest of the code yet, but my first guess is that you might not be reading any data from Kafka at all: private static DataStream readKafkaStream(String topic, StreamExecutionEnvironment env) throws IOException { Properties properties = new

Re: Checkpointing with RocksDB as statebackend

2017-03-16 Thread vinay patil
Hi Stephan, What can be the workaround for this ? Also need one confirmation : Is G1 GC used by default when running the pipeline on YARN. (I see a thread of 2015 where G1 is used by default for JAVA8) Regards, Vinay Patil On Wed, Mar 15, 2017 at 10:32 PM, Stephan Ewen [via Apache Flink User

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-16 Thread Bowen Li
There's always a tradeoff we need to make. I'm in favor of upgrading to Java 8 to bring in all the new Java features. The common way I've seen (and agree with) other software handle major upgrades like this is 1) upgrade for the next big release without backward compatibility and notify everyone 2)

Data+control stream from kafka + window function - not working

2017-03-16 Thread Tarandeep Singh
Hi, I am using flink-1.2 and reading data stream from Kafka (using FlinkKafkaConsumer08). I want to connect this data stream with another stream (read control stream) so as to do some filtering on the fly. After filtering, I am applying window function (tumbling/sliding event window) along with