Re: Logical plan optimization with Calcite

2016-07-21 Thread gallenvara
Thanks Max and Timo for the explanation. :)

Re: flink1.0 DataStream groupby

2016-07-21 Thread Suneel Marthi
It should be keyBy(0) for the DataStream API (since Flink 0.9); it's groupBy() in the DataSet API. On Fri, Jul 22, 2016 at 1:27 AM, wrote: > Hi, > today I am using Flink to rewrite my Spark project. In Spark, data is > an RDD, and it has many transformations and actions, but in Flink the > DataStream does not
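Conceptually, both keyBy(0) (DataStream) and groupBy(0) (DataSet) partition records by the given tuple field before an aggregation runs per key. A minimal plain-Java sketch of that grouping-then-summing semantics, with no Flink dependency (the Map.Entry pairs stand in for Flink's Tuple2, and the class name is illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByKeySketch {
    // Group (key, value) pairs by key and sum the values,
    // mimicking what keyBy(0).sum(1) computes per key in Flink.
    public static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        return records.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey, Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> data = List.of(
                Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        System.out.println(sumByKey(data)); // a summed to 2, b to 1
    }
}
```

In Flink the same per-key aggregation runs incrementally over the stream rather than over a finished collection.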

flink1.0 DataStream groupby

2016-07-21 Thread rimin515
Hi, today I am using Flink to rewrite my Spark project. In Spark, data is an RDD, and it has many transformations and actions, but in Flink the DataStream does not have groupBy and foreach. For example: val env=StreamExecutionEnvironment.createLocalEnvironment() val data=List(("1"->"a

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-21 Thread Clifford Resnick
I took another look at this and it occurred to me that the S3a directory issue is actually localized to Cloudera's hadoop-aws version, which is stuck at 2.6.0. Apparently the zeroed-out directory timestamps are fixed in the Flink-recommended version. So, Flink/Yarn/S3a will work, just not with CDH5. I

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-21 Thread Clifford Resnick
I have a fix and test for a recursive HDFSCopyToLocal. I also added similar code to Yarn application staging. However, even though all files and resources now copy correctly, S3A still fails on Flink session creation. The failure stems from the lib folder being registered as an application resou
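A recursive copy along the lines of the HDFSCopyToLocal fix described above can be sketched with plain java.nio; this is an illustrative stand-in, not the actual Flink patch (the real code would use Hadoop's FileSystem API instead of java.nio):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class RecursiveCopy {
    // Copy a directory tree from src to dst, creating directories as needed,
    // so a folder (e.g. a lib directory) arrives with its full contents.
    public static void copyRecursively(Path src, Path dst) {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-check: build a tiny tree in a temp dir, copy it, verify.
    public static boolean selfCheck() {
        try {
            Path src = Files.createTempDirectory("copy-src");
            Files.createDirectories(src.resolve("lib"));
            Files.write(src.resolve("lib").resolve("a.txt"), "hi".getBytes());
            Path dst = Files.createTempDirectory("copy-dst");
            copyRecursively(src, dst);
            return Files.exists(dst.resolve("lib").resolve("a.txt"));
        } catch (IOException e) {
            return false;
        }
    }
}
```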

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-07-21 Thread 김동일
I saw the source code of flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java. Flink ships the FLINK_LIB_DIR and adds only jar files to the classpath. I want to know how to add a resource file to the classpath. Best Regards, Dong-iL, Kim. > On Jul 22, 2016, at 4:28 AM, Dong iL,

Re: counting words (not frequency)

2016-07-21 Thread hrajaram
Can't you use a KeyedStream? I mean keyBy with the same key, something like this: source.flatMap(new Tokenizer()).keyBy(0).sum(2).project(2).print(); Assuming the Tokenizer is emitting Tuple3 where 1 -> always the same key, say "test", 2 -> the actual word, 3 -> 1. There might be some other good choices bu
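The trick in the snippet above — emitting every word under one constant key so a single running sum counts all words — can be sketched without Flink like this (the Tokenizer is assumed to do a whitespace split, which matches the classic word-count examples but is an assumption here):

```java
public class TotalWordCount {
    // Tokenize each line and add every word to one global counter,
    // mirroring keyBy(constantKey).sum(countField) over (key, word, 1) tuples.
    public static long countWords(String[] lines) {
        long total = 0;
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    total += 1; // the "1" field of the Tuple3 in the Flink version
                }
            }
        }
        return total;
    }
}
```

In the streaming version the sum is emitted continuously as a running total, not once at the end.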

counting words (not frequency)

2016-07-21 Thread Roshan Naik
Was trying to write a simple streaming Flink program that counts the total words (not the frequency) in a file. I was thinking along the lines of: counts = text.flatMap(new Tokenizer()) .count(); // count() isn't part of the streaming APIs (but is supported for batch) Any suggestions on how to do this

Re: Processing windows in event time order

2016-07-21 Thread Sameer W
Aljoscha - Thanks, it works exactly as you said. I found out why my windows were firing twice. I was making the error of adding the AutoWatermarkInterval to the existing watermark each time the watermark was sampled from the source, just to fire a window if one of the sources was delayed substantial
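The corrected approach — deriving the watermark from the maximum event timestamp seen so far, rather than incrementing the previous watermark by the auto-watermark interval — can be sketched in plain Java, modeling what a periodic watermark assigner does (class and method names are illustrative, not the Flink interface):

```java
public class MaxTimestampWatermarks {
    private long maxTimestamp = Long.MIN_VALUE;
    private final long maxOutOfOrdernessMs;

    public MaxTimestampWatermarks(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Called per element: track the largest event timestamp seen so far.
    public void onEvent(long eventTimestampMs) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    // Called periodically: the watermark trails the max timestamp by a fixed
    // bound. Crucially, repeated sampling with no new data does NOT advance
    // it, unlike adding the sampling interval on every call.
    public long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMs;
    }
}
```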

add FLINK_LIB_DIR to classpath on yarn

2016-07-21 Thread Dong iL, Kim
Hello. I have a Flink cluster on YARN. I want to add FLINK_LIB_DIR to the classpath, because hibernate.cfg.xml needs to be on the classpath. When I'm using a standalone cluster, I just add FLINK_LIB_DIR to FLINK_CLASSPATH, but on YARN, fixing config.sh, yarn-session.sh and flink-daemon.sh is not working. Be

Re: Processing windows in event time order

2016-07-21 Thread David Desberg
Aljoscha, Awesome. Exactly the behavior I was hoping would be exhibited. Thank you for the quick answer :) Thanks, David On Thu, Jul 21, 2016 at 2:17 AM, Aljoscha Krettek wrote: > Hi David, > windows are being processed in order of their end timestamp. So if you > specify an allowed lateness o

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread Suneel Marthi
I meant to respond to this thread yesterday, but got busy with work and it slipped my mind. This is possibly doable using Flink Streaming; others can correct me here. *Assumption:* Both the batch and streaming processes are reading from a single Kafka topic, and by "Batched data" I am assuming it's the sa

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread milind parikh
At this point in time, IMO, batch processing is not why you should be considering Flink. That said, I predict that stream processing (and event processing) will become the dominant methodology as we begin to gravitate towards the "I can't wait; I want it now" phenomenon. In that methodology, I

Re: taskmanager memory leak

2016-07-21 Thread 김동일
I think so. I’ll test it on EMR and then reply. I am truly grateful for your support. > On Jul 21, 2016, at 8:49 PM, Stephan Ewen wrote: > > I don't know that answer, sorry. Maybe one of the others can chime in here. > > Did you deactivate checkpointing (then it should not write to S3) and di

Re: Processing windows in event time order

2016-07-21 Thread Sameer W
Stream 2 does send watermarks only after it sees elements C,D. It sends the watermark (5) 20 seconds after Stream 1 sends it. From what I understand, Flink merges watermarks from both streams on the reduce side. But does it wait a certain pre-configured amount of time (for watermarks from both strea
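For context on the merging question: a two-input operator's event-time clock in Flink advances to the minimum of the latest watermarks received from its inputs, so there is no configured waiting time — the operator simply holds back until the slower stream's watermark catches up. A plain-Java sketch of that merge rule (illustrative, not Flink internals):

```java
public class WatermarkMerger {
    private long wm1 = Long.MIN_VALUE;
    private long wm2 = Long.MIN_VALUE;

    // Watermarks are monotonically increasing per input channel.
    public void onWatermarkFromStream1(long wm) { wm1 = Math.max(wm1, wm); }
    public void onWatermarkFromStream2(long wm) { wm2 = Math.max(wm2, wm); }

    // The operator's combined watermark: the slower input dominates,
    // so windows cannot fire until BOTH inputs have passed the window end.
    public long combinedWatermark() { return Math.min(wm1, wm2); }
}
```

This is why a window fires only once Stream 2 also emits watermark 5, 20 seconds later.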

Re: Processing windows in event time order

2016-07-21 Thread Aljoscha Krettek
Yes, that is to be expected. Stream 2 should only send the watermark once the elements with a timestamp lower than the watermark have been sent as well. On Thu, 21 Jul 2016 at 13:10 Sameer W wrote: > Thanks, Aljoscha, > > This is what I am seeing when I use ascending timestamps as watermarks- > > C

Re: taskmanager memory leak

2016-07-21 Thread Stephan Ewen
I don't know the answer to that, sorry. Maybe one of the others can chime in here. Did you deactivate checkpointing (then it should not write to S3), and did that resolve the leak? Best, Stephan On Thu, Jul 21, 2016 at 12:52 PM, 김동일 wrote: > Dear Stephan. > > I also suspect the s3. > I’ve tried s3n,

Re: Processing windows in event time order

2016-07-21 Thread Sameer W
Thanks, Aljoscha. This is what I am seeing when I use ascending timestamps as watermarks: Consider a window of 1-5 seconds. Stream 1 sends elements A,B. Stream 2 (20 seconds later) sends elements C,D. I see window (1-5) fires first with just A,B. After 20 seconds window (1-5) fires again, but this

Re: taskmanager memory leak

2016-07-21 Thread 김동일
Dear Stephan. I also suspect S3. I've tried s3n and s3a. Which library is suitable? I'm using aws-java-sdk-1.7.4 and hadoop-aws-2.7.2. Best regards. > On Jul 21, 2016, at 5:54 PM, Stephan Ewen wrote: > > Hi! > > There is a memory debugging logger, you can activate it like that: > https://ci.

DataStreamUtils conversion problem, showing varied results for same code

2016-07-21 Thread subash basnet
Hello all, my task is to cluster the stream of points around the centroids. I am using DataStreamUtils to collect the stream and pass it on to the map function to perform the necessary action. Below is the code: DataStream points = newDataStream.map(new getPoints()); DataStream centroids = newCentro
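The clustering step described — assigning each point to its nearest centroid — can be sketched independently of DataStreamUtils like this (2-D points and squared Euclidean distance; the double[] types are stand-ins for the poster's Point/Centroid classes, which are not shown in the truncated message):

```java
public class NearestCentroid {
    // Return the index of the centroid closest to the point,
    // comparing squared Euclidean distances (no sqrt needed for argmin).
    public static int assign(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dx = point[0] - centroids[i][0];
            double dy = point[1] - centroids[i][1];
            double dist = dx * dx + dy * dy;
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

In the Flink job this logic would sit inside the map function that receives the collected centroids.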

Re: Processing windows in event time order

2016-07-21 Thread Aljoscha Krettek
Hi David, windows are being processed in order of their end timestamp. So if you specify an allowed lateness of zero (which will only be possible on Flink 1.1 or by using a custom trigger) you should be able to sort the elements. The ordering is only valid within one key, though, since windows for

Re: Guava immutable collection kryo serialization

2016-07-21 Thread Stephan Ewen
Hi! Custom Kryo serializers can be shipped either as objects (must be serializable) or as classes (can be non-serializable, must have a default constructor). For non-serializable serializers, try to use: ExecutionConfig.registerTypeWithKryoSerializer(Class<?> type, Class<? extends Serializer> serializerClass) Stephan

Re: taskmanager memory leak

2016-07-21 Thread Stephan Ewen
Hi! There is a memory debugging logger, you can activate it like that: https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#memory-and-performance-debugging It will print which parts of the memory are growing. What you can also try is to deactivate checkpointing for one run a
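For reference, the memory debug logger linked above is switched on in flink-conf.yaml with roughly these keys (key names as documented for Flink 1.x; check the linked configuration page for your exact version):

```yaml
# start a background thread that periodically logs heap/non-heap/direct memory usage
taskmanager.debug.memory.startLogThread: true
# interval between memory-usage log lines, in milliseconds
taskmanager.debug.memory.logIntervalMs: 5000
```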

Re: Guava immutable collection kryo serialization

2016-07-21 Thread Stefan Richter
Hi, to answer this question, it would be helpful if you could provide the stacktrace of your exception and the code you use to register the serializer. Best, Stefan > On 21.07.2016 at 05:28, Shaosu Liu wrote: > > > Hi, > > How do I do Guava Immutable collections serialization in Flink? >

Re: Running multiple Flink Streaming Jobs, one by one

2016-07-21 Thread Stefan Richter
Hi, the answer to this question depends on how you are starting the jobs. Do you have a Java program that submits jobs in a loop, repeatedly calling StreamExecutionEnvironment.execute(), or a shell script that submits jobs through the CLI? In both cases, the process should block (either on Stre