Re: Kafka ProducerFencedException after checkpointing

2019-08-12 Thread Matthias J. Sax
`transaction.timeout.ms` is a producer setting, thus you can increase it accordingly. Note that brokers bound the range via `transaction.max.timeout.ms`; thus, you may need to increase this broker config, too. -Matthias On 8/12/19 2:43 AM, Pi
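For illustration, a minimal sketch of raising the producer-side timeout (the property keys are Kafka's; the broker address and the chosen value are placeholders):

```java
import java.util.Properties;

public class TxTimeoutConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder address
        // Producer-side transaction timeout. Must not exceed the broker's
        // transaction.max.timeout.ms (broker default: 15 minutes), otherwise
        // producer initialization fails.
        props.setProperty("transaction.timeout.ms", "900000"); // 15 minutes
        // pass `props` to the (Flink) Kafka producer
    }
}
```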

Re: Latest spark yahoo benchmark

2017-06-18 Thread Matthias J. Sax
From my understanding, the benchmark was done using Structured Streaming, which is still based on micro-batching. There are no throughput numbers for the new "Continuous Processing" model Spark wants to introduce, only some latency numbers. Also note that the new "Continuous Processing" will not gi

Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Matthias J. Sax
Congrats! On 2/10/17 2:00 AM, Ufuk Celebi wrote: > Hey everyone, > > I'm very happy to announce that the Flink PMC has accepted Stefan > Richter to become a committer of the Apache Flink project. > > Stefan is part of the community for almost a y

Re: How to read from a Kafka topic from the beginning

2017-01-16 Thread Matthias J. Sax
I would put this differently: the "auto.offset.reset" policy is only used if there are no valid committed offsets for a topic. See here: http://stackoverflow.com/documentation/apache-kafka/5449/consumer-groups-and-offset-management (don't be confus
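A small sketch of the setting in question (the keys are Kafka's; the group id is made up):

```java
import java.util.Properties;

public class OffsetResetConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("group.id", "my-consumer-group"); // made-up group id
        // Only takes effect if the group has NO valid committed offsets;
        // otherwise consumption resumes from the committed position.
        props.setProperty("auto.offset.reset", "earliest");
        // To re-read a topic from the beginning despite committed offsets,
        // use a fresh group.id (or explicitly reset the offsets).
    }
}
```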

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-13 Thread Matthias J. Sax
I think it's worth announcing this via the news list. :) On 12/13/16 7:32 AM, Robert Metzger wrote: > The commun...@flink.apache.org > has been created :) > > On Tue, Dec 13, 2016 at 10:43 AM, Robert Metzger > mailt

Re: microsecond resolution

2016-12-04 Thread Matthias J. Sax
mentions milliseconds, > no? My question is whether or not I can specify microseconds where > 1000microseconds = 1millisecond. Thanks! > > On Sun, Dec 4, 2016 at 5:05 PM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > Yes. It does. > > See: > https://ci.

Re: microsecond resolution

2016-12-04 Thread Matthias J. Sax
Yes. It does. See: https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/event_timestamps_watermarks.html#assigning-timestamps "Both timestamps and watermarks are specified as milliseconds since the Java epoch of 1970-01-01T00:00:00Z." -Matthias On 12/04/2016 10:57 AM,
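Given that, microsecond event times have to be scaled down before they can serve as Flink timestamps; a trivial sketch (sub-millisecond precision is necessarily lost):

```java
public class MicroToMilli {
    // 1000 microseconds = 1 millisecond, so divide by 1000.
    static long toFlinkTimestamp(long epochMicros) {
        return epochMicros / 1_000L;
    }

    public static void main(String[] args) {
        // 1,480,000,000,123,456 us -> 1,480,000,000,123 ms
        System.out.println(toFlinkTimestamp(1_480_000_000_123_456L));
    }
}
```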

Re: Unsubscribe

2016-09-29 Thread Matthias J. Sax
You need to send an email to user-unsubscr...@flink.apache.org to unsubscribe. See: https://flink.apache.org/community.html#mailing-lists -Matthias On 09/29/2016 05:44 AM, Vaidyanathan Sivasubramanian wrote: >

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Matthias J. Sax
I voted. It's live now. Another advantage of SO documentation is that people not familiar with Apache might contribute some documentation (but would never open a PR). Of course, as a community, we should put the focus on the web page docs. But having something additional can't hurt. From my experience, it is a

Re: Join two streams using a count-based window

2016-06-10 Thread Matthias J. Sax
I just posted an answer on SO. About the other questions: Flink processes tuple-by-tuple and does some internal buffering. You might be interested in https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks -Matthias On 06/09/2016 08:13 PM, Nikos R. Katsipoulakis wrote: > Hello

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-23 Thread Matthias J. Sax
Are you talking about a streaming or a batch job? You are mentioning a "text stream" but also say you want to stream 100TB -- indicating you have a finite data set using DataSet API. -Matthias On 05/22/2016 09:50 PM, Xtra Coder wrote: > Hello, > > Question from newbie about how Flink's WordCou

Re: Barriers at work

2016-05-13 Thread Matthias J. Sax
I don't think barriers can "expire" as of now. Might be a nice idea though -- I don't know if this might be a problem in production. Furthermore, I want to point out that an "expiring checkpoint" would not break exactly-once processing, as the latest successful checkpoint can always be used to re

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
cer's issues) get stuck for an > arbitrary period of time, up to several hours. Buffering the other one > during all this time would just blow the memory - streams' rates are > dozens or even hundreds of Mb/sec. > > Alex > > On Thu, May 12, 2016 at 4:00

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
g sleep() in either > of TwoInputStreamOperator's processWatermarkN() methods just freeze the > entire operator, stopping the consumption of both streams (as opposed to > just one)? > > Alex > > On Thu, May 12, 2016 at 2:31 PM, Matthias J. Sax <mailto:mj...@apac

Re: synchronizing two streams

2016-05-12 Thread Matthias J. Sax
I cannot follow completely. TwoInputStreamOperator defines two methods to process watermarks, one for each stream. So you can sync both streams within the outer-join operator you plan to implement. -Matthias On 05/11/2016 05:00 PM, Alexander Gryzlov wrote: > Hello, > > We're implementing a streamin
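A rough sketch of the min-watermark bookkeeping such an operator can do (method names follow TwoInputStreamOperator; the surrounding operator plumbing is omitted and details vary by Flink version):

```java
import org.apache.flink.streaming.api.watermark.Watermark;

// Not a complete operator -- only the watermark-combining logic.
public class WatermarkSync {
    private long wm1 = Long.MIN_VALUE;
    private long wm2 = Long.MIN_VALUE;

    public void processWatermark1(Watermark mark) {
        wm1 = mark.getTimestamp();
        advance();
    }

    public void processWatermark2(Watermark mark) {
        wm2 = mark.getTimestamp();
        advance();
    }

    private void advance() {
        // Event time may only advance to the minimum of both input
        // watermarks -- this is what keeps the two streams in sync.
        long combined = Math.min(wm1, wm2);
        // evaluate/flush buffered join state up to `combined` here
    }
}
```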

Re: Gracefully stop long running streaming job

2016-04-18 Thread Matthias J. Sax
If all your sources implement the Stoppable interface, you can STOP a job: ./bin/flink stop JobID STOP is however quite new, and making the available sources stoppable is ongoing work (some are already). Not sure what kind of sources you are using right now. -Matthias On 04/18/2016 10:50 PM, Rober

Re:

2016-04-17 Thread Matthias J. Sax
nstructions and when i try to start the web > interface i get an error can't find file specified. I tried to > change the env.java.home variable to the path of Java JDK or Java > JRE on my machine however still i get the same error. > Any idea how to solve th

Re: jar dependency in the cluster

2016-04-17 Thread Matthias J. Sax
Did you double check that your jar does contain the Kafka connector classes? I would assume that the jar is not assembled correctly. See here for some help on how to package jars correctly: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cluster_execution.html#linking-with-modules-

Re:

2016-04-17 Thread Matthias J. Sax
You need to download Flink and install it. Follow these instructions: https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/setup_quickstart.html -Matthias On 04/16/2016 04:00 PM, Ahmed Nader wrote: > Hello, > I'm new to flink so this might seem a basic question. I added flink to

Re: CEP blog post

2016-04-06 Thread Matthias J. Sax
"Getting Started" in main page shows "Download 1.0" instead of 1.0.1 -Matthias On 04/06/2016 02:03 PM, Ufuk Celebi wrote: > The website has been updated for 1.0.1. :-) > > @Till: If you don't mention it in the post, it makes sense to have a > note at the end of the post saying that the code exam

Re: window limits ?

2016-03-29 Thread Matthias J. Sax
If you use event time, a second run will put the exact same tuples into the windows (event time implies that the timestamp is encoded in the tuple itself; thus, it is independent of the wall-clock time). However, be aware that the order of tuples *within a window* might change! Thus, the timesta

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Matthias J. Sax
Just out of curiosity: Why was it changed like this? Specifying "Iterable<...>" as the input type in AllWindowFunction seems rather unintuitive... -Matthias On 02/25/2016 01:58 PM, Aljoscha Krettek wrote: > Hi, > yes that is true. The way you would now write such a function is this: > > private static cla
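For illustration, a sketch of what such a function looks like under that interim API, where the input type parameter itself is declared as Iterable (reconstructed from the discussion; exact signatures depend on the snapshot version):

```java
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Assumes the interim (early-2016) API discussed above: IN = Iterable<Integer>.
class SumAllWindow implements AllWindowFunction<Iterable<Integer>, Integer, TimeWindow> {
    @Override
    public void apply(TimeWindow window, Iterable<Integer> values,
                      Collector<Integer> out) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        out.collect(sum);
    }
}
```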

Re: How to use all available task managers

2016-02-24 Thread Matthias J. Sax
Could it be that you need to edit the client-local flink-conf.yaml instead of the TaskManager config files? (In case you do not want to specify parallelism via env.setParallelism(int).) -Matthias On 02/24/2016 04:19 PM, Saiph Kappa wrote: > Thanks! It worked now :-) > > On Wed, Feb 24, 2016

Re: Master Thesis [Apache-flink paper references]

2016-02-12 Thread Matthias J. Sax
You might want to check out the Stratosphere project web site: http://stratosphere.eu/project/publications/ -Matthias On 02/12/2016 05:52 PM, subash basnet wrote: > Hello all, > > I am currently doing master's thesis on Apache-flink. It would be really > helpful to know about the reference paper

Re: Stream conversion

2016-02-04 Thread Matthias J. Sax
Hi Sane, Currently, the DataSet and DataStream APIs are strictly separated. Thus, this is not possible at the moment. What kind of operation do you want to perform on the data of a window? Why do you want to convert the data into a data set? -Matthias On 02/04/2016 10:11 AM, Sane Lee wrote: > Dear all

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Matthias J. Sax
quired to have an external > process to open that socket server. > > On Tue, Jan 19, 2016 at 5:38 PM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > Your "SocketWriter-Thread" code will run on your client. All code in > "main" runs on

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Matthias J. Sax
Your "SocketWriter-Thread" code will run on your client. All code in "main" runs on the client. execute() itself runs on the client, too. Of course, it triggers the job submission to the cluster. In this step, the assembled job from the previous calls is translated into the JobGraph which is submi

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
ialization.SerializationSchema[T, > scala.Array[scala.Byte]]) : > org.apache.flink.streaming.api.datastream.DataStreamSink[T] = { /* compiled > code */ } > > > > On Tue, Jan 19, 2016 at 2:16 PM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > It

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
:57 PM, Saiph Kappa wrote: > It's DataStream[String]. So it seems that SimpleStringSchema cannot be > used in writeToSocket regardless of the type of the DataStream. Right? > > On Tue, Jan 19, 2016 at 1:32 PM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > Wh

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
ew SimpleStringSchema()) > > > Thanks. > > On Tue, Jan 19, 2016 at 10:34 AM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > There is SimpleStringSchema. > > -Matthias > > On 01/18/2016 11:21 PM, Saiph Kappa wrote: > &

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Matthias J. Sax
server receives strings. I have > something like this in scala: > > val server = new ServerSocket() > while (true) { val s = server.accept(); val in = new BufferedSource(s.getInputStream()).getLines(); println(in.next()); s.close() } > > Thanks > > On Mon

Re: Cancel Job

2016-01-18 Thread Matthias J. Sax
Hi, currently, messages in flight will be dropped if a streaming job gets canceled. There is already WIP to add a STOP signal which allows for a clean shutdown of a streaming job. This should get merged soon and will be available in Flink 1.0. You can follow the JIRA and PR here: https://issues.a

Re: Flink Stream: collect in an array all records within a window

2016-01-18 Thread Matthias J. Sax
Hi Saiph, you can use an AllWindowFunction via .apply(...) to get a collect method. From: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html > // applying an AllWindowFunction on non-keyed window stream > allWindowedStream.apply (new AllWindowFunction, > Integ

Re: Working with storm compatibility layer

2016-01-14 Thread Matthias J. Sax
;> at >> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:686) >> at >> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982) >> at >> org.apache.flink.runti

Re: Working with storm compatibility layer

2016-01-14 Thread Matthias J. Sax
ompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java

Re: Working with storm compatibility layer

2016-01-14 Thread Matthias J. Sax
Hi, I can submit the topology without any problems. Your code is fine. If your program "exits silently", I would actually assume that you submitted the topology successfully. Can you see the topology in the JobManager WebFrontend? If not, do you see any errors in the log files? -Matthias On 01/14/2

Re: DataStream jdbc sink

2016-01-13 Thread Matthias J. Sax
Hi, use JDBCOutputFormatBuilder to set all required parameters: > JDBCOutputFormatBuilder builder = JDBCOutputFormat.buildJDBCOutputFormat(); > builder.setDBUrl(...) > // and more > > var.write(builder.finish(), 0L); -Matthias On 01/13/2016 06:21 PM, Traku traku wrote: > Hi everyone. > > I'm t

Re: Working with storm compatibility layer

2016-01-09 Thread Matthias J. Sax
Hi, I just double checked the Flink code, and during the translation from Storm to Flink, declareOutputFields() is called twice. You are right that it does the same job twice, but that is actually not a problem. The Flink code is cleaner this way, so I guess we will not change it. About the lifecycle: If you

Re: Working with storm compatibility layer

2016-01-09 Thread Matthias J. Sax
Hello Shinhyung, that sounds weird and should not happen -- Spout.open() should get called exactly once. I am not sure about multiple calls to declareOutputFields though -- it might be called multiple times -- I would need to double check the code. However, the call to declareOutputFields should be i

Re: [ANNOUNCE] Introducing Apache Flink Taiwan User Group - Flink.tw

2016-01-02 Thread Matthias J. Sax
Pretty cool! On 01/02/2016 10:30 AM, Fabian Hueske wrote: > Hi Gordon, > > this is great news! I am very happy to hear that there is a new local > Flink community in Taiwan! > Translating blog posts, slide sets, and other documentation is very > valuable and makes the project known to a broader a

Re: Problem with passing arguments to Flink Web Submission Client

2015-12-21 Thread Matthias J. Sax
arFile.jar -f flink -i -m 1 > and it is working perfectly fine. Is there a difference between this two > ways of submitting a job ("bin/flink MyJar.jar" and "bin/flink run > MyJar.jar")? > > I will open a Jira. > > Best Regards, > Filip Łęczycki > >

Re: Problem with passing arguments to Flink Web Submission Client

2015-12-20 Thread Matthias J. Sax
The bug is actually in the CLI (it's not a WebClient-related issue). If you run > bin/flink myJarFile.jar -f flink -i -m 1 it also returns > Unrecognized option: -f -Matthias On 12/20/2015 09:37 PM, Matthias J. Sax wrote: > That is a bug. Can you open a JIRA for it? > >

Re: Problem with passing arguments to Flink Web Submission Client

2015-12-20 Thread Matthias J. Sax
That is a bug. Can you open a JIRA for it? You can work around it by not prefixing your flag with "-". -Matthias On 12/20/2015 12:59 PM, Filip Łęczycki wrote: > Hi all, > > I would like to get the pretty-printed execution plan of my job. In order > to achieve that I uploaded my jar to the Flink Web Submis

Re: Apache Flink Web Dashboard - Completed Job history

2015-12-16 Thread Matthias J. Sax
I guess it should be possible to manually save this information from the corresponding directory and copy it back after the restart? But I am not sure -- please correct me if I am wrong. -Matthias On 12/16/2015 03:16 PM, Ufuk Celebi wrote: > >> On 16 Dec 2015, at 15:00, Ovidiu-Cristian MARCU >> wro

Re: flink streaming documentation

2015-12-15 Thread Matthias J. Sax
Thanks for reporting! Would you like to fix this and open a PR? -Matthias On 12/15/2015 04:43 AM, Radu Tudoran wrote: > Hi, > > > > I believe i found 2 small inconsistencies in the documentation for the > description of Window Apply > > https://ci.apache.org/projects/flink/flink-docs-relea

Re: Flink Storm

2015-12-09 Thread Matthias J. Sax
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java: > 1339) > at > > scala.concurrent.forkjoin.ForkJoinPool.

Re: Question about DataStream serialization

2015-12-08 Thread Matthias J. Sax

Re: Question about DataStream serialization

2015-12-07 Thread Matthias J. Sax
Hi Radu, you are right. The open() method is called for each parallel instance of a rich function. Thus, if all instances use the same code, you might read the same data multiple times. The easiest way to distinguish different instances within open() is to use the RuntimeContext. It offers two m
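A small sketch of using the RuntimeContext inside open() (the function and its partitioning scheme are made up):

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical rich function that reads only "its" share of some side data.
public class PartitionAwareMapper extends RichMapFunction<String, String> {
    @Override
    public void open(Configuration parameters) {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();     // 0-based
        int total   = getRuntimeContext().getNumberOfParallelSubtasks();
        // e.g. load only slice `subtask` of `total`, so each parallel
        // instance reads disjoint data instead of the same data repeatedly.
    }

    @Override
    public String map(String value) {
        return value;
    }
}
```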

Re: Any role for volunteering

2015-12-05 Thread Matthias J. Sax
Hi Deepak, the Flink community is always open to new people who want to contribute to the project. Please subscribe to the user- and dev-mailing list as a starting point: https://flink.apache.org/community.html#mailing-lists Furthermore, please read the following docs: https://flink.apache.org/ho

Re: Flink Storm

2015-12-05 Thread Matthias J. Sax
Hi Naveen, in your previous mail you mention that > Yeah, I did route the "count" bolt output to a file and I see the output. > I can see the Storm and Flink output matching. How did you do this? Modifying the "count bolt" code? Or did you use some other bolt that consumes the "count bolt" output

Re: flink connectors

2015-11-27 Thread Matthias J. Sax
If I understand the question right, you just want to download the jar manually? Just go to the maven repository website and download the jar from there. -Matthias On 11/27/2015 02:49 PM, Robert Metzger wrote: > Maybe there is a maven mirror you can access from your network? > > This site conta

Re: Doubt about window and count trigger

2015-11-26 Thread Matthias J. Sax
Hi, a Trigger is an *additional* condition for intermediate (early) evaluation of the window. Thus, it is not "or-ed" to the basic window definition. If you want to have an or-ed window condition, you can customize it by specifying your own window definition. > dataStream.window(new MyOwnWindow(

Re: ReduceByKeyAndWindow in Flink

2015-11-23 Thread Matthias J. Sax
Hi, Can't you use a second keyed window (with the same size) and apply .max(...)? -Matthias On 11/23/2015 11:00 AM, Konstantin Knauf wrote: > Hi Fabian, > > thanks for your answer. Yes, that's what I want. > > The solution you suggest is what I am doing right now (see last of the > bullet poin

Re: Destroy StreamExecutionEnv

2015-10-05 Thread Matthias J. Sax
Hi, you just need to terminate your source (ie, return from the run() method if you implement your own source function). This will finish the complete program. For already available sources, just make sure you read finite input. Hope this helps. -Matthias On 10/05/2015 12:15 AM, jay vyas wrote: > H
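A minimal sketch of a finite user-defined source that ends the job by returning from run() (names and the element count are arbitrary):

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class FiniteSource implements SourceFunction<Long> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) {
        long i = 0;
        // Returning from run() finishes the source; once all sources have
        // finished, the whole streaming program terminates.
        while (running && i < 1_000) {
            ctx.collect(i++);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```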

Re: what different between join and coGroup in flink

2015-09-06 Thread Matthias J. Sax
Yes. On 09/06/2015 10:55 PM, hagersaleh wrote: > Join can be executed more efficiently than CoGroup > this means Join faster than COGroup in executed > > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/what-different-between-join

Re: How to create a stream of data batches

2015-09-04 Thread Matthias J. Sax
Hi Andres, you could do this by using your own data type, for example > public class MyBatch<T> { > private ArrayList<T> data = new ArrayList<T>(); > } In the DataSource, you need to specify your own InputFormat that reads multiple tuples into a batch and emits the whole batch at once. However, be aware,

Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Matthias J. Sax
+1 for dropping On 09/04/2015 11:04 AM, Maximilian Michels wrote: > +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The > release is hardly used and complicates the important high-availability > changes in Flink. > > On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen wrote: >> I am good

Re: How to force the parallelism on small streams?

2015-09-03 Thread Matthias J. Sax
only > 14 elements then only the first 14 mappers will ever receive data. The > round-robin distribution is not global, since the sources operate > independently from each other. > > Cheers, > Aljoscha > > On Wed, 2 Sep 2015 at 20:00 Matthias J. Sax <mailto:mj...@apac

Re: question on flink-storm-examples

2015-09-03 Thread Matthias J. Sax
One more remark that just came to my mind. There is a storm-hdfs module available: https://github.com/apache/storm/tree/master/external/storm-hdfs Maybe you can use it. It would be great if you could give feedback if this works for you. -Matthias On 09/02/2015 10:52 AM, Matthias J. Sax wrote

Re: How to force the parallelism on small streams?

2015-09-02 Thread Matthias J. Sax
oblem by replacing rebalance() with shuffle(). > > But I found a workaround: setting parallelism to 1 for the source (I don't > need a 100 directory scanners anyway), it forces the rebalancing evenly > between the mappers. > > Greetings, > Arnaud > > > -Mes

Re: How to force the parallelism on small streams?

2015-09-02 Thread Matthias J. Sax
Hi, If I understand you correctly, you want to have 100 mappers. Thus you need to apply the .setParallelism() after .map(): > addSource(myFileSource).rebalance().map(myFileMapper).setParallelism(100) The order of commands you used sets the dop for the source to 100 (which might be ignored if the

Re: question on flink-storm-examples

2015-09-02 Thread Matthias J. Sax
1 14:41 > /home/jerrypeng/hadoop/hadoop_dir/dir1 > > drwxr-xr-x - jerrypeng supergroup 0 2015-08-24 15:59 > /home/jerrypeng/hadoop/hadoop_dir/test > > -rw-r--r-- 3 jerrypeng supergroup 32 2015-08-24 15:59 > /home/jerrypeng/hadoop/hadoop_di

Re: question on flink-storm-examples

2015-09-01 Thread Matthias J. Sax
call to killTopology(). -Matthias On 09/01/2015 11:16 PM, Matthias J. Sax wrote: > Oh yes. I forgot about this. I have already a fix for it in a pending > pull request... I hope that this PR is merged soon... > > If you want to observe the progress, look here: > https://issues.apache.org/j

Re: question on flink-storm-examples

2015-09-01 Thread Matthias J. Sax
er(ForkJoinPool.java:1979) > > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > It seems to have something to do with canceling of the topology after > the sleep. Any ideas? > > > Best, > > > Jerry > &g

Re: question on flink-storm-examples

2015-09-01 Thread Matthias J. Sax
ke(NativeMethodAccessorImpl.java:62) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:483) > > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) > >

Re: question on flink-storm-examples

2015-09-01 Thread Matthias J. Sax
Hi Jerry, WordCount-StormTopology uses a hard coded dop of 4. If you start up Flink in local mode (bin/start-local-streaming.sh), you need to increase the number of task slots to at least 4 in conf/flink-conf.yaml before starting Flink -> taskmanager.numberOfTaskSlots You should actually see the

Re: Problem with Windowing

2015-08-31 Thread Matthias J. Sax
stuff. Then the de duplication code as written in my first mail us assigned > to a new variable called output. Then output.addSink(.) is called. > > >> Am 31.08.2015 um 17:45 schrieb Matthias J. Sax >> : >> >> Can you post your whole program (both versions if p

Re: Problem with Windowing

2015-08-31 Thread Matthias J. Sax
Can you post your whole program (both versions if possible)? Otherwise I have only a wild guess: A common mistake is not to assign the stream variable properly: DataStream ds = ... ds = ds.APPLY_FUNCTIONS ds.APPLY_MORE_FUNCTIONS In your code example, the assignment is missing -- but maybe it j

Re: Event time in Flink streaming

2015-08-28 Thread Matthias J. Sax
Hi Martin, you need to implement your own policy. However, this should not be complicated. Have a look at "TimeTriggerPolicy". You just need to provide a "Timestamp" implementation that extracts your ts-attribute from the tuples. -Matthias On 08/28/2015 03:58 PM, Martin Neumann wrote: > Hej, > > I

Re: New contributor tasks

2015-08-27 Thread Matthias J. Sax
One more thing. Not every open issue is documented in JIRA (even if we try to do this). You can also have a look into the wiki: https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home So if you are interested in working on a specific component, you might try to talk to the main contribute

Re: Source & job parallelism

2015-08-25 Thread Matthias J. Sax
Hi Arnaud, did you try: > env.addSource(mySource).setParallelism(1).map(myMapper).setParallelism(10) If this does not work, it might be that Flink chains the mapper to the source, which implies using the same parallelism (and the producer dictates this dop value). Using a rebalance() in betwee
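Put together, the suggestion looks roughly like this (a fragment; `env`, `mySource`, and `myMapper` are placeholders):

```java
// Break the source->map chain so the mapper can run wider than the source.
env.addSource(mySource).setParallelism(1)
   .rebalance()                        // round-robin redistribution
   .map(myMapper).setParallelism(10);
```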

Re: java.lang.ClassNotFoundException when deploying streaming jar locally

2015-08-06 Thread Matthias J. Sax
Never mind. Just saw that this is not the problem... Sounds weird to me. Maybe you can try to name the class. Anonymous classes should not be a problem, but it should be worth a try. -Matthias On 08/06/2015 01:51 PM, Matthias J. Sax wrote: > If I see it correctly your jar contains >

Re: java.lang.ClassNotFoundException when deploying streaming jar locally

2015-08-06 Thread Matthias J. Sax
If I see it correctly, your jar contains > com/davengo/rfidcloud/flink/DaoJoin$1.class but your error message says > ClassNotFoundException: com.otter.ist.flink.DaoJoin$1 These are different packages. Your jar seems not to be packaged correctly. -Matthias On 08/06/2015 12:46 PM, Michael Huelfenha

Re: Streaming window : count with timeout ?

2015-07-17 Thread Matthias J. Sax
You can implement a custom window policy (I guess this should be flexible enough for your case). See the documentation: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#policy-based-windowing If you have further questions after reading, just go ahead :) -Matthias On

Re: Cluster execution -jar files-

2015-07-16 Thread Matthias J. Sax
As the JavaDoc explains: > * @param jarFiles The JAR files with code that needs to be shipped to the cluster. If the program uses > * user-defined functions, user-defined input formats, or any libraries, those must be > * provided in the J

Re: Order groups by their keys

2015-07-15 Thread Matthias J. Sax
Hi Robert, global sorting of the final output is currently not supported by Flink out-of-the-box. The reason is that a global sort requires all data to be processed by a single node (which contradicts data parallelism). For small output, you could use a final "reduce" with no key (ie, all data go
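One way to spell that out with the DataSet API (a fragment; `input`, the field index, and the types are made up -- note the deliberate single-task bottleneck):

```java
// Globally sorted output: route everything through ONE task, then sort it.
DataSet<Tuple2<String, Integer>> sorted =
        input.sortPartition(1, Order.DESCENDING)
             .setParallelism(1);   // a single partition => a global order
```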

Re: why when use orders.aggregate(Aggregations.MAX, 2) not return one value but return more value

2015-07-08 Thread Matthias J. Sax
This is your code (it applies the "print" before the aggregation is done): > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); > DataSet orders = (DataSet) > env.readCsvFile("/home/hadoop/Desktop/Dataset/orders.csv") > .fieldDelimiter('|') > .includeFields(ma

Re: cogroup

2015-06-29 Thread Matthias J. Sax
Why do you not use a join? CoGroup seems not to be the right operator. -Matthias On 06/29/2015 05:40 PM, Michele Bertoni wrote: > Hi I have a question on cogroup > > when I cogroup two dataset is there a way to compare each element on the left > with each element on the right (inside a group) w

Re: Pergem exception from web-client

2015-06-24 Thread Matthias J. Sax
Hi, you need to increase JVM parameter "-XX:MaxPermSize=" The default value should be something like "64m" Just add the flag to variable JVM_ARGS in "bin/webclient.sh" (line 33). -> Compare "bin/jobmanager.sh" (line 35) -Matthias On 06/24/2015 06:38 PM, Flavio Pompermaier wrote: > Hi to all, >

Re: Job Statistics

2015-06-18 Thread Matthias J. Sax
Hi, the CLI cannot show any job statistics. However, you can use the JobManager web interface that is accessible at port 8081 from a browser. -Matthias On 06/17/2015 10:13 PM, Jean Bez wrote: > Hello, > > Is it possible to view job statistics after it finished to execute > directly in the comm

Re: Flink Streaming State Management

2015-06-17 Thread Matthias J. Sax
Hi Hilmi, currently, this is not supported. However, state management is already work in progress and should be available soon. See https://github.com/apache/flink/pull/747 -Matthias On 06/17/2015 09:36 AM, Hilmi Yildirim wrote: > Hi, > does Flink Streaming support state management? For example,

Re: Memory in local setting

2015-06-17 Thread Matthias J. Sax
Hi, look at slide 35 for more details about memory configuration: http://www.slideshare.net/robertmetzger1/apache-flink-hands-on -Matthias On 06/17/2015 09:29 AM, Chiwan Park wrote: > Hi. > > You can increase the memory given to Flink by increasing JVM Heap memory in > local. > If you are usi

Re: Random Shuffling

2015-06-15 Thread Matthias J. Sax
Hi, using partitionCustom, the data distribution depends only on your probability distribution. If it is uniform, you should be fine (ie, choosing the channel like > private final Random random = new Random(System.currentTimeMillis()); > int partition(K key, int numPartitions) { > return random
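Completed into a runnable sketch (the key type Long is arbitrary since, as noted, the key is ignored):

```java
import java.util.Random;
import org.apache.flink.api.common.functions.Partitioner;

// Ignores the key entirely and picks a uniformly random output channel.
public class RandomPartitioner implements Partitioner<Long> {
    private final Random random = new Random(System.currentTimeMillis());

    @Override
    public int partition(Long key, int numPartitions) {
        return random.nextInt(numPartitions); // uniform over all channels
    }
}
// usage (the field choice is irrelevant here):
// dataSet.partitionCustom(new RandomPartitioner(), 0);
```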

Re: Random Shuffling

2015-06-15 Thread Matthias J. Sax
I think you need to implement your own Partitioner and hand it in via DataSet.partitionCustom(partitioner, field). (Just specify any field you like; as you don't want to group by key, it doesn't matter.) When implementing the partitioner, you can ignore the key parameter and compute the output c

Re: I want run flink program in ubuntu x64 Mult Node Cluster what is configuration?

2015-06-08 Thread Matthias J. Sax
Are you sure that the TaskManager registered with the JobManager correctly? You can check on the master machine in your browser: > localhost:8081 Additionally, you might need to increase the number of slots for the TaskManager. Increase 'taskmanager.numberOfTaskSlots' within conf/flink-conf.yaml.

Re: I want run flink program in ubuntu x64 Mult Node Cluster what is configuration?

2015-06-08 Thread Matthias J. Sax
Can you please share the whole console output. It is unclear what the problem might be from this short message. On 06/08/2015 10:24 AM, hagersaleh wrote: > when run progam > > display > error:null > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336

Re: I want run flink program in ubuntu x64 Mult Node Cluster what is configuration?

2015-06-08 Thread Matthias J. Sax
Hi, in order to use bin/start-cluster.sh you need to configure conf/slaves on your master machine (conf/slaves is ignored on slave machines). Hope this helps. If not, please post the error message you get. It's hard to figure out what's wrong without the stacktrace. -Matthias On 06/08/2015 09:43

Re: Package multiple jobs in a single jar

2015-05-17 Thread Matthias J. Sax
Hi, I like the idea that Flink's WebClient can show different plans for different jobs within a single jar file. I prepared a prototype for this feature. You can find it here: https://github.com/mjsax/flink/tree/multipleJobsWebUI To test the feature, you need to prepare a jar file that contains

Re: Best way to join with inequalities (historical data)

2015-05-04 Thread Matthias J. Sax
Hi, there is no other system support to express this join. However, you could perform some "hand-wired" optimization by partitioning your input data into distinct intervals. It might be tricky, though -- especially if the time ranges in your "range-key" dataset are overlapping everywhere (-> data r