Create window before the first event

2016-07-11 Thread Xiang Zhang
Hi, I am trying to have a trigger fires every 5 mins, even when sometimes no event comes (just output default for empty window). The closest solution I got to work is this: datastream.windowAll(GlobalWindows.create())

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
Yes, I've password-less SSH to the job manager node. On Mon, Jul 11, 2016 at 4:53 PM, Greg Hogan wrote: > pdsh is only used for starting taskmanagers. How did you work around this? > You are able to passwordless-ssh to the jobmanager? > > The error looks to be from

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Greg Hogan
pdsh is only used for starting taskmanagers. How did you work around this? You are able to passwordless-ssh to the jobmanager? The error looks to be from config.sh:318 in rotateLogFile. The way we generate the taskmanager index assumes that taskmanagers are started sequentially

Re: ContinuousProcessingTimeTrigger on empty

2016-07-11 Thread Xiang Zhang
Hi Kostas, Yes, so I tried GlobalWindows. Is it possible to trigger every 5 mins on GlobalWindows? From the comments in the source for ContinuousProcessingTimeTrigger, it says: * A {@link Trigger} that continuously fires based on a given time interval as measured by * the clock of the machine

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
Looking at what happens with pdsh, there are two things that go wrong. 1. pdsh is installed in a node other than where the job manager would run, so invoking *start-cluster *from there does not spawn a job manager. Only if I do start-cluster from the node I specify as the job manager's node that

ContinuousProcessingTimeTrigger on empty

2016-07-11 Thread Xiang Zhang
Hi, I want to have a trigger fires every 5 seconds in processing time even when no event comes. I tried datastream.windowAll(GlobalWindows.create()) .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(5))) .apply { MY_APPLY_FUNCTION} However,

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
I meant, I'll check when current jobs are done and will let you know. On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake wrote: > I am running some jobs now. I'll stop and restart using pdsh to see what > was the issue again > > On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
I am running some jobs now. I'll stop and restart using pdsh to see what was the issue again On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan wrote: > I'd definitely be interested to hear any insight into what failed when > starting the taskmanagers with pdsh. Did the command

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Saliya Ekanayake
Thank you Greg, I'll check if this was the cause for my TMs to disappear. On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan wrote: > The OOM killer doesn't give warning so you'll need to call dmesg or look > in /var/log/messages or similar. The following reports that Debian

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Greg Hogan
The OOM killer doesn't give warning so you'll need to call dmesg or look in /var/log/messages or similar. The following reports that Debian flavors may use /var/log/syslog. http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Sun, Jul 10, 2016 at

Re: Dynamic partitioning for stream output

2016-07-11 Thread Josh
Hi guys, I've been working on this feature as I needed something similar. Have a look at my issue here https://issues.apache.org/jira/browse/FLINK-4190 and changes here https://github.com/joshfg/flink/tree/flink-4190 The changes follow Kostas's suggestion in this thread. Thanks, Josh On Thu,

Re: Flink Completed Jobs only has 5 entries

2016-07-11 Thread Saliya Ekanayake
Thank you, Ufuk! On Mon, Jul 11, 2016 at 10:46 AM, Ufuk Celebi wrote: > Yes, via jobmanager.web.history > ( > https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#jobmanager-web-frontend > ) > > On Mon, Jul 11, 2016 at 4:45 PM, Saliya Ekanayake

Re: Flink Completed Jobs only has 5 entries

2016-07-11 Thread Ufuk Celebi
Yes, via jobmanager.web.history (https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#jobmanager-web-frontend) On Mon, Jul 11, 2016 at 4:45 PM, Saliya Ekanayake wrote: > Hi, > > It seems by default the completed job list only shows 5 entries. Is there

Flink Completed Jobs only has 5 entries

2016-07-11 Thread Saliya Ekanayake
Hi, It seems by default the completed job list only shows 5 entries. Is there a way to increase this? Thank you, saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Informatics and Computing | Digital Science Center Indiana University, Bloomington

Re: sampling function

2016-07-11 Thread Le Quoc Do
Hi all, Thank you all for your answers. By the way, I also recognized that Flink doesn't support "stratified sampling" function (only simple random sampling) for DataSet. It would be nice if someone can create a Jira for it, and assign the task to me so that I can work for it. Thank you, Do On

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
No problem Alieh! Kostas > On Jul 11, 2016, at 11:46 AM, Alieh Saeedi wrote: > > Hi > I was joining two datasets which were from two different > ExecutionEnviornment. It was my mistake. Thanks anyway. > > Best, > Alieh > > > On Monday, 11 July 2016, 11:33, Kostas

Re: The two inputs have different execution contexts.

2016-07-11 Thread Alieh Saeedi
HiI was joining two datasets which were from two different ExecutionEnviornment. It was my mistake. Thanks anyway. Best,Alieh On Monday, 11 July 2016, 11:33, Kostas Kloudas wrote: Hi Alieh, Could you share you code so that we can have a look?From the

Re: sampling function

2016-07-11 Thread Vasiliki Kalavri
Hi Do, Paris and Martha worked on sampling techniques for data streams on Flink last year. If you want to implement your own samplers, you might find Martha's master thesis helpful [1]. -Vasia. [1]: http://kth.diva-portal.org/smash/get/diva2:910695/FULLTEXT01.pdf On 11 July 2016 at 11:31,

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
Hi Alieh, Could you share you code so that we can have a look? From the information you provide we cannot help. Thanks, Kostas > On Jul 10, 2016, at 3:13 PM, Alieh Saeedi wrote: > > I can not join or coGroup two tuple2 datasets of the same tome. The error is >

Re: sampling function

2016-07-11 Thread Kostas Kloudas
Hi Do, In DataStream you can always implement your own sampling function, hopefully without too much effort. Adding such functionality it to the API could be a good idea. But given that in sampling there is no “one-size-fits-all” solution (as not every use case needs random sampling and not

dynamic streams and patterns

2016-07-11 Thread Claudia Wegmann
Hey everyone, I'm quite new to Apache Flink. I'm trying to build a system with Flink and wanted to hear your opinion and whether the proposed architecture is even possible with Flink. The environment for the system will be a microservice architecture handling messaging via async events. I

Re: Extract type information from SortedMap

2016-07-11 Thread Timo Walther
Hi Yukun, I think the problem of the input type inference is that SortedMap is a GenericType and not a Flink native type (like Tuple or POJO). This case is not supported at the moment. You can create an issue if you like, maybe there is a way to support this special type inference case.