Re: Parallelizing DataStream operations on Array elements

2016-11-04 Thread danielsuo
I was able to resolve my issue by collecting all the 'column' Arrays via countWindowAll and using flatMap to emit 'row' Arrays. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Parallelizing-DataStream-operations-on-Array-elements-tp9911p9917.

Re: Parallelizing DataStream operations on Array elements

2016-11-04 Thread danielsuo
Till Rohrmann wrote > I'm not sure whether I grasp the whole problem, but can't you split > thevector up into the different rows, group by the row index and then > applysome kind of continuous aggregation or window function? So I could flatMap my incoming Arrays into (rowId, arrayElement) and gath

Re: Release Process

2016-11-04 Thread Chesnay Schepler
Hello, Every contribution to the master branch will be released as part of the next minor version, in your case this would be 1.2. We are currently aiming for a release in December. In between minor versions several bug-fix versions are released (1.1.1, 1.1.2 etc.). For these the community pi

Re: window-like use case

2016-11-04 Thread Maciek Próchniak
Hi Aljoscha, I know it's tricky... Few weeks ago we decided to implement it without windows, using just stateful operator and some queues/map per key as state - so yeah, we tried to imagine how to do this in plain java and one stream ;) We also process watermarks to evict old events. Fortuna

Re: Parallelizing DataStream operations on Array elements

2016-11-04 Thread Till Rohrmann
Hi Daniel, I'm not sure whether I grasp the whole problem, but can't you split the vector up into the different rows, group by the row index and then apply some kind of continuous aggregation or window function? Maybe it helps if you can share some of your code with the community to discuss the i

Re: Flink Time Window State

2016-11-04 Thread Aljoscha Krettek
Hi, the state of the window is kept by the WindowOperator (which uses the state descriptor you mentioned to access the state). The FoldFunction does not itself keep the state but is only used to update the state inside the WindowOperator, if you will. When you say restart, are you talking about st

Parallelizing DataStream operations on Array elements

2016-11-04 Thread Daniel Suo
Hello! I have a data source that emits Arrays that I collect into windows via countWindow. Rather than parallelize my subsequent operations by groups of these arrays, I'd like to parallelize my operations across the elements of the array (rows rather than columns, if you will) within each window.

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
Thanks, I didn't know about the -z flag! I haven't been able to get it to work though (using yarn-cluster, with a zookeeper root configured to /flink in my flink-conf.yaml) I can see my job directory in ZK under /flink/application_1477475694024_0015 and I've tried a few ways to restore the job:

Re: Testing DataStreams

2016-11-04 Thread Juan Rodríguez Hortalá
Hi Max, Thanks for your help. Flink-spector looks just like what I need. Greetings, Juan On Thu, Nov 3, 2016 at 11:05 AM, Maximilian Michels wrote: > Hi Juan, > > StreamingMultipleProgramsTestBase is in the testing scope. Thus, is it > not bundled in the normal jars. You would have to add the

Re: Flink Time Window State

2016-11-04 Thread Daniel Santos
Hello Aljoscha, Thank you for your reply. But I believe, reading from the docs, that any user function has to be a Rich Function, if it wishes to have state. Now any Rich Function cannot be used or accepted on a Window. For instances looking at flink source version 1.1.3 which is the one I'm

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Ufuk Celebi
If the configured ZooKeeper paths are still the same, the job should be recovered automatically. On each submission a unique ZK namespace is used based on the app ID. So you have in ZK: /flink/app_id/... You would have to set that manually to resume an old application. You can do this via -z flag

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
Hi Ufuk, I see, but in my case the failure caused YARN application moved into a finished/failed state - so the application itself is no longer running. How can I restart the application (or start a new YARN application) and ensure that it uses the checkpoint pointer stored in Zookeeper? Thanks, J

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Ufuk Celebi
No you don't need to manually trigger a savepoint. With HA checkpoints are persisted externally and store a pointer in ZooKeeper to recover them after a JobManager failure. On Fri, Nov 4, 2016 at 2:27 PM, Josh wrote: > I have a follow up question to this - if I'm running a job in 'yarn-cluster' >

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
I have a follow up question to this - if I'm running a job in 'yarn-cluster' mode with HA and then at some point the YARN application fails due to some hardware failure (i.e. the YARN application moves to "FINISHED"/"FAILED" state), how can I restore the job from the most recent checkpoint? I can

Re: Retrieving a single element from a DataSet

2016-11-04 Thread Greg Hogan
The tickets are in Flink's Jira: https://issues.apache.org/jira/browse/FLINK-4965 https://issues.apache.org/jira/browse/FLINK-4966 Are you looking to process temporal graphs with the DataStream API? On Fri, Nov 4, 2016 at 5:52 AM, otherwise777 wrote: > Cool, thnx for that, > > I tried searc

Re: Flink stream job change and recovery

2016-11-04 Thread Renjie Liu
Hi, Fabian: Will the checkpointed state be restored between streaming jobs? On Fri, Nov 4, 2016 at 6:47 PM Fabian Hueske wrote: > Yes, that is true for Flink 1.1.x. > > The upcoming Flink 1.2.0 release will be able to restore jobs from > savepoints with different parallelism. > > 2016-11-04 11:2

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-04 Thread Josh
Hi Scott & Stephan, The problem has happened a couple more times since yesterday, it's very strange as my job was running fine for over a week before this started happening. I find that if I restart the job (and restore from the last checkpoint) it runs fine for a while (couple of hours) before br

Re: Flink Time Window State

2016-11-04 Thread Aljoscha Krettek
Hi Daniel, Flink will checkpoint the state of all operations (in your case to HDFS). Flink has several APIs for dealing with state in user functions: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/state.html The window operator also internally uses these APIs. Let me know if you n

Re: Flink stream job change and recovery

2016-11-04 Thread Fabian Hueske
Yes, that is true for Flink 1.1.x. The upcoming Flink 1.2.0 release will be able to restore jobs from savepoints with different parallelism. 2016-11-04 11:24 GMT+01:00 Renjie Liu : > Hi, all: > It seems that flink's checkpoint mechanism saves state per partition. > However, if I want to change c

Flink stream job change and recovery

2016-11-04 Thread Renjie Liu
Hi, all: It seems that flink's checkpoint mechanism saves state per partition. However, if I want to change configuration of the job and resubmit the job, I can not recover from last checkpoint, right? Say I change the parallelism of kafka consumer and partitions assigned to each subtask will be di

Flink Kafka Connector behaviour on leader change and broker upgrade

2016-11-04 Thread Janardhan Reddy
HI, Does the flink kafka connector 0.8.2 handle broker's leader change gracefully since simple kafka consumer should be handling leader changes for a partition. How would the consumer behave when upgrading the brokers from 0.8 to 0.9. Thanks

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Maximilian Michels
Hi Anchit, The documentation mentions that you need Zookeeper in addition to setting the application attempts. Zookeeper is needed to retrieve the current leader for the client and to filter out old leaders in case multiple exist (old processes could even stay alive in Yarn). Moreover, it is neede

Re: Retrieving a single element from a DataSet

2016-11-04 Thread otherwise777
Cool, thnx for that, I tried searching for it in teh github but couldn't find it, do you have the url by any chance? I'm going to try to implement such an algorithm for temporal graphs -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Retrievi

Re: Kinesis Connector Dependency Problems

2016-11-04 Thread Robert Metzger
Thank you for helping to investigate the issue. I've filed an issue in our bugtracker: https://issues.apache.org/jira/browse/FLINK-5013 On Wed, Nov 2, 2016 at 10:09 PM, Justin Yan wrote: > Sorry it took me a little while, but I'm happy to report back that it > seems to be working properly with E

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-04 Thread Aljoscha Krettek
Hi Pedro, yes, I was more or less suggesting a similar approach to the one taken by King. In code, it would look somewhat like this: DataStream input = ...; DataStream> withMyWindows = input.map(new AssignMyWindows()) withMyWindows .keyBy(...) .window(new DynamicWindowAssigner()) ... where