Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Maximilian Michels
Hi! This looks neat. Let's try it out. I just voted. Cheers, Max On Sun, Sep 4, 2016 at 8:11 PM, Vishnu Viswanath wrote: > Hi All, > > Why don't we make use of Stack Overflow's new documentation feature to do > some documentation of Apache Flink? > > To start, at least 5 SO users should commit t

Re: Flink Kafka more consumers than partitions

2016-09-05 Thread Maximilian Michels
Thanks for letting us know! On Sat, Sep 3, 2016 at 12:42 PM, neo21 zerro wrote: > Hi all, > > It turns out that there were other factors influencing my performance tests. > (actually hbase) > Hence, more consumers than partitions in Flink was not the problem. > Thanks for the help! > > > On Wedne

Re: fromParallelCollection

2016-09-05 Thread Maximilian Michels
Please give us a bit more insight into what you're trying to do. On Sat, Sep 3, 2016 at 5:01 AM, wrote: > Hi, > val env = StreamExecutionEnvironment.getExecutionEnvironment > val tr = env.fromParallelCollection(data) > > I do not know how to initialize the data; can someone tell me how? >
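For reference, a minimal sketch of one way to supply that data argument (the number range is just a placeholder): fromParallelCollection expects a SplittableIterator, for example Flink's built-in NumberSequenceIterator, and each parallel source subtask then reads its own slice of the sequence.

    // Sketch only: NumberSequenceIterator stands in for whatever data the
    // original poster actually wants to read; adjust the element type as needed.
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.NumberSequenceIterator

    object FromParallelCollectionExample {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        // A SplittableIterator over the numbers 1 to 1000; every parallel
        // source subtask gets one split of it.
        val data = new NumberSequenceIterator(1L, 1000L)
        val stream = env.fromParallelCollection(data)
        stream.print()
        env.execute("fromParallelCollection example")
      }
    }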

Re: flink dataStream operate dataSet

2016-09-05 Thread Aljoscha Krettek
Hi, right now it is not possible to mix the DataSet and the DataStream API. The reason for the "task not serializable" error is that putting the DataSet into the map function tries to serialize the DataSet, which is not possible. Cheers, Aljoscha On Tue, 30 Aug 2016 at 16:31 wrote: > Hi, >
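To make the failure mode and one possible workaround concrete, here is a sketch; it assumes the DataSet is small enough to be collected to the client, and the source, names, and types are placeholders.

    // Sketch: why closing over a DataSet fails, and a workaround that
    // materializes the DataSet first. All names and ports are placeholders.
    import org.apache.flink.api.scala._
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    object MixingApisSketch {
      def main(args: Array[String]): Unit = {
        val batchEnv  = ExecutionEnvironment.getExecutionEnvironment
        val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment

        val lookupSet: DataSet[(String, Int)] = batchEnv.fromElements(("a", 1), ("b", 2))

        // Not possible: referencing lookupSet inside a DataStream operator
        // serializes the DataSet itself (which is only an execution plan,
        // not data) and fails with "task not serializable".

        // Workaround: materialize the DataSet on the client and close over
        // the resulting local, serializable collection instead.
        val lookup: Map[String, Int] = lookupSet.collect().toMap

        streamEnv
          .socketTextStream("localhost", 9999)
          .map(word => (word, lookup.getOrElse(word, 0)))
          .print()

        streamEnv.execute("DataStream using a pre-materialized DataSet")
      }
    }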

Re: Apache Flink: How does it handle the backpressure?

2016-09-05 Thread Aljoscha Krettek
That's true. The reason why it works in Flink is that a slow downstream operator exerts backpressure on the upstream operator, which then slows down. The technical implementation of this relies on the fact that Flink uses a bounded pool of network buffers. A sending operator writes data to network bu

Re: checkpoints not removed on hdfs.

2016-09-05 Thread Aljoscha Krettek
Hi, which version of Flink are you using? Are the checkpoints being reported as successful in the Web Frontend, i.e. in the "checkpoints" tab of the running job? Cheers, Aljoscha On Fri, 2 Sep 2016 at 12:17 Dong-iL, Kim wrote: > Hi, > > I’m using HDFS as state backend. > The checkpoints folder
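For context, a minimal sketch of the setup under discussion; the checkpoint interval and HDFS path below are placeholders, not values from the thread.

    // Sketch: FsStateBackend writing checkpoints to HDFS. Old, subsumed
    // checkpoints are expected to be cleaned up once newer ones complete.
    import org.apache.flink.runtime.state.filesystem.FsStateBackend
    import org.apache.flink.streaming.api.scala._

    object HdfsCheckpointingSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.enableCheckpointing(60000)  // checkpoint every 60 seconds
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"))

        env.socketTextStream("localhost", 9999).print()
        env.execute("HDFS checkpointing sketch")
      }
    }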

Re: Remote upload and execute

2016-09-05 Thread Aljoscha Krettek
+Max Michels Directly looping in Max. You recently worked on the client code, do you have any idea if and how this is possible? On Fri, 2 Sep 2016 at 14:38 Paul Wilson wrote: > Hi, > > I'd like to write a client that can execute an already 'uploaded' JAR > (i.e. the JAR is deployed and availab

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Ravikumar Hawaldar
Hi, I just committed to the apache-flink documentation on SO; one more commit is required. Nice idea to document on SO, Vishnu. Regards, Ravikumar On 5 September 2016 at 14:22, Maximilian Michels wrote: > Hi! > > This looks neat. Let's try it out. I just voted. > > Cheers, > Max > > On Sun, Sep 4,

Re: Remote upload and execute

2016-09-05 Thread Maximilian Michels
The jar file always needs to be available, whether locally on your machine or in the jar file directory of the web interface, which runs on the JobManager. Without the file we can't generate a JobGraph, which is Flink's representation of a compiled program ready for execution. Additionally, w
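A hedged sketch of what such a client could look like against the JobManager's web monitor, using only JDK classes. The REST layout and query parameters (/jars/<jarid>/run with entry-class and parallelism) are assumptions based on the web submission handlers, and all values are placeholders.

    // Sketch only: POSTs a "run" request for a JAR that was previously
    // uploaded through the web interface. Endpoint layout and parameters are
    // assumptions; check the web monitor of your Flink version.
    import java.net.{HttpURLConnection, URL}
    import scala.io.Source

    object RunUploadedJarSketch {
      def main(args: Array[String]): Unit = {
        val jobManager = "http://localhost:8081"       // placeholder address
        val jarId      = "d9f1a0b2_wordcount.jar"      // placeholder id returned by /jars/upload
        val query      = "entry-class=org.example.WordCount&parallelism=2"

        val url  = new URL(s"$jobManager/jars/$jarId/run?$query")
        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")

        // The response body should contain the id of the submitted job.
        val body = Source.fromInputStream(conn.getInputStream).mkString
        println(s"HTTP ${conn.getResponseCode}: $body")
        conn.disconnect()
      }
    }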

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Fabian Hueske
Thanks for the suggestion, Vishnu! Stack Overflow documentation looks great. I like the easy contribution and versioning features. However, I am a bit skeptical. IMO, Flink's primary documentation must be hosted by Apache. Outsourcing such an important aspect of a project to an external service is

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Maximilian Michels
I thought it was not about outsourcing but about providing example-based documentation on SO which can be easily edited by the SO community. The work can be fed back into the official Flink documentation, which will always be on flink.apache.org. On Mon, Sep 5, 2016 at 12:42 PM, Fabian Hueske wrot

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Till Rohrmann
I've understood the SO documentation approach the same way Max has. I see it as a source of code examples that illustrate Flink concepts, maintained by the SO community. On Mon, Sep 5, 2016 at 1:09 PM, Maximilian Michels wrote: > I thought it is not about outsourcing but about

Re: emit a single Map per window

2016-09-05 Thread Aljoscha Krettek
Hi, for this you would have to use a non-parallel window, i.e. something like stream.windowAll().apply(...). This does not compute per key but has the drawback that computation does not happen in parallel. If you only use it to combine the pre-aggregated maps it could be OK, though. Cheers, Aljosc
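A sketch of that approach (field names, window size, and the merge into a Map are illustrative assumptions): per-key pre-aggregates are computed in parallel first, and a non-parallel windowAll then folds them into a single Map per window.

    // Sketch: parallel per-key counts, then a non-parallel windowAll that
    // emits one Map per 10-second window. Source and fields are placeholders.
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow
    import org.apache.flink.util.Collector

    object SingleMapPerWindowSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val counts = env
          .socketTextStream("localhost", 9999)
          .map(word => (word, 1))
          .keyBy(0)
          .timeWindow(Time.seconds(10))
          .sum(1)

        // Non-parallel step: merge the pre-aggregated (key, count) pairs of
        // the same window into one Map.
        counts
          .timeWindowAll(Time.seconds(10))
          .apply { (window: TimeWindow, values: Iterable[(String, Int)], out: Collector[Map[String, Int]]) =>
            out.collect(values.toMap)
          }
          .print()

        env.execute("single Map per window sketch")
      }
    }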

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Matthias J. Sax
I voted. It's live now. Another advantage of SO documentation is that people not familiar with Apache might contribute some documentation (but would never open a PR). Of course, as a community, we should put the focus on the web page docs. But having something additional can't hurt. From my experience, it is a

fsbackend with nfs

2016-09-05 Thread CPC
Hi, Is it possible to use Flink's filesystem state backend with NFS? We don't want to deploy Hadoop in our environment, and we want to mount the same NFS volume on every TaskManager and JobManager. Thanks...
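For reference, a sketch of how the filesystem state backend can point at a shared mount instead of HDFS; the path and interval are placeholders, and it assumes the same NFS path is mounted on the JobManager and every TaskManager.

    // Sketch: filesystem state backend on a shared NFS mount (placeholder path).
    import org.apache.flink.runtime.state.filesystem.FsStateBackend
    import org.apache.flink.streaming.api.scala._

    object NfsStateBackendSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.enableCheckpointing(30000)
        // file:// URIs work as long as the path is visible to all JobManager
        // and TaskManager processes, e.g. the same NFS volume mounted everywhere.
        env.setStateBackend(new FsStateBackend("file:///mnt/flink-nfs/checkpoints"))

        env.socketTextStream("localhost", 9999).print()
        env.execute("NFS state backend sketch")
      }
    }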

AW: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Bauss, Julian
Hi Stephan, thanks for your reply! It seems as if I can only use broadcast variables on DataSet operators (using myFunc.withBroadcastSet(…)). Is that right? I am working on a DataStream, though. Do streams offer similar functionality? Best Regards, Julian From: Stephan Ewen [mailto:se...@apach

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
I think you could make use of this small component I've developed specifically for using JPMML on Flink: https://gitlab.com/chobeat/Flink-JPMML Maybe there's more in it than you need, but you could reuse the code of the operator. 2016-09-05 14:11 GMT+02:00 Bauss,

AW: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Bauss, Julian
Hi Simone, that sounds promising! Unfortunately, your link leads to a 404 page. Best Regards, Julian From: Simone Robutti [mailto:simone.robu...@radicalbit.io] Sent: Monday, 5 September 2016 14:59 To: user@flink.apache.org Subject: Re: Storing JPMML-Model Object as a Variable Closure? I thi

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Robert Metzger
It seems that the content on SO is licensed under CC BY-SA 3.0 with attribution required. The Apache Legal FAQ is not completely clear about that case: http://www.apache.org/legal/resolved.html#cc-sa But if we want, we could at least ask the legal PMC if we can add some of the content from SO into t

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
Yes, sorry, but it's private and I just discovered we don't want to release it publicly yet. This piece of code could help you, though: https://gist.github.com/chobeat/f07221357a2e3f9efa377e4cb0479f92 Ignore all the stuff about the strategies. The important part is the `open` method and the transie
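The pattern being referred to, as a sketch: keep the evaluator in a transient field and rebuild it in open(), so it is never shipped with the closure and is created once per function instance. The exact JPMML calls below (PMMLUtil.unmarshal, ModelEvaluatorFactory.newInstance().newModelEvaluator) are assumptions and may differ between JPMML versions; the input and output types are placeholders.

    // Sketch of the transient-field + open() pattern. JPMML method names are
    // assumptions for illustration; adapt them to your JPMML version.
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration
    import org.dmg.pmml.FieldName
    import org.jpmml.evaluator.{Evaluator, ModelEvaluatorFactory}
    import org.jpmml.model.PMMLUtil

    class PmmlScoringFunction(pmmlResource: String)
        extends RichMapFunction[Map[String, AnyRef], AnyRef] {

      // @transient: the evaluator is never serialized with the function closure.
      @transient private var evaluator: Evaluator = _

      override def open(parameters: Configuration): Unit = {
        // Rebuilt once per parallel function instance, on the TaskManager.
        val in = getClass.getResourceAsStream(pmmlResource)
        try {
          val pmml = PMMLUtil.unmarshal(in)
          evaluator = ModelEvaluatorFactory.newInstance().newModelEvaluator(pmml)
        } finally {
          in.close()
        }
      }

      override def map(features: Map[String, AnyRef]): AnyRef = {
        val arguments = new java.util.LinkedHashMap[FieldName, AnyRef]()
        features.foreach { case (name, value) => arguments.put(FieldName.create(name), value) }
        evaluator.evaluate(arguments)
      }
    }

    // usage sketch: stream.map(new PmmlScoringFunction("/model.pmml"))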

RawSchema as deserialization schema

2016-09-05 Thread Swapnil Chougule
I am using the Kafka consumer in Flink 1.1.1 with Kafka 0.8.2. I want to read the byte array as-is into a DataStream. I tried to use RawSchema as the deserialization schema but couldn't find it in 1.1.1. Do I have to write my own custom implementation for this? Can somebody help me to sort out s

Re: Flink Iterations vs. While loop

2016-09-05 Thread Theodore Vasiloudis
Hello Dan, are you broadcasting the 85GB of data then? I don't get why you wouldn't store that file on HDFS so it's accessible by your workers. If you have the full code available somewhere we might be able to help better. For L-BFGS you should only be broadcasting the model (i.e. the weight ve
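As a sketch of what "only broadcast the model" can look like in the DataSet API (the data source, update step, and dimensions below are placeholders, not an actual L-BFGS implementation): the large point set stays a distributed DataSet, and only the small weight vector is broadcast into each iteration.

    // Sketch: bulk iteration that broadcasts only the weight vector.
    // The "gradient" update is a dummy placeholder, not L-BFGS.
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration

    object BroadcastOnlyTheModelSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // In the real job this would be the large file on HDFS,
        // e.g. env.readCsvFile(...); it is never broadcast.
        val points: DataSet[Array[Double]] =
          env.fromElements(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))

        val initialWeights: DataSet[Array[Double]] = env.fromElements(Array(0.0, 0.0))

        val trained = initialWeights.iterate(10) { weights =>
          points
            .map(new RichMapFunction[Array[Double], Array[Double]] {
              @transient private var w: Array[Double] = _
              override def open(parameters: Configuration): Unit = {
                // Only the small model is shipped to each task per iteration.
                w = getRuntimeContext
                  .getBroadcastVariable[Array[Double]]("weights").get(0)
              }
              override def map(point: Array[Double]): Array[Double] =
                w.zip(point).map { case (wi, xi) => wi + 0.001 * xi } // dummy update
            })
            .withBroadcastSet(weights, "weights")
            .reduce((a, b) => a.zip(b).map { case (x, y) => (x + y) / 2 })
        }

        trained.print()
      }
    }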

Re: Firing windows multiple times

2016-09-05 Thread Aljoscha Krettek
I forgot to mention the FLIP that would basically provide the functionality that we need (without handling of late elements): https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata. I just need to find some time to implement this or find someone who would be wil

AW: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Bauss, Julian
Hi Simone, thank you for your feedback! The code snippet you provided works fine. The only drawback is that the Evaluator gets initialized once per function instance. That could lead to high memory consumption depending on the level of parallelism and the size of the PMML model (which can get qu

Re: Storing JPMML-Model Object as a Variable Closure?

2016-09-05 Thread Simone Robutti
> The only drawback is that the Evaluator gets initialized once per function instance. I believe that reducing the number of function instances is something that should be handled by Flink's runtime, and that's why I've implemented the solution this way. In our tests the number of instances was min

Re: Kafka SimpleStringConsumer NPE

2016-09-05 Thread Maximilian Michels
Your Kafka topic seems to contain null values. By default, Flink just forwards null values to the DeserializationSchema, which has to take care of them. The SimpleStringSchema doesn't do that and fails with a NullPointerException. Thus, you need an additional check in your Deserialization
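A sketch of such a check; mapping null messages to an empty string is just one assumption, dropping or tagging them would work equally well.

    // Sketch: a DeserializationSchema that tolerates null Kafka messages
    // instead of failing like SimpleStringSchema does.
    import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    import org.apache.flink.streaming.util.serialization.DeserializationSchema

    class NullTolerantStringSchema extends DeserializationSchema[String] {

      override def deserialize(message: Array[Byte]): String =
        if (message == null) "" else new String(message, "UTF-8")

      override def isEndOfStream(nextElement: String): Boolean = false

      override def getProducedType: TypeInformation[String] =
        BasicTypeInfo.STRING_TYPE_INFO
    }

    // usage: pass an instance to the Kafka consumer constructor instead of SimpleStringSchema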

Re: RawSchema as deserialization schema

2016-09-05 Thread Maximilian Michels
Just implement DeserializationSchema and return the byte array from Kafka. Byte arrays pose no problem for Flink's serialization. On Mon, Sep 5, 2016 at 3:50 PM, Swapnil Chougule wrote: > I am using Kafka consumer in flink 1.1.1 with Kafka 0.8.2. I want to read > byte array as it i
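A sketch of such a pass-through schema (the class name is a placeholder):

    // Sketch: hands the raw Kafka byte[] through unchanged.
    import org.apache.flink.api.common.typeinfo.{PrimitiveArrayTypeInfo, TypeInformation}
    import org.apache.flink.streaming.util.serialization.DeserializationSchema

    class RawBytesSchema extends DeserializationSchema[Array[Byte]] {

      override def deserialize(message: Array[Byte]): Array[Byte] = message

      override def isEndOfStream(nextElement: Array[Byte]): Boolean = false

      override def getProducedType: TypeInformation[Array[Byte]] =
        PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO
    }

    // usage: new FlinkKafkaConsumer08[Array[Byte]]("topic", new RawBytesSchema, props)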

Re: emit a single Map per window

2016-09-05 Thread Luis Mariano Guerra
On Mon, Sep 5, 2016 at 12:30 PM, Aljoscha Krettek wrote: > Hi, > for this you would have to use a non-parallel window, i.e. something like > stream.windowAll().apply(...). This does not compute per key but > has the drawback that computation does not happen in parallel. If you only > use it to co

Sharing Java Collections within Flink Cluster

2016-09-05 Thread Chakravarthy varaga
Hi Team, I'm working on a Flink streaming application. The data is injected through Kafka connectors. The payload volume is roughly 100K/sec. The event payload is a string. Let's call this DataStream1. The application also uses another DataStream, call it DataStream2 (consumes events off

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-05 Thread Steffen Hausmann
Thanks Aris for your explanation! A guava version mismatch was indeed the problem. But in addition to shading the guava dependencies, I encountered another issue caused by conflicting files in META-INF/services: RemoteTransportException[[Failed to deserialize response of type [org.elasticse

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Vishnu Viswanath
Hi, This is my understanding of SO Documentation: 1. Anyone in the community can contribute; others can validate and improve. 2. Users can request a topic. 3. As Max mentioned, the documentation is more example-oriented and doesn't have to be as comprehensive as Flink's primary documentatio