Re: MongoOutputFormat does not write back to collection

2015-07-22 Thread Stefano Bortoli
A simple solution would be to:
1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat that gets the OutputCommitter as a parameter
2 - change the outputCommitter field of HadoopOutputFormatBase to be a generic OutputCommitter
3 - remove the default assignment in the open() and f
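
A minimal sketch of what steps 1-3 could look like; the class and field names are assumed from Flink's Hadoop compatibility wrapper, not copied from it:

    // Hypothetical shape of the proposed change: accept any OutputCommitter
    // instead of hard-wiring a FileOutputCommitter.
    import org.apache.hadoop.mapreduce.OutputCommitter;
    import org.apache.hadoop.mapreduce.OutputFormat;

    public abstract class HadoopOutputFormatBase<K, V, T> {

        protected OutputFormat<K, V> mapreduceOutputFormat;
        // step 2: a generic OutputCommitter field instead of FileOutputCommitter
        protected OutputCommitter outputCommitter;

        // step 1: constructor that takes the committer as a parameter
        public HadoopOutputFormatBase(OutputFormat<K, V> format, OutputCommitter committer) {
            this.mapreduceOutputFormat = format;
            this.outputCommitter = committer;
        }

        // step 3: open() no longer assigns a default FileOutputCommitter;
        // it simply uses the committer handed in above.
    }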

Re: MongoOutputFormat does not write back to collection

2015-07-22 Thread Stephan Ewen
Thanks for reporting this, Stefano! Seems like the HadoopOutputFormat wrapper is pretty much specialized on File Output Formats. Can you open an issue for that? Someone will need to look into this... On Wed, Jul 22, 2015 at 4:48 PM, Stefano Bortoli wrote: > In fact, on close() of the HadoopOu

Re: MongoOutputFormat does not write back to collection

2015-07-22 Thread Stefano Bortoli
In fact, on close() of the HadoopOutputFormat, the check if (this.fileOutputCommitter.needsTaskCommit(this.context)) returns false. /** * commit the task by moving the output file out from the temporary directory. * @throws java.io.IOException */
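
Paraphrased, the close() path in question looks roughly like this; field names are assumed, not quoted from Flink:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.OutputCommitter;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    class CloseSketch<K, V> {
        private RecordWriter<K, V> recordWriter;
        private OutputCommitter outputCommitter;
        private TaskAttemptContext context;

        public void close() throws IOException {
            try {
                this.recordWriter.close(this.context);
                // A FileOutputCommitter answers needsTaskCommit() by looking
                // for output in a temporary directory. A committer that
                // buffers its writes elsewhere (as mongo-hadoop's bulk
                // committer does) can report false here, so commitTask() is
                // skipped and nothing is flushed to the collection.
                if (this.outputCommitter.needsTaskCommit(this.context)) {
                    this.outputCommitter.commitTask(this.context);
                }
            } catch (InterruptedException e) {
                throw new IOException("Task commit was interrupted.", e);
            }
        }
    }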

Re: MongoOutputFormat does not write back to collection

2015-07-22 Thread Stefano Bortoli
Debugging, it seems the commitTask method of the MongoOutputCommitter is never called. Is it possible that this 'bulk' approach of mongo-hadoop 1.4 does not fit the task execution method of Flink? Any idea? Thanks a lot in advance. Saluti, Stefano. Stefano Bortoli, PhD, ENS Technical Director

Re: Nested Iterations Outlook

2015-07-22 Thread Maximilian Alber
Thanks. Yes, I got that. Cheers On Wed, Jul 22, 2015 at 2:46 PM, Maximilian Michels wrote: > I mentioned that. @Max: you should only try it out if you want to > experiment/work with the changes. > > On Wed, Jul 22, 2015 at 2:20 PM, Stephan Ewen wrote: > >> The two pull requests do not go all t

Re: Nested Iterations Outlook

2015-07-22 Thread Maximilian Michels
I mentioned that. @Max: you should only try it out if you want to experiment/work with the changes. On Wed, Jul 22, 2015 at 2:20 PM, Stephan Ewen wrote: > The two pull requests do not go all the way, unfortunately. They cover > only the runtime, the API integration part is missing still, > unfor

Re: filter as termination condition

2015-07-22 Thread Till Rohrmann
Sachin is right that the filter has to be inverted. Furthermore, the join operation is not right here. You have to do a kind of a left outer join where you only keep the elements which join with NULL. Here is an example of how one could do it [1]. Cheers, Till [1] http://stackoverflow.com/questio
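
Since the DataSet API of that era has no outer joins, the usual workaround is a coGroup that emits only the elements with no partner. A sketch with made-up tuple types (the linked Stack Overflow answer may differ in detail):

    import org.apache.flink.api.common.functions.CoGroupFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class TerminationJoin {
        // "Left outer join keeping only the NULL matches": emit each element
        // of `previous` whose key finds nothing in `current`. If the result
        // is empty, the two solutions agree and the iteration can stop.
        public static DataSet<Tuple2<Long, Double>> unmatched(
                DataSet<Tuple2<Long, Double>> previous,
                DataSet<Tuple2<Long, Double>> current) {
            return previous.coGroup(current)
                    .where(0).equalTo(0)
                    .with(new CoGroupFunction<Tuple2<Long, Double>, Tuple2<Long, Double>, Tuple2<Long, Double>>() {
                        @Override
                        public void coGroup(Iterable<Tuple2<Long, Double>> prev,
                                            Iterable<Tuple2<Long, Double>> curr,
                                            Collector<Tuple2<Long, Double>> out) {
                            if (!curr.iterator().hasNext()) { // no join partner
                                for (Tuple2<Long, Double> p : prev) {
                                    out.collect(p);
                                }
                            }
                        }
                    });
        }
    }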

MongoOutputFormat does not write back to collection

2015-07-22 Thread Stefano Bortoli
Hi, I am trying to analyze and update a MongoDB collection with Apache Flink 0.9.0 and Mongo Hadoop 1.4.0 on Hadoop 2.6.0. The process is fairly simple, and the MongoInputFormat works smoothly; however, it does not write back to the collection. The process works, because writeAsText works as expe
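
A rough shape of the job being described, assuming the mapreduce-API wrappers and mongo-hadoop 1.4 (URIs are placeholders). Reading works; the write side is where the committer problem shows up:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.mapreduce.Job;
    import org.bson.BSONObject;
    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;
    import com.mongodb.hadoop.util.MongoConfigUtil;

    public class MongoRoundTrip {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            Job job = Job.getInstance();
            MongoConfigUtil.setInputURI(job.getConfiguration(), "mongodb://localhost:27017/db.in");
            MongoConfigUtil.setOutputURI(job.getConfiguration(), "mongodb://localhost:27017/db.out");

            // reading through the Hadoop wrapper works as expected
            DataSet<Tuple2<Object, BSONObject>> docs = env.createInput(
                    new HadoopInputFormat<Object, BSONObject>(
                            new MongoInputFormat(), Object.class, BSONObject.class, job));

            // ... transform `docs` ...

            // writing through the wrapper is where nothing reaches Mongo
            docs.output(new HadoopOutputFormat<Object, BSONObject>(
                    new MongoOutputFormat<Object, BSONObject>(), job));

            env.execute("mongo round trip");
        }
    }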

Re: filter as termination condition

2015-07-22 Thread Sachin Goel
It appears that you're returning true when the previous and current solutions are the same. You should instead return false in that case, because this is when the iteration should terminate. Further, instead of joining, it would be a good idea to broadcast the new solution to the old solution [or th
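
A sketch of the broadcast variant with invented names: the new solution is broadcast into a filter over the old one, and an element survives, keeping the termination set non-empty, only while the two still differ:

    import java.util.List;
    import org.apache.flink.api.common.functions.RichFilterFunction;
    import org.apache.flink.configuration.Configuration;

    // Keeps an element of the old solution only while it differs from the
    // broadcast new solution; once they match, the filter output is empty
    // and the iteration terminates.
    public class ChangedFilter extends RichFilterFunction<Double> {

        private Double newValue;

        @Override
        public void open(Configuration parameters) {
            List<Double> bc = getRuntimeContext().getBroadcastVariable("newSolution");
            this.newValue = bc.get(0); // assumes a single-element solution set
        }

        @Override
        public boolean filter(Double oldValue) {
            return !oldValue.equals(newValue);
        }
    }

Wired in, this might look like oldSolution.filter(new ChangedFilter()).withBroadcastSet(newSolution, "newSolution"), with the result passed to closeWith() as the termination criterion.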

Re: Nested Iterations Outlook

2015-07-22 Thread Stephan Ewen
The two pull requests do not go all the way, unfortunately. They cover only the runtime, the API integration part is missing still, unfortunately... On Mon, Jul 20, 2015 at 5:53 PM, Maximilian Michels wrote: > You could do that but you might run into merge conflicts. Also keep in > mind that it

Re: filter as termination condition

2015-07-22 Thread Stephan Ewen
Termination happens if the "termination criterion" data set is empty. Maybe your filter is too aggressive and filters out everything, or the join is wrong and nothing joins... On Tue, Jul 21, 2015 at 5:05 PM, Pa Rö wrote: > hello, > > i have define a filter for the termination condition by k-me
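
For reference, the mechanics being described, as a self-contained toy that halves numbers until none exceed 1:

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.IterativeDataSet;

    public class TerminationToy {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            IterativeDataSet<Long> loop = env.fromElements(64L, 256L).iterate(1000);

            DataSet<Long> next = loop.map(new MapFunction<Long, Long>() {
                @Override
                public Long map(Long v) { return v / 2; }
            });

            // Termination criterion: the iteration runs another round only
            // while this filter result is non-empty. An over-aggressive
            // filter (or a join that matches nothing) empties it after the
            // first round and the loop stops immediately.
            DataSet<Long> stillLarge = next.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long v) { return v > 1; }
            });

            loop.closeWith(next, stillLarge).print();
        }
    }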

Re: HDFS directory rename

2015-07-22 Thread Fabian Hueske
listStatus() should return an empty array On Jul 22, 2015 13:11, "Flavio Pompermaier" wrote: > I can detect if it's a dir but how can I detect if it's empty? > > On Wed, Jul 22, 2015 at 12:49 PM, Fabian Hueske wrote: > >> How about FileStatus[] FileSystem.listStatus()? >> FileStatus gives the le
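
In code, the emptiness check then reduces to the following, using Flink's core FileSystem API (the path is an example):

    import org.apache.flink.core.fs.FileStatus;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class EmptyDirCheck {
        public static void main(String[] args) throws Exception {
            Path dir = new Path("hdfs:///tmp/some/dir"); // example path
            FileSystem fs = dir.getFileSystem();
            // For a directory with no children, listStatus() returns an
            // empty array.
            FileStatus[] children = fs.listStatus(dir);
            boolean emptyDir = fs.getFileStatus(dir).isDir() && children.length == 0;
            System.out.println(dir + " empty: " + emptyDir);
        }
    }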

Re: HDFS directory rename

2015-07-22 Thread Flavio Pompermaier
I can detect if it's a dir but how can I detect if it's empty? On Wed, Jul 22, 2015 at 12:49 PM, Fabian Hueske wrote: > How about FileStatus[] FileSystem.listStatus()? > FileStatus gives the length of a file, the path, whether it's a dir, etc. > > 2015-07-22 11:04 GMT+02:00 Flavio Pompermaier :

Re: HDFS directory rename

2015-07-22 Thread Fabian Hueske
How about FileStatus[] FileSystem.listStatus()? FileStatus gives the length of a file, the path, whether it's a dir, etc. 2015-07-22 11:04 GMT+02:00 Flavio Pompermaier : > Ok. What I still not able to do is to recursively remove empty dirs from > the source dir because there's no API for getChild

Re: Multiple ElasticSearch sinks

2015-07-22 Thread Robert Metzger
Hi, I don't know anybody who has reported something like this before on our lists. Since you don't know the types beforehand, the mapPartition approach sounds good. On Fri, Jul 10, 2015 at 5:02 PM, Flavio Pompermaier wrote: > Hi to all, > > I have a Flink job that produce json objects that I'd

Re: Flink Kafka cannot find org/I0Itec/zkclient/serialize/ZkSerializer

2015-07-22 Thread Stephan Ewen
Here are some comments on Java classloading:
- The ZooKeeper code is implicitly loaded by the Kafka code.
- When Java implicitly loads a class at some point in the program, it uses the classloader of the class at that point in the program. Here it will use the class loader that loaded the Kafka

Re: HDFS directory rename

2015-07-22 Thread Flavio Pompermaier
Ok. What I'm still not able to do is recursively remove empty dirs from the source dir, because there's no API for getChildrenCount() or getChildren() for a given Path. How can I do that? On Tue, Jul 21, 2015 at 3:13 PM, Stephan Ewen wrote: > I don't think there is a simpler way to do this. > >
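
Lacking a getChildren() call, the recursion can be built from listStatus() and delete() alone; a sketch against Flink's core FileSystem API:

    import java.io.IOException;
    import org.apache.flink.core.fs.FileStatus;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class EmptyDirCleaner {
        // Delete every directory under `dir` that holds no files, directly
        // or transitively; returns true if `dir` itself was removed.
        public static boolean removeEmptyDirs(FileSystem fs, Path dir) throws IOException {
            boolean empty = true;
            for (FileStatus child : fs.listStatus(dir)) {
                if (child.isDir()) {
                    if (!removeEmptyDirs(fs, child.getPath())) {
                        empty = false; // subdirectory still holds files
                    }
                } else {
                    empty = false;     // a regular file keeps this dir alive
                }
            }
            if (empty) {
                fs.delete(dir, false); // already empty, non-recursive delete suffices
            }
            return empty;
        }
    }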

Re: Best way to write data to HDFS by Flink

2015-07-22 Thread Robert Metzger
Hey, can you measure how fast JMeter is able to push data into Kafka? Maybe that is already the bottleneck. Flink should be able to read from Kafka with 100k+ elements/second on a single node. On Mon, Jun 29, 2015 at 11:10 AM, Stephan Ewen wrote: > Hi Hawin! > > The performance tuning of Kafka

Re: Flink Kafka cannot find org/I0Itec/zkclient/serialize/ZkSerializer

2015-07-22 Thread Till Rohrmann
Hi Wendong, I don't think that you have to include flink-streaming-connectors as a dependency. In the 0.9.0-milestone-1 release, all connectors were still bundled in this module. However, with the official 0.9.0 release, the streaming connectors were split up into separate modules. Cheers, Till
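
So for 0.9.0 the Kafka connector has to be pulled in as its own module; the coordinates should look like this (worth verifying against the 0.9.0 docs):

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka</artifactId>
        <version>0.9.0</version>
    </dependency>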

RE: Flink Kafka cannot find org/I0Itec/zkclient/serialize/ZkSerializer

2015-07-22 Thread Hawin Jiang
Hi Wendong, please make sure you have dependencies as below. Good luck!

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>0.9.0</version>
    </dependency>