Re: Python API - Weird Performance Issue

2014-09-09 Thread Stephan Ewen
Hey! The UDP version is 25x slower? That's massive. Are you sending the records through that as well, or just the coordination? Regarding busy-waiting loops: there has to be a better way to do that. It will behave utterly unpredictably. Once the Python side does I/O, has a separate process or
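As a rough illustration of the busy-waiting concern above, here is a minimal sketch of a blocking wait in Python. It is not taken from the Python API code under discussion; the names wait_for_signal and run_demo and the timer-based producer are hypothetical stand-ins. The waiting side sleeps inside Event.wait() until it is signalled or the timeout expires, instead of spinning on a flag.

```python
import threading


def wait_for_signal(ready: threading.Event, timeout: float) -> bool:
    """Block until `ready` is set or `timeout` seconds pass, without spinning."""
    return ready.wait(timeout)


def run_demo() -> bool:
    ready = threading.Event()
    # Hypothetical producer: signals readiness after a short delay.
    producer = threading.Timer(0.05, ready.set)
    producer.start()
    got_signal = wait_for_signal(ready, timeout=2.0)
    producer.join()
    return got_signal


if __name__ == "__main__":
    print("signalled:", run_demo())
```

The timeout keeps the waiter from hanging forever if the other side dies, which is one of the ways a bare polling loop tends to misbehave.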

Re: flink performance

2014-09-09 Thread Robert Metzger
Hi, can you post the exact error message showing why HDFS is not working? On Tue, Sep 9, 2014 at 3:50 PM, normanSp wir12...@studserv.uni-leipzig.de wrote: Thank you. But I already have trouble with the shipped example programs. I can't use files in HDFS. The fs.hdfs.hadoopconf option is set. the same

Re: flink performance

2014-09-09 Thread normanSp
of course, sorry. command: ./bin/flink run ./examples/flink-java-examples-0.7-incubating-SNAPSHOT-WordCount.jar hdfs:/input.xml hdfs:/result org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given HDFS file URI (hdfs:/input.xml)

Re: flink performance

2014-09-09 Thread Robert Metzger
I guess in the core-site.xml, the property fs.defaultFS has not been set to the namenode. You basically put the default address there, like: hdfs://master:8037/. On Tue, Sep 9, 2014 at 4:08 PM, Fabian Hueske fhue...@apache.org wrote: Try to specify the HDFS path as follows:

Re: flink performance

2014-09-09 Thread Ufuk Celebi
Hey Norman, I'm not sure, but have you tried it with three slashes, as in hdfs:///input.xml? Or you can specify the address of the namenode via the config key fs.default.name or fs.defaultFS, i.e.: <property><name>fs.default.name</name><value>hdfs://...:port</value></property> On Tue, Sep 9, 2014 at 4:03
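To make the difference between the URI forms in this thread concrete, here is a small sketch using Python's generic URI parser. It only shows how the strings split into scheme, authority and path; it is not how Flink or Hadoop parse paths, and the helper name describe_hdfs_uri is made up for illustration. The point is that hdfs:/input.xml and hdfs:///input.xml both carry no namenode address, so the client has to take it from the configured default, while hdfs://MASTER:9000/input.xml names it explicitly.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri: str) -> dict:
    """Split a URI into its parts and note whether it names a namenode."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # host:port of the namenode, if present
        "path": parts.path,
        "has_namenode": bool(parts.netloc),
    }


if __name__ == "__main__":
    # URIs as they appear in the thread.
    for uri in ("hdfs:/input.xml", "hdfs:///input.xml", "hdfs://MASTER:9000/input.xml"):
        print(uri, "->", describe_hdfs_uri(uri))
```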

Re: flink performance

2014-09-09 Thread normanSp
Thanks, but I tried all of those versions before I posted here, and exactly the same conf file works in 0.6. @fabian: with the full hostname and port I get this error: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given file URI

Re: flink performance

2014-09-09 Thread Fabian Hueske
This error message basically says that Flink's HDFS-client and the running HDFS are not compatible. We have different builds for Hadoop 1.0 and Hadoop 2.0. Please check the version of the running HDFS and choose the corresponding build. Best, Fabian 2014-09-09 16:32 GMT+02:00 Fabian Hueske

Re: flink performance

2014-09-09 Thread normanSp
Okay, I thought that was only necessary in YARN mode. I did it, but now the next error follows: java.io.IOException: Error opening the Input Split hdfs://MASTER:9000/input.xml [46573551616,134217728]:

Re: flink performance

2014-09-09 Thread Fabian Hueske
This still looks like the versions aren't in sync. What's the exact version of your HDFS (incl. distribution: Hadoop, CDH, etc.) and which Stratosphere build are you using? 2014-09-09 16:57 GMT+02:00 normanSp wir12...@studserv.uni-leipzig.de: okay, i thought that is only in yarn-mode necessary.

Re: Scala API rewrite almost complete

2014-09-09 Thread Kostas Tzoumas
WebLog here: https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala Do you need any more done? On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek aljos...@apache.org wrote: I added the ConnectedComponents Example from Vasia. Keep 'em coming, people. :D On Mon, Sep 8,

Re: Scala API rewrite almost complete

2014-09-09 Thread Kostas Tzoumas
I'll take TransitiveClosure and PiEstimation (was not on your list). If nobody volunteers for the relational stuff I can take those as well. How about removing the RelationalQuery from both Scala and Java? It seems to be a proper subset of TPC-H Q3. Does it add some teaching value on top of

Re: flink performance

2014-09-09 Thread normanSp
I want to use the above-mentioned scala-rework branch https://github.com/aljoscha/incubator-flink/tree/scala-rework. The Apache Hadoop version is 2.4.0 and I built Flink with: mvn clean package -DskipTests -Dhadoop.profile=2 -Dhadoop.version=2.4.0

Re: flink performance

2014-09-09 Thread Robert Metzger
Hi, the maven call seems to be correct for your Hadoop version. Can you check if the build contains the hadoop 1.2.1 jar file in the lib/ directory? Ideally, all jars that contain the term hadoop should have version 2.4.0 in their name. Robert On Tue, Sep 9, 2014 at 10:15 PM, normanSp
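The lib/ check suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the sample file names are hypothetical and not taken from an actual Flink build. It flags every jar whose name contains "hadoop" but not the expected version string.

```python
import os


def mismatched_hadoop_jars(jar_names, expected_version):
    """Return Hadoop-related jar names that lack the expected version string."""
    return sorted(
        name
        for name in jar_names
        if name.endswith(".jar")
        and "hadoop" in name.lower()
        and expected_version not in name
    )


def check_lib_dir(lib_dir, expected_version):
    """Apply the same check to the file names found in a lib/ directory."""
    return mismatched_hadoop_jars(os.listdir(lib_dir), expected_version)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = [
        "hadoop-core-1.2.1.jar",
        "hadoop-common-2.4.0.jar",
        "flink-core-0.7-incubating-SNAPSHOT.jar",
    ]
    print(mismatched_hadoop_jars(sample, "2.4.0"))
```

An empty result means every Hadoop jar name carries the expected version; any name listed is a candidate for the kind of client/server mismatch discussed in this thread.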

Re: flink performance

2014-09-09 Thread Ufuk Celebi
Hey Norman, everything seems to be correct, imo. Do the others have any idea? If you like, we can have a short Skype/Hangout tomorrow to try to pinpoint the problem. Should be faster than sending mails back and forth. ;) Afterwards, we would post the result here. On Tue, Sep 9, 2014 at 10:44 PM,

Re: flink performance

2014-09-09 Thread Robert Metzger
Having a quick call is a good idea. I would also be available for it. One thing that also came to mind (I don't think it's the case here): did you restart Flink on the cluster after building the correct version? On Tue, Sep 9, 2014 at 11:34 PM, Ufuk Celebi u...@apache.org wrote: Hey

[jira] [Created] (FLINK-1095) ./flink info -d command is not working for the examples

2014-09-09 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1095. Summary: ./flink info -d command is not working for the examples. Key: FLINK-1095. URL: https://issues.apache.org/jira/browse/FLINK-1095. Project: Flink

Re: 0.6.1 Bugfix Release

2014-09-09 Thread Ufuk Celebi
I've pushed all fixes except for fbed013db60d7c45dcd11b6303ffa16220557e13 https://github.com/apache/incubator-flink/commit/fbed013db60d7c45dcd11b6303ffa16220557e13 as MapPartition is not part of the 0.6 release (and adding it now would be a new feature and not a bugfix) to the 0.6.1-release

Exception when running WC

2014-09-09 Thread Chesnay Schepler
Hello, tonight I was running a WordCount job with the Python API, and halfway through I got the exception below. The issue did not occur again after resubmitting the job. DOP=160 taskslots=8 filesize=100GB org.apache.flink.client.program.ProgramInvocationException: The program