Re: Need some help to implement the outer join operator

2014-12-21 Thread Chesnay Schepler
Hey Wilson, the MapFunction should act as a wrapper for the join function. Create a class extending RichMapFunction, and pass the join function via the constructor. Then you delegate open/close calls to it, with the map function looking something like this: map(Tuple2<...> tuple) { return joinFunc

[jira] [Created] (FLINK-1270) FileSystem.get() doesn't support relative paths

2014-11-21 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-1270: --- Summary: FileSystem.get() doesn't support relative paths Key: FLINK-1270 URL: https://issues.apache.org/jira/browse/FLINK-1270 Project: Flink

[jira] [Created] (FLINK-1248) Manually built docu doesn't apply CSS/images

2014-11-17 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-1248: --- Summary: Manually built docu doesn't apply CSS/images Key: FLINK-1248 URL: https://issues.apache.org/jira/browse/FLINK-1248 Project: Flink Issue

[jira] [Created] (FLINK-1244) setCombinable() clunky to use

2014-11-14 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-1244: --- Summary: setCombinable() clunky to use Key: FLINK-1244 URL: https://issues.apache.org/jira/browse/FLINK-1244 Project: Flink Issue Type: Wish

[jira] [Created] (FLINK-1227) KeySelector can't implement ResultTypeQueryable

2014-11-10 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-1227: --- Summary: KeySelector can't implement ResultTypeQueryable Key: FLINK-1227 URL: https://issues.apache.org/jira/browse/FLINK-1227 Project: Flink

Re: [DISCUSS] Policy on keeping layer alternatives in sync

2014-09-27 Thread Chesnay Schepler
I agree with Kostas, and believe that postponing will imo straight up not work since people tend to be *very* busy close to a release, even without having to port features to several APIs. I furthermore don't think we will get anywhere by creating one policy to rule them all (especially a rigi

Re: Python API - Weird Performance Issue

2014-09-10 Thread Chesnay Schepler
? On Wed, Sep 10, 2014 at 2:30 PM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: Only the coordination is done via UDP. I agree with what you say about the loops; currently looking into using FileLocks. On 9.9.2014 11:33, Stephan Ewen wrote: Hey! The UDP version is 25x

Re: Python API - Weird Performance Issue

2014-09-10 Thread Chesnay Schepler
ephan On Mon, Sep 8, 2014 at 4:15 PM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: Sorry for the late answer. Today I did a quick hack to replace the synchronization completely with UDP. It's still synchronous and record based, but 25x slower. Regarding busy-loops I would

Exception when running WC

2014-09-09 Thread Chesnay Schepler
Hello, tonight I was running a WordCount job with the Python API, and halfway through I got the exception below. The issue did not occur again after resubmitting the job. DOP=160 taskslots=8 filesize=100GB org.apache.flink.client.program.ProgramInvocationException: The program execution

Re: Python API - Weird Performance Issue

2014-09-08 Thread Chesnay Schepler
the busy loop? Ufuk On Thu, Aug 28, 2014 at 1:06 AM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: The performance differences occur on the same system (16GB, 4 cores + HyperThreading) with a DOP of 1 for a plan consisting of a single operator. Plenty of resources :/ On 28.8.

Re: Changing how TypeComparators Work

2014-08-28 Thread Chesnay Schepler
esentation, right? On Wed, Aug 27, 2014 at 10:19 PM, Chesnay Schepler wrote: I can't recall definitely what the numbers were, so I'll just quote myself from the PR: measurements were done using System.nanoTime(); time necessary for the comparison; Strings consisted of 90 characters; difference in t

Re: Python API - Weird Performance Issue

2014-08-27 Thread Chesnay Schepler
ghput, because the big buffers were not copied (unlike in streams), and the UDP notifications were very fast (fire and forget datagrams). Stephan On Wed, Aug 27, 2014 at 10:48 PM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: Hey Stephan, I'd like to point out right aw

Re: Python API - Weird Performance Issue

2014-08-27 Thread Chesnay Schepler
, Aug 27, 2014 at 8:34 PM, Chesnay Schepler < chesnay.schep...@fu-berlin.de> wrote: Hello everyone, This will be some kind of brainstorming question. As some of you may know I am currently working on the Python API. The most crucial part here is how the data is exchanged between Java and Python.

Re: Changing how TypeComparators Work

2014-08-27 Thread Chesnay Schepler
I can't recall definitely what the numbers were, so I'll just quote myself from the PR: measurements were done using System.nanoTime(); time necessary for the comparison; Strings consisted of 90 characters; difference in the beginning of the string: New 4259, Old 23431; difference in the middle of the

Python API - Weird Performance Issue

2014-08-27 Thread Chesnay Schepler
Hello everyone, This will be some kind of brainstorming question. As some of you may know I am currently working on the Python API. The most crucial part here is how the data is exchanged between Java and Python. Up to this point we used pipes for this, but switched recently to memory mapped f

Re: how to split data-sets efficiently?

2014-07-27 Thread Chesnay Schepler
I think this is what Martin is currently doing: StringIDs --map-> (StringIDs,LongIDs) --map-> LongIDs, and he wants to use both the second and third set. He asks for a way to replace the second map operation (since it seems unnecessary to create an extra map for that). I believe the appropria

[jira] [Comment Edited] (FLINK-518) Use "file:" as the default scheme

2014-06-19 Thread Chesnay Schepler (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037150#comment-14037150 ] Chesnay Schepler edited comment on FLINK-518 at 6/19/14 9:1

[jira] [Commented] (FLINK-518) Use "file:" as the default scheme

2014-06-19 Thread Chesnay Schepler (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037150#comment-14037150 ] Chesnay Schepler commented on FLINK-518: the file scheme is added if you