load balancing groups

2014-10-28 Thread Martin Neumann
I have some problems with load balancing and was wondering how to deal with this kind of problem in Flink. The input I have is a data set of grouped IDs that I join with metadata for each ID. Then I need to compare each item in a group with each other item in that group and, if necessary, split i
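
A minimal sketch of the pipeline described above, assuming the metadata join has already happened and using placeholder (groupId, item) string tuples; it is written against the later DataSet API (0.7-era signatures differ slightly, e.g. Iterator instead of Iterable). The pairwise comparison runs inside one GroupReduceFunction per group, which is exactly where a very large group becomes a load-balancing problem:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.functions.GroupReduceFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.util.Collector;

    public class PairwiseCompareSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // placeholder input: (groupId, item already enriched with its metadata)
            DataSet<Tuple2<String, String>> items = env.fromElements(
                    new Tuple2<String, String>("g1", "a"),
                    new Tuple2<String, String>("g1", "b"),
                    new Tuple2<String, String>("g2", "c"));

            items.groupBy(0)
                 .reduceGroup(new GroupReduceFunction<Tuple2<String, String>, Tuple3<String, String, String>>() {
                     @Override
                     public void reduce(Iterable<Tuple2<String, String>> values,
                                        Collector<Tuple3<String, String, String>> out) {
                         // buffer the group, then emit every pair (groupId, itemA, itemB)
                         List<Tuple2<String, String>> buffer = new ArrayList<Tuple2<String, String>>();
                         for (Tuple2<String, String> v : values) {
                             buffer.add(v);
                         }
                         for (int i = 0; i < buffer.size(); i++) {
                             for (int j = i + 1; j < buffer.size(); j++) {
                                 out.collect(new Tuple3<String, String, String>(
                                         buffer.get(i).f0, buffer.get(i).f1, buffer.get(j).f1));
                             }
                         }
                     }
                 })
                 .print();

            // older Flink versions additionally need an explicit env.execute() after the sink
        }
    }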

Re: Make Hadoop 2 the default profile

2014-10-22 Thread Martin Neumann
+1 for the change On Tue, Oct 21, 2014 at 10:18 AM, Till Rohrmann wrote: > +1 for the change. > > On Mon, Oct 20, 2014 at 2:42 PM, Robert Metzger > wrote: > > +1 Very good idea. > > > > On Mon, Oct 20, 2014 at 2:37 PM, Márton Balassi < > balassi.mar...@gmail.com> > > wrote: > > > >> +1 on the B

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
> > and then run a program like this: > > DataSet data = env.readAsText(...).map(new Parser()); > > data.groupBy("id").sort("someValue").reduceGroup(new > GroupReduceFunction(...)); > > > Feel free to post your program here so we can give you c
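
The quoted snippet abbreviates the method names; against the DataSet API it would look roughly like the following (readTextFile rather than readAsText, sortGroup rather than sort; the "id"/"someValue" field expressions assume the parser emits a POJO with those public fields, and the delimiter is a placeholder):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class CsvGroupSortSketch {

        // placeholder POJO for one parsed line; POJOs need public fields (or
        // getters/setters) and a public no-argument constructor
        public static class Record {
            public String id;
            public String someValue;
            public Record() {}
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Record> data = env.readTextFile("/path/to/input.csv")
                    .map(new MapFunction<String, Record>() {
                        @Override
                        public Record map(String line) {
                            String[] fields = line.split(",", -1);  // delimiter is an assumption
                            Record r = new Record();
                            r.id = fields[0];
                            r.someValue = fields[1];
                            return r;
                        }
                    });

            data.groupBy("id")
                .sortGroup("someValue", Order.ASCENDING)
                .first(1)      // stand-in for the actual GroupReduceFunction
                .print();
        }
    }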

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
, 2014 at 3:02 PM, Martin Neumann > wrote: > > I will go with that workaround, however I would have preferred if I could > > have done that directly with the API instead of doing Map/Reduce like > > Key/Value tuples again :-) > > > > By the way is there a simple fun

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
let’s wait for some feedback > > from the others. > > > > Until then you can always map the array to (array, keyfield) and use > > groupBy(1). > > > > > > > On 21 Oct 2014, at 14:17, Martin Neumann wrote: > > > > > > Hej, > > > >

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
as you described. > > Regards, > Gyula > > > On 21 Oct 2014, at 14:03, Martin Neumann wrote: > > > > Hej, > > > > I have a csv file with 54 columns each of them is string (for now). I > need > > to group and sort them on field 15. > > > > W

how load/group with large csv files

2014-10-21 Thread Martin Neumann
Hej, I have a csv file with 54 columns, each of them a string (for now). I need to group and sort them on field 15. What's the best way to load the data into Flink? There is no Tuple54 (and the <> would look awful anyway with 54 times String in it). My current idea is to write a Mapper and split t
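
The (array, key field) workaround suggested elsewhere in the thread could look roughly like this: split each line into a String[], duplicate the grouping column as an explicit tuple field, and group on that field. Column index, delimiter, and file path are assumptions:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class WideCsvSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // (all 54 columns as a String[], column 15 duplicated as an explicit key field)
            DataSet<Tuple2<String[], String>> rows = env.readTextFile("/path/to/wide.csv")
                    .map(new MapFunction<String, Tuple2<String[], String>>() {
                        @Override
                        public Tuple2<String[], String> map(String line) {
                            String[] cols = line.split(",", -1);                   // keep empty trailing fields
                            return new Tuple2<String[], String>(cols, cols[15]);   // assumes 0-based "field 15"
                        }
                    });

            // group on the extracted key; the array just travels along as payload.
            // a secondary sort can be added the same way with sortGroup(...) on
            // another extracted field.
            rows.groupBy(1)
                .first(1)
                .print();
        }
    }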

Re: CsvInputFormat delimiter fields

2014-10-15 Thread Martin Neumann
hange that, but would be a bit of work. > > Stephan > > On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann > wrote: > > > Hej, > > > > A lot of my inputs are csv files so I use the CsvInputFormat a lot. What > I > > find kind of odd that the Line deli

How to group and sort the output of a join

2014-10-15 Thread Martin Neumann
Hej, After a join operation I end up with a Tuple2<Tuple2<...>, Tuple2<...>>. I now want to group it on field 1 from the first inner Tuple and then sort it by field 1 in the 2nd inner tuple. How do I do that? I think I need to use the key extractor to get the correct field but sort does not take one. cheers Ma
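
One workaround that fits this API generation is to pull both keys up into a flat tuple with a map and then use positional groupBy/sortGroup (newer releases also accept nested field expressions such as "f0.f1"); the concrete field types below are placeholders:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;

    public class GroupJoinResultSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // stand-in for the join result: ((a0, a1), (b0, b1))
            DataSet<Tuple2<Tuple2<String, String>, Tuple2<String, Long>>> joined = env.fromElements(
                    new Tuple2<Tuple2<String, String>, Tuple2<String, Long>>(
                            new Tuple2<String, String>("k1", "x"), new Tuple2<String, Long>("y", 2L)),
                    new Tuple2<Tuple2<String, String>, Tuple2<String, Long>>(
                            new Tuple2<String, String>("k1", "x"), new Tuple2<String, Long>("y", 1L)));

            // pull the grouping key and the sort key into a flat Tuple3, keep the payload
            joined.map(new MapFunction<Tuple2<Tuple2<String, String>, Tuple2<String, Long>>,
                                       Tuple3<String, Long, Tuple2<String, Long>>>() {
                      @Override
                      public Tuple3<String, Long, Tuple2<String, Long>> map(
                              Tuple2<Tuple2<String, String>, Tuple2<String, Long>> v) {
                          return new Tuple3<String, Long, Tuple2<String, Long>>(v.f0.f1, v.f1.f1, v.f1);
                      }
                  })
                  .groupBy(0)                      // key taken from the first inner tuple
                  .sortGroup(1, Order.ASCENDING)   // sort key taken from the second inner tuple
                  .first(1)
                  .print();
        }
    }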

CsvInputFormat delimiter fields

2014-10-15 Thread Martin Neumann
Hej, A lot of my inputs are csv files so I use the CsvInputFormat a lot. What I find kind of odd is that the line delimiter is a String but the field delimiter is a Character. *see:* new CsvInputFormat<Tuple2<String, String>>(new Path(pVecPath), "\n", '\t', String.class, String.class) Is there a reason for this? I'm currentl
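
For the common case the CsvReader builder exposes the same two delimiters, which makes the asymmetry the question is about visible (line delimiter as a String, field delimiter as a char in this API generation); a minimal sketch with a placeholder path:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class CsvReaderSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // line delimiter is a String, field delimiter a char in this API generation
            DataSet<Tuple2<String, String>> pVec = env.readCsvFile("/path/to/pVec.tsv")
                    .lineDelimiter("\n")
                    .fieldDelimiter('\t')
                    .types(String.class, String.class);

            pVec.first(5).print();
        }
    }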

Re: load broadcast set into searchable set

2014-10-06 Thread Martin Neumann
c); > } > > > > Is that what you had in mind? > > Stephan > > > On Mon, Oct 6, 2014 at 11:57 AM, Martin Neumann > wrote: > > > Hej, > > > > I have a Flink job with with a filter step. I now have a list of > exceptions > > where I need

load broadcast set into searchable set

2014-10-06 Thread Martin Neumann
Hej, I have a Flink job with a filter step. I now have a list of exceptions where I need to do some extra work (300k entries). I thought I'd just use a broadcast set and then, for each, compare if it's in the exception set. What is the best way to implement this in Flink? Is there an efficient
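
One way to make the broadcast set searchable is to copy it into a HashSet once in open() of a rich function; the "exceptions" name and the keep/drop semantics below are placeholders:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.flink.api.common.functions.RichFilterFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastFilterSketch {

        public static class ExceptionFilter extends RichFilterFunction<String> {
            private Set<String> exceptions;

            @Override
            public void open(Configuration parameters) {
                // copy the broadcast list into a HashSet once per task for O(1) lookups
                List<String> bc = getRuntimeContext().getBroadcastVariable("exceptions");
                exceptions = new HashSet<String>(bc);
            }

            @Override
            public boolean filter(String value) {
                return !exceptions.contains(value);   // e.g. drop everything on the exception list
            }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<String> data = env.fromElements("a", "b", "c");
            DataSet<String> exceptionList = env.fromElements("b");   // ~300k entries in practice

            data.filter(new ExceptionFilter())
                .withBroadcastSet(exceptionList, "exceptions")
                .print();
        }
    }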

grouping and sort grouping with KeySelector

2014-09-22 Thread Martin Neumann
Hej, The data set I'm working on is quite large so I created a POJO class for it to make it less messy. I want to do a simple map, group reduce job. But the groups need to be sorted on a secondary key. From what I understand I would get that by doing: dataset.groupBy( key1 ).sortGroup( key2 , O
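
A sketch of the secondary-sort pattern with field expressions on a POJO (field names are placeholders, and it assumes a version with POJO field-expression keys; when grouping with a KeySelector instead, the sortGroup call needs a key of the same kind):

    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class PojoGroupSortSketch {

        // hypothetical POJO: public fields and a public no-argument constructor
        public static class Event {
            public String key1;
            public long key2;
            public String payload;
            public Event() {}
            public Event(String key1, long key2, String payload) {
                this.key1 = key1; this.key2 = key2; this.payload = payload;
            }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Event> events = env.fromElements(
                    new Event("a", 2L, "second"), new Event("a", 1L, "first"));

            // group on the primary key, sort each group on the secondary key
            events.groupBy("key1")
                  .sortGroup("key2", Order.ASCENDING)
                  .first(1)
                  .print();
        }
    }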

Re: load avro example

2014-09-22 Thread Martin Neumann
nk-addons/flink-avro/src/test/java/org/apache/flink/api/avro/testjar/AvroExternalJarProgram.java#L225 > > Greetings, > Stephan > > > On Mon, Sep 22, 2014 at 2:16 PM, Martin Neumann > wrote: > > > Hej, > > > > I'm looking for some example code on how to lo

load avro example

2014-09-22 Thread Martin Neumann
Hej, I'm looking for some example code on how to load an Avro-formatted data set. I have the Avro schema (it's horribly twisted and nested on several levels) and I want to load that into Java classes to make it easier to process. I'm using Flink 0.7 latest snapshot. thanks for the help cheers Ma
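
A minimal sketch of reading Avro data via the flink-avro addon; MyRecord stands in for the class generated from the schema (e.g. by the avro-maven-plugin), and the AvroInputFormat package shown here is an assumption about the 0.7-era addon layout, so check it against the version in use:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.AvroInputFormat;   // from the flink-avro module; package may differ by version
    import org.apache.flink.core.fs.Path;

    public class AvroReadSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // MyRecord is the (hypothetical) specific-record class generated from the Avro schema
            DataSet<MyRecord> records = env.createInput(
                    new AvroInputFormat<MyRecord>(new Path("/path/to/data.avro"), MyRecord.class));

            records.first(5).print();
        }
    }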

Re: [VOTE] Style of Flink squirrel logo for website and public accounts

2014-09-18 Thread Martin Neumann
+OUTLINED Colored if that one would be less pink... On Thu, Sep 18, 2014 at 1:52 PM, Till Rohrmann wrote: > +1 for COLORED > > On Thu, Sep 18, 2014 at 1:49 PM, Aljoscha Krettek > wrote: > > > +COLORED (But then the website has to be rather white and simple. :D) > > > > On Thu, Sep 18, 2014 at

Re: switching to flink 0.7

2014-08-27 Thread Martin Neumann
.apache.flink.api.java.functions". > > > > This is the corresponding pull request: > > https://github.com/apache/incubator-flink/pull/85 and I think the change > > was already introduced in 0.6 :) > > > > Regards, > > V. > > > > > > > > On

switching to flink 0.7

2014-08-27 Thread Martin Neumann
Hej, I'm trying to switch from flink 0.6 to flink 0.7 and I'm running into some trouble. The functions such as FlatMap seem to have moved and I can't find the correct place. How do I have to change the following line to get it to work (this worked in 0.6)? *import org.apache.flink.api.java.functions.F
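
The import line is cut off above, but the gist of the restructuring referenced in the reply is that the plain function interfaces moved out of org.apache.flink.api.java.functions; a sketch against the restructured packages, with the exact package per version being something to double-check:

    import org.apache.flink.api.common.functions.FlatMapFunction;   // moved here in the function-hierarchy restructuring
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class FlatMapSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<String> lines = env.fromElements("a b", "c");

            lines.flatMap(new FlatMapFunction<String, String>() {
                     @Override
                     public void flatMap(String line, Collector<String> out) {
                         for (String token : line.split(" ")) {
                             out.collect(token);
                         }
                     }
                 })
                 .print();
        }
    }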

Re: Farewell party for Gyula - "Flink" style

2014-08-25 Thread Martin Neumann
In case you move to Stockholm feel free to drop by for a beer :-) On Sat, Aug 23, 2014 at 11:06 AM, Ufuk Celebi wrote: > Very nice! Thanks for sharing :-)) > > On 18 Aug 2014, at 21:34, Gyula Fóra wrote: > > > Thank you Fabian :) It looks promising here already haha. > > > > But no worries I w

how to split data-sets efficiently?

2014-07-27 Thread Martin Neumann
Hej, I have a dataset of StringIDs and I want to map them to Longs by using a hash function. I will use the LongIDs in a series of iterative computations and then map back to StringIDs. Currently I have a map operation that creates tuples with the string and the long. I have another mapper cle
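
A sketch of the two-mapper idea: hash each StringID to a long, keep a (longId, stringId) mapping DataSet, and join results back on the long id afterwards. The hash choice is a placeholder (String.hashCode() is only 32 bits, so a 64-bit hash is safer against collisions on large ID sets):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class IdHashingSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<String> stringIds = env.fromElements("node-a", "node-b");

            // (longId, stringId) mapping kept around for translating results back later
            DataSet<Tuple2<Long, String>> idMapping = stringIds
                    .map(new MapFunction<String, Tuple2<Long, String>>() {
                        @Override
                        public Tuple2<Long, String> map(String id) {
                            return new Tuple2<Long, String>((long) id.hashCode(), id);
                        }
                    });

            // hypothetical per-id results of the iterative computation: (longId, score)
            DataSet<Tuple2<Long, Double>> results = env.fromElements(
                    new Tuple2<Long, Double>((long) "node-a".hashCode(), 0.5));

            // translate the results back to StringIDs by joining on the long id
            results.join(idMapping).where(0).equalTo(0).print();
        }
    }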

broadcast variable in spargel

2014-07-14 Thread Martin Neumann
Hej, I'm using the latest trunk version of Flink and the new Java API. I'm writing some graph algorithms that need to know the number of nodes in the graph. To get the number of nodes I run a short count job and then get a DataSet that I need to give as input to the other calculations. It works fo