[jira] [Created] (FLINK-1180) Expose Map and Reduce methods of hadoop via flink interface

2014-10-21 Thread Mohitdeep Singh (JIRA)
Mohitdeep Singh created FLINK-1180: -- Summary: Expose Map and Reduce methods of hadoop via flink interface Key: FLINK-1180 URL: https://issues.apache.org/jira/browse/FLINK-1180 Project: Flink…

[RESULT] [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Robert Metzger
Thanks everybody for voting on this release. Let's hope we don't find a bug at the last minute ... ;) Binding votes: Fabian Hueske, Kostas Tzoumas, Henry Saputra, Stephan Ewen, Robert Metzger, Ufuk Celebi. That makes +6 binding votes for the release. I'll post the vote in the incubator once the message app…

Re: how load/group with large csv files

2014-10-21 Thread Gyula Fóra
Motivated both by Martin and our recent use-case, I updated the groupBys and aggregations for the Streaming API to also work on arrays by default. I think it would probably make sense to do something simi…

Re: [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Ufuk Celebi
+1 Tested on OS X. On 21 Oct 2014, at 19:33, Robert Metzger wrote: > +1 > > I tested the candidate on our cluster again. > > -- Robert (from my mobile) > >> On 21.10.2014, at 18:05, Stephan Ewen wrote: >> >> verified source and bin NOTICE and LICENSE >> ran examples on cluster >> ran examp…

Re: [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Robert Metzger
+1 I tested the candidate on our cluster again. -- Robert (from my mobile) > On 21.10.2014, at 18:05, Stephan Ewen wrote: > > verified source and bin NOTICE and LICENSE > ran examples on cluster > ran examples in local mode > > +1 > > On Tue, Oct 21, 2014 at 5:31 PM, Henry Saputra > wrote:…

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
There was not enough time to clean it up and gold-plate it. He got semi-horrible Java code now with some explanation of how it would look in Scala. My colleague was asking for a quick (and dirty) job, so taking more time on it would have defeated the purpose of the whole thing a bit. In any case, thanks…

Re: [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Stephan Ewen
Verified source and bin NOTICE and LICENSE. Ran examples on cluster. Ran examples in local mode. +1 On Tue, Oct 21, 2014 at 5:31 PM, Henry Saputra wrote: > Signature files look good > Checksum files look good > NOTICE and LICENSE files look good > No 3rd party executables in source artifact > Sour…

Re: [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Henry Saputra
Signature files look good. Checksum files look good. NOTICE and LICENSE files look good. No 3rd-party executables in source artifact. Source compiles and tests pass. Ran a simple example in standalone mode. +1 - Henry On Sat, Oct 18, 2014 at 1:38 PM, Robert Metzger wrote: > Please vote on releasing the f…

Re: how load/group with large csv files

2014-10-21 Thread Stephan Ewen
Hej, Do you want to use Scala? You can use simple case classes there and use fields directly as keys; it will look very elegant... If you want to stick with Java, you can actually use POJOs (Robert just corrected me: expression keys should be available there). Can you define a class public class…
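A minimal sketch of the Java POJO route Stephan suggests. The class and field names are made up for illustration; Flink's POJO rules require a public no-argument constructor and public fields (or getters/setters), after which the job could group with an expression key such as `groupBy("key15")`:

```java
// Hypothetical POJO for the 54-column CSV; field names are illustrative.
public class CsvRecord {
    public String key15;    // column 15, the grouping/sorting field
    public String[] others; // the remaining raw columns, kept as-is

    public CsvRecord() {}   // Flink POJO types need a no-arg constructor

    public CsvRecord(String key15, String[] others) {
        this.key15 = key15;
        this.others = others;
    }
}
```

A mapper would fill such a record from each split line; the grouping key then travels with the data by name rather than by tuple position.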

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
Nope, but I can't filter out the useless data, since the program I'm comparing to does not either. The point is to prove to one of my colleagues that Flink > Spark. The Spark program runs out of memory and crashes when just doing a simple group and counting the number of items. This is also one of t…

Re: how load/group with large csv files

2014-10-21 Thread Stephan Ewen
The POJO support should allow you to have a custom type with that many fields and then point to the relevant sorting fields. Unfortunately, the POJO expression keys are not available in group sorting as of today. The next version will solve it more elegantly... On Tue, Oct 21, 2014 at 3:07 PM, Aljos…

Re: how load/group with large csv files

2014-10-21 Thread Aljoscha Krettek
By the way, do you actually need all those 54 columns in your job? On Tue, Oct 21, 2014 at 3:02 PM, Martin Neumann wrote: > I will go with that workaround, however I would have preferred if I could > have done that directly with the API instead of doing Map/Reduce like > Key/Value tuples again :-…

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
I will go with that workaround, although I would have preferred to do that directly with the API instead of doing Map/Reduce-like Key/Value tuples again :-) By the way, is there a simple function to count the number of items in a reduce group? It feels stupid to write a GroupReduce th…
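For the counting question: in this era of the API a small GroupReduce that just tallies the group is the usual answer. The core logic such a reducer would run per group is trivial; sketched here in plain Java over an in-memory list of keys (class and method names are illustrative, not Flink API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupCount {
    // The tallying a count-per-group GroupReduce would do: for each
    // group of records sharing a key, emit (key, number of records).
    public static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum); // increment, starting at 1
        }
        return counts;
    }
}
```

In the actual job, the grouping has already partitioned the data, so the reducer only needs to iterate its group once and emit a single count.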

Re: how load/group with large csv files

2014-10-21 Thread Robert Metzger
Yes, for sorted groups you need to use POJOs or Tuples. I think you have to split the input lines manually, with a mapper. How about using a TupleN<...> with only the fields you need (returned by the mapper)? If you need all fields, you could also use a Tuple2 where the first position is the sort…

Re: [VOTE] Release Apache Flink 0.7.0 (incubating) (RC2)

2014-10-21 Thread Kostas Tzoumas
+1 On Mon, Oct 20, 2014 at 1:27 PM, Fabian Hueske wrote: > +1 > > - Checked all signatures and hashes > - Built from source archive (mvn clean install) > - Ran all examples with provided data locally on previous build > > 2014-10-18 22:38 GMT+02:00 Robert Metzger : > > > Please vote on releasi…

Re: how load/group with large csv files

2014-10-21 Thread Gyula Fora
I am not sure how you should go about that; let's wait for some feedback from the others. Until then you can always map the array to (array, keyfield) and use groupBy(1). > On 21 Oct 2014, at 14:17, Martin Neumann wrote: > > Hej, > > Unfortunately .sort() cannot take a key extractor, would…

Re: how load/group with large csv files

2014-10-21 Thread Martin Neumann
Hej, Unfortunately .sort() cannot take a key extractor; would I have to do the sort myself then? Cheers, Martin On Tue, Oct 21, 2014 at 2:08 PM, Gyula Fora wrote: > Hey, > > Using arrays is probably a convenient way to do so. > > I think the way you described the groupBy only works for tuples n…

Re: how load/group with large csv files

2014-10-21 Thread Gyula Fora
Hey, Using arrays is probably a convenient way to do so. I think the groupBy the way you described it only works for tuples for now. To do the grouping on the array field, you would need to create a key extractor for this and pass that to groupBy. Actually, we have some use-cases like this for streami…
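The key-extractor idea Gyula describes is just a function from the record to its grouping key, the same shape as a Flink KeySelector handed to groupBy. A plain-Java sketch, with the record modeled as a String[] of CSV columns (class name and the column index are illustrative):

```java
import java.util.function.Function;

public class ArrayKey {
    // Build an extractor that keys a String[] record on one column,
    // analogous to a KeySelector<String[], String> for groupBy(...).
    public static Function<String[], String> column(int index) {
        return cols -> cols[index];
    }
}
```

For the thread's use-case the call would be something like `ArrayKey.column(15)`, so the grouping key never has to be copied out of the array ahead of time.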

how load/group with large csv files

2014-10-21 Thread Martin Neumann
Hej, I have a CSV file with 54 columns, each of them a string (for now). I need to group and sort them on field 15. What's the best way to load the data into Flink? There is no Tuple54 (and the <> would look awful anyway with 54 times String in it). My current idea is to write a mapper and split t…
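The "mapper that splits the line" plan later settled on in the thread amounts to pairing the full column array with the key column, as in the (array, keyfield) suggestion. A plain-Java sketch of that per-line work (class and method names are illustrative, not Flink API; column 15 is the thread's grouping field):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class CsvKeyedLine {
    // Split one raw line into its columns; limit -1 keeps trailing
    // empty fields so the column count stays stable at 54.
    public static String[] splitLine(String line) {
        return line.split(",", -1);
    }

    // Pair the full column array with the grouping column (index 15),
    // mirroring a Tuple2<key, columns> a Flink mapper would emit,
    // after which groupBy on the key position does the rest.
    public static Map.Entry<String, String[]> toKeyed(String line) {
        String[] cols = splitLine(line);
        return new SimpleEntry<>(cols[15], cols);
    }
}
```

This keeps all 54 columns intact while exposing a single, cheaply comparable field for grouping and sorting.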

[jira] [Created] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager

2014-10-21 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1179: - Summary: Add button to JobManager web interface to request stack trace of a TaskManager Key: FLINK-1179 URL: https://issues.apache.org/jira/browse/FLINK-1179 Projec…

Re: Make Hadoop 2 the default profile

2014-10-21 Thread Till Rohrmann
+1 for the change. On Mon, Oct 20, 2014 at 2:42 PM, Robert Metzger wrote: > +1 Very good idea. > > On Mon, Oct 20, 2014 at 2:37 PM, Márton Balassi > wrote: > >> +1 on the Budapest side. We're using Flink co-located with a HDFS2 cluster. >> >> On Mon, Oct 20, 2014 at 2:15 PM, Stephan Ewen wrote:…