Re: Scala API rewrite almost complete

2014-09-28 Thread Stephan Ewen
I think Aljoscha's suspicion is correct. Eclipse also lets you reference test code from main code, so it does not seem to separate the two. An extra scala-tests project is a good idea. A workaround: call maven install in the shell and close the Scala projects in Eclipse. The flink-tests project will the

Re: Scala API rewrite almost complete

2014-09-28 Thread Aljoscha Krettek
Hi Robert, this might be a problem with Eclipse not having a strict separation between compiling the src/main and src/test code. The code that generates the TypeInformation is a macro. Macros are only usable if the code that uses them is compiled in a separate compilation step from the compilation

Re: Scala API rewrite almost complete

2014-09-28 Thread Henry Saputra
Hi Robert, I didn't have any problems with IntelliJ IDEA last week; I will try it again with the latest master. Users can deactivate unwanted Maven profiles via IDEA's Maven project options (I don't remember the exact name of that feature). On Sunday, September 28, 2014, Robert Metzger wrote: > I've worked on a quit

Re: Scala API rewrite almost complete

2014-09-28 Thread Robert Metzger
I've worked on a quite outdated version of Flink for a while now and rebased my code to the latest master on Friday. Back at home, I wanted to continue my work and found that it is very difficult to properly set up the latest Eclipse for Flink. What I've done so far: - Downloaded Eclipse Luna SR1

Re: Scala API rewrite almost complete

2014-09-14 Thread Márton Balassi
Answer posted to "Example packages naming convention" thread as the issue diverged from this topic. On Sun, Sep 14, 2014 at 11:14 AM, Kostas Tzoumas wrote: > Good catch, I suggest to use examples > > On Sat, Sep 13, 2014 at 3:27 PM, Márton Balassi > wrote: > > > Pull request issued. One minor n

Re: Scala API rewrite almost complete

2014-09-14 Thread Kostas Tzoumas
Good catch, I suggest to use examples On Sat, Sep 13, 2014 at 3:27 PM, Márton Balassi wrote: > Pull request issued. One minor naming concern: > > As of today the scala examples are located at > the org.apache.flink.examples.scala package, while the java ones in > the org.apache.flink.example.jav

Re: Scala API rewrite almost complete

2014-09-13 Thread Márton Balassi
Pull request issued. One minor naming concern: As of today the Scala examples are located in the org.apache.flink.examples.scala package, while the Java ones are in org.apache.flink.example.java. I suggest using only one convention for this: either example or examples. Cheers, Marton On Fri, Sep

Re: Scala API rewrite almost complete

2014-09-12 Thread Márton Balassi
Sorry for being a bit silent after already bidding on LR. The pull request is coming soon. On Fri, Sep 12, 2014 at 6:25 PM, Stephan Ewen wrote: > I suppose that having the option between simple return type, and a > collector is the easiest to understand. > On 12.09.2014 16:50, "Aljoscha

Re: Scala API rewrite almost complete

2014-09-12 Thread Stephan Ewen
I suppose that having the option between simple return type, and a collector is the easiest to understand. On 12.09.2014 16:50, "Aljoscha Krettek" wrote: > So, should I change join and coGroup to have a simple return value, no > Option or Collection? Also what's happening with the relational >

Re: Scala API rewrite almost complete

2014-09-12 Thread Aljoscha Krettek
So, should I change join and coGroup to have a simple return value, no Option or Collection? Also what's happening with the relational examples and the LinearRegression examples? I'd like to make a pull request before this weekend. I also added a test that checks whether the Scala API has the same

Re: Scala API rewrite almost complete

2014-09-12 Thread Aljoscha Krettek
Yes, there is already a Collector version, you can do: left.join(right).where("foo").equalTo("bar") { (left, right, out: Collector[Page]) => if (...) out.collect(...) } I wasn't sure on what our Function2 variant should be. That's why I asked. There are some cases where you want to have the

Re: Scala API rewrite almost complete

2014-09-12 Thread Stephan Ewen
It seems weird to me that normal joins need to go through Option. The Option variant is there to allow filters in the join function. Wouldn't a Collector variant allow you to do the same, and be a Function3? I know that Option reads more functionally... On 12.09.2014 14:24, "Aljoscha Kr

Re: Scala API rewrite almost complete

2014-09-12 Thread Aljoscha Krettek
As already mentioned, this is not possible because of type erasure. We can only have one join variant that takes a Function2. On Fri, Sep 12, 2014 at 12:34 PM, Stephan Ewen wrote: > It would be nice to have a join variant that directly returns the value > rather than an option. Why not have both
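The erasure point can be illustrated with a minimal sketch in plain Scala. Everything here is an assumption for illustration: `JoinedPairs` is a hypothetical stand-in and not Flink's join operator, and the Collector is simplified to a plain `O => Unit` callback. Two `apply` overloads that both take a Function2 would erase to the same JVM signature, whereas a three-argument function has a different erased type and can coexist as an overload:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a join result; NOT Flink's real class.
final class JoinedPairs[L, R](pairs: Seq[(L, R)]) {

  // Function2 variant: exactly one result per joined pair.
  def apply[O](fun: (L, R) => O): Seq[O] =
    pairs.map { case (l, r) => fun(l, r) }

  // A second Function2 overload such as
  //   def apply[O](fun: (L, R) => Option[O]): Seq[O]
  // is rejected by the compiler: after erasure both are apply(Function2).

  // Function3 variant: the third argument plays the role of a collector,
  // so the function may emit zero, one, or many results per pair.
  def apply[O](fun: (L, R, O => Unit) => Unit): Seq[O] = {
    val out = ArrayBuffer.empty[O]
    pairs.foreach { case (l, r) => fun(l, r, (o: O) => { out += o; () }) }
    out.toList
  }
}
```

With `val joined = new JoinedPairs(Seq((1, "a"), (2, "b")))`, a call like `joined.apply[String]((l: Int, r: String) => r * l)` picks the Function2 variant, while `joined.apply[String]((l: Int, r: String, out: String => Unit) => if (l > 1) out(r))` picks the collector-style variant and can filter.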

Re: Scala API rewrite almost complete

2014-09-12 Thread Stephan Ewen
It would be nice to have a join variant that directly returns the value rather than an option. Why not have both (they are wrapped as flatJoins anyway below, right?) On Fri, Sep 12, 2014 at 11:50 AM, Fabian Hueske wrote: > Sweet! I'm lovin' this :-) > > 2014-09-12 11:46 GMT+02:00 Aljoscha Krett

Re: Scala API rewrite almost complete

2014-09-12 Thread Fabian Hueske
Sweet! I'm lovin' this :-) 2014-09-12 11:46 GMT+02:00 Aljoscha Krettek : > Also, you can use CaseClasses directly as the type for CSV input. So > instead of reading it as tuples and then having a mapper that maps to > your case classes you can use: > > env.readCsv[Edge](...) > > On Fri, Sep 12, 2

Re: Scala API rewrite almost complete

2014-09-12 Thread Aljoscha Krettek
Also, you can use CaseClasses directly as the type for CSV input. So instead of reading it as tuples and then having a mapper that maps to your case classes you can use: env.readCsv[Edge](...) On Fri, Sep 12, 2014 at 11:43 AM, Aljoscha Krettek wrote: > I added support for specifying keys by name

Re: Scala API rewrite almost complete

2014-09-12 Thread Aljoscha Krettek
I added support for specifying keys by name for CaseClasses. Check out the PageRank and TriangleEnumeration examples to see it in action. @Kostas: I think you could use them for the TPC-H examples. On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek wrote: > Yes, that would allow list comprehensio

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
Yes, that would allow list comprehensions. It would be possible to have the Collection signature for join (and coGroup), i.e.: apply[R]((T, O) => TraversableOnce[R]): DataSet[R] (T and O are the left and right input types, R is the result type) Then you can return collections and still return an opti
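A rough sketch of what a collection-returning join function could look like, in plain Scala rather than against the real API. `flatJoin` and the sample data are hypothetical, and `Iterable` is used in place of `TraversableOnce` only so the sketch compiles on recent Scala versions. It assumes an Option result can be passed through the standard library's implicit Option-to-Iterable conversion:

```scala
// Hypothetical helper, not Flink's API: applies a collection-returning
// function to each joined pair and flattens the results.
object FlatJoinSketch {
  def flatJoin[L, R, O](pairs: Seq[(L, R)])(fun: (L, R) => Iterable[O]): Seq[O] =
    pairs.flatMap { case (l, r) => fun(l, r) }

  // Made-up sample data standing in for already-joined pairs.
  val pairs: Seq[(Int, String)] = Seq((1, "a"), (2, "b"), (3, "c"))

  // The function returns a collection built by a for comprehension.
  val repeated: Seq[String] =
    flatJoin[Int, String, String](pairs) { (l, r) => for (i <- 1 to l) yield r + i }

  // The function returns an Option, which acts as a filter; the Option is
  // converted to an Iterable by the standard implicit conversion.
  val filtered: Seq[String] =
    flatJoin[Int, String, String](pairs) { (l, r) => if (l % 2 == 1) Some(r) else None }
}
```

The explicit type arguments keep type inference out of the picture; a real API would presumably let the compiler infer them.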

Re: Scala API rewrite almost complete

2014-09-11 Thread Fabian Hueske
Hmmm, tricky question... How about the Option for Join as this is a tuple-wise operation and the Collection for Cogroup which is group-wise? Could we in that case use list comprehensions in Cogroup functions? Or is that too much mixing? 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek : > I didn't lo

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
I didn't look at the example either. Adding collections is easy, it's just that we can either have Collections or the Option, not both. For the coding style I followed this: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide, which itself is based on this: http://docs.scala

Re: Scala API rewrite almost complete

2014-09-11 Thread Fabian Hueske
I haven't looked at the LineRank example in detail, but if you think that it adds something new to the examples collection, we can certainly port it also to Java. I think the Option and Collector return types are sufficient right now but if Collections are easy to add, go for it. ;-) Great that th

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
What about the LineRank example? We had that in Scala but never had a Java Example. On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek wrote: > Yes, I like that. For the ITCases I always just copied the Java ITCase. > > The only examples that are missing now are LinearRegression and the > relation

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
Yes, I like that. For the ITCases I always just copied the Java ITCase. The only examples that are missing now are LinearRegression and the relational stuff. On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske wrote: > I just removed the old CountEdgeDegrees example. > That was a preprocessing step f

Re: Scala API rewrite almost complete

2014-09-11 Thread Kostas Tzoumas
I will port PiEstimation now that generateSequence is in, as well as TPC-H Q3. Kostas On Thu, Sep 11, 2014 at 5:40 PM, Aljoscha Krettek wrote: > I added the PageRank example, thanks again fabian. :D > > Regarding the other stuff: > - There is a comment in DataSet.scala about including > org.apa

Re: Scala API rewrite almost complete

2014-09-11 Thread Fabian Hueske
I just removed the old CountEdgeDegrees example. That was a preprocessing step for the TriangleEnumeration, and is now part of the new TriangleEnumerationOpt example. So I guess, we don't need to port that one. As I said before, I'd prefer to keep Java and Scala examples in sync. Cheers, Fabian 2

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
I added the PageRank example, thanks again Fabian. :D Regarding the other stuff: - There is a comment in DataSet.scala about including org.apache.flink.api.scala._ because of the TypeInformation. - I added generateSequence to ExecutionEnvironment. - It is possible to use Scala Primitives in Arr

Re: Scala API rewrite almost complete

2014-09-11 Thread Stephan Ewen
+1 for removing RelationalQuery. On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek wrote: > By the way, what was called BatchGradientDescent in the Scala examples > should be replaced by a port of the LinearRegression Example from > Java. I had them as two separate examples earlier. > > What about

Re: Scala API rewrite almost complete

2014-09-11 Thread Fabian Hueske
+1 for removing RelationalQuery IMO, the Scala examples should mirror the Java examples. So, we should rather port Java examples to Scala instead of updating existing Scala examples. I am also done with the PageRank implementation. Final tests are currently running and I'll open a PR soon. I foun

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
By the way, what was called BatchGradientDescent in the Scala examples should be replaced by a port of the LinearRegression Example from Java. I had them as two separate examples earlier. What about RelationalQuery and TPC-H-Q3? Any thoughts about removing RelationalQuery? On Thu, Sep 11, 2014 at

Re: Scala API rewrite almost complete

2014-09-11 Thread Aljoscha Krettek
I added the Triangle Enumeration Examples, thanks Fabian. So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt These are the examples people called dibs on: - PageRank (Fabian) - BatchGradientDescent (Márton) - Comp

Re: Scala API rewrite almost complete

2014-09-10 Thread Aljoscha Krettek
Thanks, I added it. I'll keep a running list of ported/unported examples in my mails. I'll rename the java example package to examples once the Scala API merge is done. I think the termination criterion is fine as it is. Just because Scala enables functional programming doesn't mean it's always th

Re: Scala API rewrite almost complete

2014-09-09 Thread Kostas Tzoumas
I'll take TransitiveClosure and PiEstimation (was not on your list). If nobody volunteers for the relational stuff I can take those as well. How about removing the "RelationalQuery" from both Scala and Java? It seems to be a proper subset of TPC-H Q3. Does it add some teaching value on top of TPC

Re: Scala API rewrite almost complete

2014-09-09 Thread Aljoscha Krettek
Thanks, I added it, along with an ITCase. So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis These are the examples people called dibs on: - TriangleEnumeration and PageRank (Fabian) - BatchGradientDescent (Márton) - ComputeEdgeDegrees (Hermann) Those are unclaimed (

Re: Scala API rewrite almost complete

2014-09-09 Thread Kostas Tzoumas
WebLog here: https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala Do you need any more done? On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek wrote: > I added the ConnectedComponents Example from Vasia. > > Keep 'em coming, people. :D > > On Mon, Sep 8, 2014 at 6:07 PM,

Re: Scala API rewrite almost complete

2014-09-09 Thread Aljoscha Krettek
I added the ConnectedComponents Example from Vasia. Keep 'em coming, people. :D On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske wrote: > Alright, will do. > Thanks! > > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek : > >> Ok people, executive decision. :D >> >> Please look at KMeansData.java and KMe

Re: Scala API rewrite almost complete

2014-09-08 Thread Fabian Hueske
Alright, will do. Thanks! 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek : > Ok people, executive decision. :D > > Please look at KMeansData.java and KMeans.scala. I'm storing the data > in multi-dimensional object arrays and then converting it to the > required Java or Scala objects. > > Also, I ch

Re: Scala API rewrite almost complete

2014-09-08 Thread Aljoscha Krettek
Ok people, executive decision. :D Please look at KMeansData.java and KMeans.scala. I'm storing the data in multi-dimensional object arrays and then converting it to the required Java or Scala objects. Also, I changed isEqualTo to equalTo to make it consistent with the Java API. Regarding Join (a

Re: Scala API rewrite almost complete

2014-09-08 Thread Fabian Hueske
Aside from the DataSet issue, I also found an inconsistency with the Java API. In Java join is done as: ds1.join(ds2).where(...).equalTo(...) whereas in the current Scala API this is: ds1.join(ds2).where(...).isEqualTo(...) isEqualTo() should be renamed to equalTo(), IMO. Also, join (+cross and coGrou

Re: Scala API rewrite almost complete

2014-09-08 Thread Stephan Ewen
Instead of Strings, Object[][] would work as well. That is a generic representation of a Tuple. Alternatively, they could be stored as Java or Scala Tuples, with a generic utility method to convert between the two. On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske wrote: > Yeah, I ran into the sam
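The Object[][] idea might look roughly like this on the Scala side. This is only a sketch: `EdgeDataSketch`, `toEdgeTuples`, and the edge values are made up for illustration and are not taken from the actual example data classes. The shared data is kept as generic rows (the Scala view of a Java `Object[][]`) and converted to typed tuples where needed:

```scala
object EdgeDataSketch {
  // Invented sample data, shaped like a Java Object[][]:
  // each row is a generic representation of a tuple.
  val Edges: Array[Array[Any]] = Array(
    Array[Any](1L, 2L),
    Array[Any](2L, 3L),
    Array[Any](3L, 1L)
  )

  // Convert generic rows to typed Scala tuples; a malformed row fails loudly
  // instead of being silently dropped.
  def toEdgeTuples(rows: Array[Array[Any]]): Seq[(Long, Long)] =
    rows.toList.map {
      case Array(src: Long, dst: Long) => (src, dst)
      case other =>
        throw new IllegalArgumentException(
          s"Unexpected row: ${other.mkString("[", ", ", "]")}")
    }
}
```

Calling `EdgeDataSketch.toEdgeTuples(EdgeDataSketch.Edges)` yields a `Seq[(Long, Long)]` that could then be handed to whatever creates the data set; a Java counterpart would do the same conversion into Java tuples from the same array.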

Re: Scala API rewrite almost complete

2014-09-08 Thread Fabian Hueske
Yeah, I ran into the same problem... +1 for using Strings and parsing them, but using the CSVFormat won't work because this is based on a FileInputFormat. So we would need to parse the Strings manually... 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek : > Hi, > on second thought. Maybe we should j

Re: Scala API rewrite almost complete

2014-09-08 Thread Márton Balassi
+1: If we opted for that we could easily use the same input for streaming as well - we've been facing the same issue recently. On Mon, Sep 8, 2014 at 10:35 AM, Aljoscha Krettek wrote: > Hi, > on second thought. Maybe we should just change all the example input > data to strings and use CSV input

Re: Scala API rewrite almost complete

2014-09-08 Thread Aljoscha Krettek
Hi, on second thought, maybe we should just change all the example input data to strings and use CSV input formats in all the examples. What do you think? Cheers, Aljoscha On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek wrote: > Hi, > yes it's unfortunate that the data types are incompatible. I

Re: Scala API rewrite almost complete

2014-09-07 Thread Aljoscha Krettek
Hi, yes it's unfortunate that the data types are incompatible. I'm afraid you have to do what you proposed: move the data to a static field and convert it in the getDefaultEdgeDataSet() method in Scala. It's not nice, but copying would duplicate the data and make it easier for it to go out of sync

Re: Scala API rewrite almost complete

2014-09-07 Thread Vasiliki Kalavri
Hey, I have ported the Connected Components example, but I am not sure how to reuse the example input data from java-examples. In the ConnectedComponentsData class, the vertices and edges data are produced by the methods getDefaultVertexDataSet() and getDefaultEdgeDataSet(), which take an org.apac

Re: Scala API rewrite almost complete

2014-09-05 Thread Aljoscha Krettek
Alright, I updated my repo: https://github.com/aljoscha/incubator-flink/commits/scala-rework This now has a working WordCount example. It's pretty much a copy of the Java example with some fixups for the syntax and lambda functions. You'll also notice that I added the java-examples as a dependency

Re: Scala API rewrite almost complete

2014-09-05 Thread Hermann Gábor
+1 ComputeEdgeDegrees for me! On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi wrote: > +1 > > BatchGradientDescent for me :) > > > On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas > wrote: > > > +1 > > > > I go for WebLogAnalysis. > > > > My experience with Scala consists of going through a tu

Re: Scala API rewrite almost complete

2014-09-05 Thread Márton Balassi
+1 BatchGradientDescent for me :) On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas wrote: > +1 > > I go for WebLogAnalysis. > > My experience with Scala consists of going through a tutorial so this will > be a good stress test both for me and the new API :-) > > > On Thu, Sep 4, 2014 at 9:09 PM

Re: Scala API rewrite almost complete

2014-09-05 Thread Kostas Tzoumas
+1 I go for WebLogAnalysis. My experience with Scala consists of going through a tutorial so this will be a good stress test both for me and the new API :-) On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri wrote: > +1 for having other people implement the examples! > Connected Components and

Re: Scala API rewrite almost complete

2014-09-04 Thread Vasiliki Kalavri
+1 for having other people implement the examples! Connected Components and Kmeans for me :) -V. On 4 September 2014 21:03, Fabian Hueske wrote: > I go for TriangleEnumeration and PageRank. > > Let's also do the examples similar to the Java examples: > - running out-of-the-box without paramete

Re: Scala API rewrite almost complete

2014-09-04 Thread Fabian Hueske
I go for TriangleEnumeration and PageRank. Let's also do the examples similar to the Java examples: - running out-of-the-box without parameters - parameters for external data - follow a similar code structure 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek : > Will do, then people can reserve thei

Re: Scala API rewrite almost complete

2014-09-04 Thread Aljoscha Krettek
Will do, then people can reserve their favourite examples here. On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske wrote: > Hi, > > I think having examples implemented by different people proved to be > valuable in the past. > I'd help with two or three examples. > > It might be helpful if you'd port

Re: Scala API rewrite almost complete

2014-09-04 Thread Fabian Hueske
Hi, I think having examples implemented by different people proved to be valuable in the past. I'd help with two or three examples. It might be helpful if you'd port a simple first one such as WordCount. Fabian 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek : > Hi, > I have a working rewrite of

Scala API rewrite almost complete

2014-09-04 Thread Aljoscha Krettek
Hi, I have a working rewrite of the Scala API here: https://github.com/aljoscha/incubator-flink/commits/scala-rework I'm hoping that I'll only have to write the tests and port the examples. Do you think it makes sense to let other people port the examples, so that someone else uses it and maybe no