local web-client error

2015-07-12 Thread Michele Bertoni
I think there is a problem with the web client. Quite often I can use it for a single run and then it crashes, especially if I click back after viewing the graph. On the second run I get a class-not-found exception in the terminal; I have to stop and restart it, and then it works again. Michele

bigpetstore flink : parallelizing collections

2015-07-12 Thread jay vyas
Hi Flink. I'm happy to announce that I've done a small bit of initial hacking on bigpetstore-flink, in order to represent in Flink what we do in Spark. TL;DR: the main question is at the bottom! Currently, I want to generate transactions for a list of customers. The generation of transactions is a

Re: bigpetstore flink : parallelizing collections

2015-07-12 Thread Stephan Ewen
Hi Jay! You can use the "fromCollection()" or "fromElements()" method to create a DataSet or DataStream from a Java/Scala collection. That moves the data into the cluster and allows you to run parallel transformations on the elements. Make sure you set the parallelism of the operation that you wa

Re: bigpetstore flink : parallelizing collections

2015-07-12 Thread jay vyas
Awesome, thanks! I'll try it out. This is part of a wave of JIRAs for Bigtop Flink integration. If your distro/packaging folks collaborate with us, it will save you time in the long run, because you can piggyback on the Bigtop infra for rpm/deb packaging, smoke testing, and HDFS interop testing

Re: DelimitedInputFormat reads entire buffer when splitLength is 0

2015-07-12 Thread Stephan Ewen
Hi Robert! I did some debugging and added some tests. Turns out, this is actually expected behavior. It has to do with the splitting of the records. Because creating the splits happens without knowing the contents, the split can be either in the middle of a record, or (by chance) exactly at the b

Re: TeraSort on Flink and Spark

2015-07-12 Thread Hawin Jiang
Hi Kim and Stephan, Kim's report shows Flink 0.9.0 sorting 3360 GB in 1427 seconds, where 3360 = 80 * 42 (80 GB per node and 42 nodes). Based on Kim's report, the throughput is 2.35 GB/sec for Flink 0.9.0. Kim was using 42 nodes for testing purposes. I found that the best Spark performance result was using
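
The throughput figure quoted in this message can be checked with a short calculation. This is a minimal sketch that uses only the numbers given in the message (42 nodes, 80 GB per node, 1427 seconds); the function and variable names are illustrative and do not come from the thread.

```python
def sort_throughput_gb_per_sec(nodes: int, gb_per_node: float, seconds: float) -> float:
    """Aggregate sort throughput in GB/s for a cluster-wide sort."""
    total_gb = nodes * gb_per_node  # total data volume sorted across the cluster
    return total_gb / seconds


# Figures taken from the message: 42 nodes, 80 GB per node, 1427 seconds.
total_gb = 42 * 80
throughput = sort_throughput_gb_per_sec(42, 80, 1427)
print(f"{total_gb} GB in 1427 s = {throughput:.2f} GB/s")
```

Dividing the result by the node count gives a per-node rate, which is one way to compare clusters of different sizes, although it still ignores differences in hardware per node.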

Re: TeraSort on Flink and Spark

2015-07-12 Thread Dongwon Kim
Hi Jiang, Please refer to http://sortbenchmark.org/. When you take a look at the specification of each node the Spark team uses, you can easily see that the # of nodes is not the only thing to take into consideration. You are missing important things to consider for a fair comparison. (1) # of disks in each

Sort Benchmark infrastructure

2015-07-12 Thread Hawin Jiang
Hi Michael and George, First of all, congratulations: you guys have won the sort game again. We are from the Flink community. I am not sure whether it is possible to use your test environment to test Flink for free. We saw that Apache Spark did a good job as well. We want to challenge yo