Spark Bug: Counting twice with different results

2015-05-22 Thread Niklas Wilcke
Hi, I have noticed some strange behavior of Spark core in combination with MLlib. Running my pipeline results in an RDD. Calling count() on this RDD returns 160055. Calling count() again directly afterwards returns 160044, and so on. The RDD seems to be unstable. How can that be? Do you maybe have
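The message above is truncated and does not state a cause. One common explanation for this symptom, offered here only as an assumption, is that an uncached RDD is recomputed on every action, so any non-deterministic step in the pipeline (for example, unseeded random sampling) can yield a different element count each time. The sketch below imitates that effect in plain Python without Spark; `LazyPipeline`, `unseeded_sample`, and `seeded_sample` are hypothetical names for illustration, not Spark APIs.

```python
import random


class LazyPipeline:
    """Toy stand-in for a lazy RDD (hypothetical, not a Spark class).

    Every action re-runs the transform unless the result was cached.
    """

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform
        self._cached = None

    def cache(self):
        # Materialize the elements once so later actions reuse them.
        self._cached = list(self.transform(self.source))
        return self

    def count(self):
        if self._cached is not None:
            return len(self._cached)
        # No cache: the whole pipeline is evaluated again.
        return sum(1 for _ in self.transform(self.source))


def unseeded_sample(records, fraction=0.5):
    # Non-deterministic step: a fresh random generator on each evaluation.
    rng = random.Random()
    return (r for r in records if rng.random() < fraction)


def seeded_sample(records, fraction=0.5, seed=42):
    # Deterministic step: the same seed gives the same draws every time.
    rng = random.Random(seed)
    return (r for r in records if rng.random() < fraction)


if __name__ == "__main__":
    data = range(100_000)

    unstable = LazyPipeline(data, unseeded_sample)
    print("uncached, unseeded:", unstable.count(), unstable.count())

    stable = LazyPipeline(data, unseeded_sample).cache()
    print("cached, unseeded:  ", stable.count(), stable.count())

    seeded = LazyPipeline(data, seeded_sample)
    print("uncached, seeded:  ", seeded.count(), seeded.count())
```

In this toy model the first pair of counts will usually differ, while the cached and the seeded pipelines report the same count twice. Whether this matches the pipeline described in the thread cannot be determined from the snippet.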

Re: How to run tests properly?

2014-10-30 Thread Niklas Wilcke
. Thanks, Niklas On 29.10.2014 19:01, Patrick Wendell wrote: One thing is you need to do a maven package before you run tests. The local-cluster tests depend on Spark already being packaged. - Patrick On Wed, Oct 29, 2014 at 10:02 AM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote

Re: How to run tests properly?

2014-10-29 Thread Niklas Wilcke
... but couldn't get the tests to run without a failure. Could this be a configuration issue? On 28.10.2014 19:03, Sean Owen wrote: On Tue, Oct 28, 2014 at 6:18 PM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: 1. via dev/run-tests script This script executes all tests and takes several hours

How to run tests properly?

2014-10-28 Thread Niklas Wilcke
Hi, I want to contribute to the MLlib library but I can't get the tests working. I've found three ways of running the tests on the command line. I just want to execute the MLlib tests. 1. via dev/run-tests script This script executes all tests and takes several hours to finish. Some tests

MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Spark developers, I am trying to implement a framework with Spark and MLlib to do duplicate detection. I'm not familiar with Spark and Scala, so please be patient with me. In order to enrich the LabeledPoint class with some information, I tried to extend it and added some properties. But the ML

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
](label: T, features: Vector) In my opinion making LabeledPoint abstract is necessary and introducing a generic label would be nice to have. Just to clarify my priorities. Kind Regards, Niklas Wilcke On 25.09.2014 16:02, Yu Ishikawa wrote: Hi Niklas Wilcke, As you said, it is difficult

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Egor Pahomov, thanks for your suggestions. I think I will do the dirty workaround because I don't want to maintain my own version of Spark for now. Maybe I will do so later when I feel ready to contribute to the project. Kind Regards, Niklas Wilcke On 25.09.2014 16:27, Egor Pahomov wrote: I