Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-13 Thread Reynold Xin
I'm going to -1 this myself: https://issues.apache.org/jira/browse/SPARK-18856 On Thu, Dec 8, 2016 at 12:39 AM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.0. The vote is open until Sun,

Re: Document Similarity -Spark Mllib

2016-12-13 Thread Liang-Chi Hsieh
Hi Satyajit, Have you tried setting a higher threshold for columnSimilarities to lower the computation cost? BTW, can you also comment out most of the other code and just run columnSimilarities, then do a simple computation such as counting the entries of the returned CoordinateMatrix? So we can make
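
The diagnostic suggested above (run only columnSimilarities, then count the entries of the returned CoordinateMatrix) might look roughly like the sketch below. The object name, the helper countSimilarityEntries, the tiny input matrix, and the threshold values 0.5 and 0.8 are all illustrative assumptions, not taken from the thread. It uses the spark.mllib RowMatrix API and a local master purely for demonstration.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession

// Hypothetical helper object for isolating the cost of columnSimilarities.
object ColumnSimilarityCheck {

  // Runs only columnSimilarities and a cheap action (count) on its result,
  // so no other pipeline stage contributes to the measured cost.
  def countSimilarityEntries(mat: RowMatrix, threshold: Double): Long =
    mat.columnSimilarities(threshold).entries.count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ColumnSimilarityCheck")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // Toy data (assumption): 3 rows, 3 columns. Real input would be the
    // document-term matrix from the original job.
    val rows = spark.sparkContext.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 3.0, 1.0),
      Vectors.dense(4.0, 1.0, 0.0)
    ))
    val mat = new RowMatrix(rows)

    // Compare a lower and a higher threshold; values are illustrative.
    Seq(0.5, 0.8).foreach { t =>
      val n = countSimilarityEntries(mat, t)
      println(s"threshold=$t entries=$n")
    }

    spark.stop()
  }
}
```

With a positive threshold the method samples rather than computing every pair exactly, so the entry count on real data can vary between runs. The point of the sketch is only to time this one step in isolation.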

Belief propagation algorithm is open sourced

2016-12-13 Thread Ulanov, Alexander
Dear Spark developers and users, HPE has open sourced an implementation for Apache Spark of the belief propagation (BP) algorithm, a popular message passing algorithm for performing inference in probabilistic graphical models. It provides exact inference for graphical models without loops. Wh

Re: Document Similarity -Spark Mllib

2016-12-13 Thread satyajit vegesna
Hi Liang, The problem is that when I take a huge data set, I get a matrix of size 1616160 * 1616160. PFB code, val exact = mat.columnSimilarities(0.5) val exactEntries = exact.entries.map { case MatrixEntry(i, j, u) => ((i, j), u) } case class output(label1:Long,label2:Long,score:Double) val fin
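
To put the 1616160 * 1616160 figure in perspective, here is a back-of-the-envelope sketch in plain Scala. It assumes the result stores at most one entry per unordered column pair (the upper triangle, which is how columnSimilarities is documented to return its result) and that each MatrixEntry carries two Long indices and one Double, i.e. 24 bytes of raw payload. The object and function names are hypothetical, and the byte figure is a worst-case upper bound that ignores JVM and serialization overhead.

```scala
// Hypothetical helper for sizing an all-pairs column-similarity result.
object SimilaritySizeEstimate {

  // Number of unordered column pairs (i < j) among n columns: n * (n - 1) / 2.
  def upperTrianglePairs(n: Long): Long = n * (n - 1) / 2

  // Raw payload per entry (assumption): two Long indices plus one Double.
  val bytesPerEntry: Long = 8L + 8L + 8L

  def main(args: Array[String]): Unit = {
    val n = 1616160L // column count quoted in the message above
    val pairs = upperTrianglePairs(n)
    val worstCaseBytes = BigInt(pairs) * bytesPerEntry
    println(s"column pairs:     $pairs")
    println(s"worst-case bytes: $worstCaseBytes")
  }
}
```

Under those assumptions the worst case is about 1.3 trillion candidate pairs. That scale is why thresholding and checking how many entries are actually returned matter here.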

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-13 Thread Adam Roberts
I've never seen the ReplSuite test OoMing with IBM's latest SDK for Java but have always noticed this particular test failing with the following instead: java.lang.AssertionError: assertion failed: deviation too large: 0.8506807397223823, first size: 180392, second size: 333848 This particular

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-13 Thread akchin
Hello, I am seeing this error as well except during "define case class and create Dataset together with paste mode *** FAILED ***" Starts throwing OOM and GC errors after running for several minutes. - Alan Chin IBM Spark Technology Center -- View this message in context: http://apa

Re: SPARK-17455 Isotonic Regression fix languishing

2016-12-13 Thread Sean Owen
One of many things that gets lost in the shuffle -- it looks pretty straightforward so I will review today. On Tue, Dec 13, 2016 at 4:32 PM nseggert wrote: > I have a PR that has been sitting untouched for months. Could someone please > take a look at it? > > https://github.com/apache/spark/pull/1

SPARK-17455 Isotonic Regression fix languishing

2016-12-13 Thread nseggert
I have a PR that has been sitting untouched for months. Could someone please take a look at it? https://github.com/apache/spark/pull/15018 (SPARK-17455) in JIRA. This PR fixes problems with the way isotonic regression was implemented that caused it to take exponential time for some inputs. This pro

Output Side Effects for different chain of operations

2016-12-13 Thread Chawla,Sumit
Hi All, I have a workflow with different steps in my program. Let's say these are steps A, B, C, D. Step B produces some temp files on each executor node. How can I add another step E which consumes these files? I understand the easiest choice is to copy all these temp files to any shared locatio