Synonym handling replacement issue with UDF in Apache Spark

2017-04-27 Thread Nishanth
ster("replaceWithTerm", replaceWithTerm, DataTypes.StringType);     Dataset joined = sentenceDataFrame.join(sentenceDataFrame2, callUDF("contains", sentenceDataFrame.col("sentence"), sentenceDataFrame2.col("label2")))                            .withColumn("senten

Stopping criteria for gradient descent

2015-09-16 Thread Nishanth P S
…t set? Code for the convergence criteria: https://github.com/apache/spark/blob/c0e9ff1588b4d9313cc6ec6e00e5c7663eb67910/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L251 Thanks, Nishanth
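
The linked line is MLlib's convergence test. As of that commit, gradient descent stops early when the change in the weight vector between two iterations is small relative to the current weight norm, with the threshold set through GradientDescent's setConvergenceTol (otherwise it simply runs for numIterations). A standalone Java restatement of that criterion, with illustrative class and method names:

// Sketch of the convergence test used by MLlib's GradientDescent:
// stop when ||w_t - w_{t-1}|| < tol * max(||w_t||, 1).
public final class ConvergenceCheck {

  // Euclidean (L2) norm of a dense vector.
  private static double norm(double[] v) {
    double sum = 0.0;
    for (double x : v) sum += x * x;
    return Math.sqrt(sum);
  }

  static boolean isConverged(double[] previous, double[] current, double tol) {
    double[] diff = new double[current.length];
    for (int i = 0; i < current.length; i++) {
      diff[i] = current[i] - previous[i];
    }
    return norm(diff) < tol * Math.max(norm(current), 1.0);
  }

  public static void main(String[] args) {
    double[] prev = {1.0000, 2.0000};
    double[] curr = {1.0001, 2.0001};
    System.out.println(isConverged(prev, curr, 1e-3)); // true: the step is tiny
  }
}

In MLlib itself you would not reimplement this check; setting, e.g., algo.optimizer().setConvergenceTol(1e-4) on an SGD-based model enables the early stop.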

Re: How to use BigInteger for userId and productId in collaborative Filtering?

2015-01-14 Thread Nishanth P S
Hi Xiangrui, Thanks for the reply! I will explore the suggested solutions. -Nishanth  Hi Nishanth, Just found out where you work :) We had some discussion in https://issues.apache.org/jira/browse/SPARK-2465. Having long IDs will increase the communication cost, which may not be worth the benefit…

Re: How to use BigInteger for userId and productId in collaborative Filtering?

2015-01-14 Thread Nishanth P S
Yes, we are close to having more than 2 billion users. In that case, what is the best way to handle this? Thanks, Nishanth On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng wrote: > Do you have more than 2 billion users/products? If not, you can pair > each user/product id with an integer…
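
Pairing each external id with an integer, as suggested in the quoted reply, can be done once up front and the mapping reused when building ALS Rating objects. A hedged Java sketch (the variable names and sample ids are illustrative); note that this only works while the distinct-id count fits in an Int, i.e. below roughly 2.1 billion, which is exactly the limit this thread is brushing against:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class IdPairingSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("IdPairingSketch").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // External ids may be arbitrarily large (e.g. BigIntegers rendered as strings).
    JavaRDD<String> userIds = sc.parallelize(Arrays.asList(
        "93849284748391234", "18446744073709551617", "93849284748391234"));

    // zipWithIndex assigns consecutive longs 0..n-1 (it runs one extra job
    // to count partition sizes); the cast to int is safe only below 2^31.
    JavaPairRDD<String, Integer> userIndex = userIds
        .distinct()
        .zipWithIndex()
        .mapToPair(t -> new Tuple2<>(t._1(), (int) (long) t._2()));

    userIndex.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));
    sc.stop();
  }
}

Past that limit the mapping no longer fits in an Int, which is why the linked SPARK-2465 discussion weighs long-based IDs against their extra communication cost.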