ster("replaceWithTerm", replaceWithTerm, DataTypes.StringType);
Dataset joined = sentenceDataFrame.join(sentenceDataFrame2,
callUDF("contains", sentenceDataFrame.col("sentence"),
sentenceDataFrame2.col("label2")))
.withColumn("senten
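For the snippet above to compile, a "contains" UDF must already be registered and a replaceWithTerm function object must exist. Neither is shown in the thread, so here is a minimal hypothetical sketch (assuming a Spark 2.x SparkSession named spark):

import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;

// Hypothetical UDF bodies; the originals are not shown in the thread.
UDF2<String, String, Boolean> containsFn =
        (sentence, label) -> sentence.contains(label);
UDF2<String, String, String> replaceWithTerm =
        (sentence, label) -> sentence.replace(label, label.toUpperCase());

spark.udf().register("contains", containsFn, DataTypes.BooleanType);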
ster("replaceWithTerm", replaceWithTerm, DataTypes.StringType);
Dataset joined = sentenceDataFrame.join(sentenceDataFrame2,
callUDF("contains", sentenceDataFrame.col("sentence"),
sentenceDataFrame2.col("label2")))
.withColumn("senten
Code for convergence criteria:
https://github.com/apache/spark/blob/c0e9ff1588b4d9313cc6ec6e00e5c7663eb67910/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L251
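At the linked line, convergence is decided by comparing the norm of the change in the weight vector between iterations against convergenceTol scaled by the current weight norm. A rough Java transcription of that check (the original is Scala):

// Sketch of the convergence test in GradientDescent.scala: stop when the
// change in the solution vector is small relative to its current norm.
static boolean isConverged(double[] previousWeights,
                           double[] currentWeights,
                           double convergenceTol) {
    double diffSq = 0.0;
    double currSq = 0.0;
    for (int i = 0; i < currentWeights.length; i++) {
        double d = previousWeights[i] - currentWeights[i];
        diffSq += d * d;
        currSq += currentWeights[i] * currentWeights[i];
    }
    return Math.sqrt(diffSq) < convergenceTol * Math.max(Math.sqrt(currSq), 1.0);
}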
Thanks,
Nishanth
Hi Xiangrui,
Thanks for the reply! I will explore the suggested solutions.
-Nishanth
Hi Nishanth,
Just found out where you work :) We had some discussion in
https://issues.apache.org/jira/browse/SPARK-2465 . Having long IDs
will increase the communication cost, which may not be worth the benefit.
Yes, we are close to having more than 2 billion users. In this case, what is
the best way to handle this?
Thanks,
Nishanth
On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng wrote:
> Do you have more than 2 billion users/products? If not, you can pair
> each user/product id with an integer
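For context, one way to do that pairing (a hypothetical helper, not from the thread) is to index the distinct raw ids with zipWithIndex; the resulting indices fit in an int as long as the distinct count stays below Integer.MAX_VALUE:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// Hypothetical helper: map arbitrary String ids to dense int ids for ALS.
static JavaPairRDD<String, Integer> indexIds(JavaRDD<String> rawIds) {
    return rawIds.distinct()
            .zipWithIndex()              // (id, 0L .. count-1)
            .mapValues(Long::intValue);  // safe while count < 2^31
}

The mapping can then be joined back onto the ratings so ALS sees int user and product ids, and inverted after training to recover the original ids.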