> […] for comparing rankings. There are
> ranking metrics like mean average precision that would be appropriate
> instead.
>
> On Wed, Sep 14, 2016 at 9:11 PM, Pasquinell Urbani <
> pasquinell.urb...@exalitica.com> wrote:
>
>> It was a typo; both are RMSE.
>
> If they're on the scale of 1-5, that's extremely poor.
>
> What's RMS vs RMSE?
>
> On Wed, Sep 14, 2016 at 8:33 PM, Pasquinell Urbani
> wrote:
> > Hi Community
> >
> > I'm performing an ALS for retail product recommendation. Right now I'm
> > reaching rms_test = 2.3 and rmse_test = 32.5. […]
Hi Community,
I'm performing ALS for retail product recommendation. Right now I'm
reaching rms_test = 2.3 and rmse_test = 32.5. Is this too high in your
experience? Is the transformation of the rating values important for
getting good errors?
Thank you all.
Pasquinell Urbani
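For context, RMSE is easy to sanity-check outside Spark. A minimal plain-Python sketch (the ratings below are invented): on a 1-5 scale, an RMSE around 2.3 means predictions miss by roughly half the rating range on average.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    assert len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical ratings on a 1-5 scale: predictions that miss each true
# rating by 2.0-2.5 stars yield an RMSE in the same range.
actual = [5.0, 1.0, 3.0, 4.0]
predicted = [2.5, 3.5, 5.0, 2.0]
print(round(rmse(predicted, actual), 3))  # → 2.264
```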
Hi there,
I am building a product recommendation system for retail. I have been
able to compute the TF-IDF of a user-item data frame in Spark 2.0.
Now I need to transform the TF-IDF output into a data frame with columns
(user_id, item_id, TF_IDF_ratings) in order to run ALS, but I have
no clue how to do it. […]
Is there another way?
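The (user_id, item_id, TF_IDF_ratings) triples described here are just term frequency times inverse document frequency per (user, item) pair. A plain-Python sketch of that computation, with invented baskets and the standard tf·idf formula assumed to be what is wanted:

```python
import math
from collections import Counter

# Hypothetical purchase histories: user -> list of items bought.
baskets = {
    "u1": ["milk", "bread", "milk"],
    "u2": ["bread", "beer"],
    "u3": ["milk", "beer", "beer"],
}

def tfidf_triples(baskets):
    """Return (user, item, tf_idf) triples usable as implicit ALS ratings."""
    n_users = len(baskets)
    # Document frequency: in how many users' baskets each item appears.
    df = Counter(item for items in baskets.values() for item in set(items))
    triples = []
    for user, items in baskets.items():
        tf = Counter(items)
        total = len(items)
        for item, count in tf.items():
            idf = math.log(n_users / df[item])
            triples.append((user, item, (count / total) * idf))
    return triples

for t in sorted(tfidf_triples(baskets)):
    print(t)
```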
2016-07-11 18:28 GMT-04:00 Pasquinell Urbani <
pasquinell.urb...@exalitica.com>:
> Hi all,
>
> We have a dataframe with 2.5 million records and 13 features. We want
> to perform a logistic regression with this data, but first we need to divide
> each column into discrete values using QuantileDiscretizer. […]
Hi all,
We have a dataframe with 2.5 million records and 13 features. We want
to perform a logistic regression with this data, but first we need to divide
each column into discrete values using QuantileDiscretizer. This should
improve the model's performance by limiting the influence of outliers.
For small d[…]
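The bucketing itself is easy to illustrate outside Spark. A plain-Python sketch of evenly spaced quantile edges (the numbers are invented, and Spark's QuantileDiscretizer estimates the edges approximately rather than by exact sorting); note how the outlier 100 simply lands in the top bucket instead of stretching the scale:

```python
def quantile_edges(values, num_buckets):
    """Bucket edges at evenly spaced quantiles of the sorted data."""
    s = sorted(values)
    return [s[int(len(s) * k / num_buckets)] for k in range(1, num_buckets)]

def bucketize(value, edges):
    """Index of the first edge strictly greater than value."""
    for i, e in enumerate(edges):
        if value < e:
            return i
    return len(edges)

values = [1, 2, 2, 3, 5, 8, 13, 21, 100]   # 100 is an outlier
edges = quantile_edges(values, 3)
print(edges)                                # → [3, 13]
print([bucketize(v, edges) for v in values])  # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```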
Hi all,
I need to apply QuantileDiscretizer() over 16 columns of a sql.DataFrame.
What is the most efficient way to apply a function to each column? Do I
need to iterate over the columns? What is the best way to do this?
Thank you all.
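One common answer is to build one transformation per column in a loop. A plain-Python sketch of the per-column pattern (the table, column names, and the min_max_scale stand-in are all invented; in Spark the analogous approach is to build one stage per column and collect the stages into a single Pipeline, rather than transforming the DataFrame 16 separate times):

```python
# A tiny column-oriented table: column name -> list of values.
table = {
    "a": [1.0, 5.0, 9.0],
    "b": [10.0, 20.0, 30.0],
}

def min_max_scale(col):
    """Stand-in per-column transformation (placeholder for a discretizer)."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

# One comprehension applies the same function to every column.
scaled = {name: min_max_scale(col) for name, col in table.items()}
print(scaled)  # → {'a': [0.0, 0.5, 1.0], 'b': [0.0, 0.5, 1.0]}
```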
Hello all,
I have to build an item-based recommendation system. First I obtained the
similarity matrix with Twitter's DIMSUM all-pairs cosine-similarity solution (
https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum). The
similarity matrix is in the following format:
org.apache.spark.rdd.RDD[org.ap[…]
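The entries DIMSUM emits are cosine similarities of column pairs of the user-item matrix. A plain-Python sketch with an invented item-vector map, producing the same sparse upper-triangular (i, j, similarity) shape that columnSimilarities() yields:

```python
import math

# Hypothetical item -> rating-vector map (columns of the user-item matrix).
items = {
    "apples":  [1.0, 0.0, 2.0],
    "oranges": [2.0, 0.0, 4.0],
    "soap":    [0.0, 3.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Upper-triangular (item_i, item_j, similarity) entries.
names = sorted(items)
entries = [(a, b, cosine(items[a], items[b]))
           for i, a in enumerate(names) for b in names[i + 1:]]
for e in entries:
    print(e)
```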
Hi all,
I'm following a TF-IDF example but I'm having some issues that I'm not
sure how to fix.
The input is the following:
val test = sc.textFile("s3n://.../test_tfidf_products.txt")
test.collect.mkString("\n")
which prints
test: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[370] at tex[…]
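Since the snippet cuts off before the interesting part, here is a plain-Python sketch of the hashing-TF step such TF-IDF examples typically apply after splitting each input line into tokens (the lines and the tiny 16-bucket hash space are invented; Spark's default feature space is far larger):

```python
# Invented file contents: one "document" of product tokens per line.
lines = ["soap soap bread", "bread milk"]

NUM_FEATURES = 16  # small hash space for illustration only

def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Term frequencies folded into a fixed-size vector by hashing."""
    vec = [0] * num_features
    for tok in tokens:
        vec[hash(tok) % num_features] += 1
    return vec

tfs = [hashing_tf(line.split()) for line in lines]
print([sum(v) for v in tfs])  # token counts survive hashing: [3, 2]
```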