Hello. I am writing a collaborative filtering program with Spark, but I am having trouble with prediction speed.
I want to implement a recommendation program using ALS (MLlib), where the model is built in a separate process from the one that serves predictions. Reading the MatrixFactorizationModel from HDFS is slow, so I want to cache it, but I can't.

There are two processes:

Process A:
1. Create a MatrixFactorizationModel with ALS.
2. Save the following objects to HDFS:
   - MatrixFactorizationModel (as an RDD)
   - MatrixFactorizationModel#userFeatures (RDD)
   - MatrixFactorizationModel#productFeatures (RDD)

Process B:
1. Load the model information saved by process A.
   # In process B, the master of the SparkContext is set to "local".

==========
// Read the model
JavaRDD<MatrixFactorizationModel> modelRDD = sparkContext.objectFile("<HDFS path>");
MatrixFactorizationModel preModel = modelRDD.first();

// Read the model's feature RDDs
JavaRDD<Tuple2<Object, double[]>> productJavaRDD = sparkContext.objectFile("<HDFS path>");
JavaRDD<Tuple2<Object, double[]>> userJavaRDD = sparkContext.objectFile("<HDFS path>");

// Rebuild the model
MatrixFactorizationModel model = new MatrixFactorizationModel(
    preModel.rank(),
    JavaRDD.toRDD(userJavaRDD),
    JavaRDD.toRDD(productJavaRDD));
==========

2. Call the "predict" method of the MatrixFactorizationModel object above.

Step 2 of process B is slow, because the objects are read from HDFS on every call.
# I confirmed that the recommendation results themselves are correct.

So I tried to cache "productJavaRDD" and "userJavaRDD" as follows, but then the "predict" method never returned.
==========
// Read the model
JavaRDD<MatrixFactorizationModel> modelRDD = sparkContext.objectFile("<HDFS path>");
MatrixFactorizationModel preModel = modelRDD.first();

// Read the model's feature RDDs
JavaRDD<Tuple2<Object, double[]>> productJavaRDD = sparkContext.objectFile("<HDFS path>");
JavaRDD<Tuple2<Object, double[]>> userJavaRDD = sparkContext.objectFile("<HDFS path>");

// Cache the feature RDDs
productJavaRDD.cache();
userJavaRDD.cache();

// Rebuild the model
MatrixFactorizationModel model = new MatrixFactorizationModel(
    preModel.rank(),
    JavaRDD.toRDD(userJavaRDD),
    JavaRDD.toRDD(productJavaRDD));
==========

I could not understand why the "predict" method froze. Could you please help me cache these objects correctly? Thank you.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-cache-RDD-of-collaborative-filtering-on-MLlib-tp21962.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
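For reference, here is a minimal, self-contained sketch of what I expected caching to achieve, with a tiny in-process dummy dataset instead of the real HDFS files (the model parameters, master "local[2]", and class name CacheSketch are made up for illustration). Note that cache() is lazy, so the sketch also calls count() on the cached feature RDDs to force them into memory once before predict() is used:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

public class CacheSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("CacheSketch")
                .setMaster("local[2]"); // local master, as in process B
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Dummy ratings in place of the real training data (assumption:
        // a tiny dataset is enough to exercise cache + predict)
        JavaRDD<Rating> ratings = sc.parallelize(Arrays.asList(
                new Rating(1, 1, 5.0), new Rating(1, 2, 1.0),
                new Rating(2, 1, 4.0), new Rating(2, 3, 3.0)));

        // rank=2, iterations=5, lambda=0.01 are arbitrary example values
        MatrixFactorizationModel model = ALS.train(ratings.rdd(), 2, 5, 0.01);

        // cache() only marks the RDDs; count() is an action that actually
        // materializes them, so later predict() calls reuse the cached copies
        model.userFeatures().toJavaRDD().cache().count();
        model.productFeatures().toJavaRDD().cache().count();

        double p = model.predict(1, 1);
        System.out.println(Double.isNaN(p) ? "bad prediction" : "prediction computed");

        sc.stop();
    }
}
```

The same cache-then-count pattern should apply to userJavaRDD and productJavaRDD loaded via objectFile: forcing an action on the cached RDDs before constructing the MatrixFactorizationModel makes the HDFS read happen exactly once.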