Re: Performance Issue using item-based approach!

Najum Ali Thu, 17 Apr 2014 03:19:34 -0700

Ok, here you go:

I have created a simple class with a main method (no server or anything else):

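import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class RecommenderTest {

    public static void main(String[] args) throws IOException, TasteException {
        DataModel dataModel = new FileDataModel(new File("/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));

        // Recommender that computes similarities on the fly.
        ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);

        // Precompute the similarities to a file, then load them back into memory.
        String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());
        Collection<GenericItemSimilarity.ItemItemSimilarity> correlations;
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(pathToPreComputedFile)))) {
            correlations = bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
        }

        // Recommender backed by the precomputed similarities.
        ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
        ItemBasedRecommender recommenderWithPrecomputation = new GenericItemBasedRecommender(dataModel, precomputedSimilarity);

        recommend(recommender);
        recommend(recommenderWithPrecomputation);
    }

    private static String preComputeSimilarities(ItemBasedRecommender recommender, int simItemsPerItem) throws TasteException {
        String pathToAbsolutePath = "";
        try {
            File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
            if (resultFile.exists()) {
                resultFile.delete();
            }
            BatchItemSimilarities batchJob = new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
            // One worker per available core, maximum duration of 1 hour.
            int numSimilarities = batchJob.computeItemSimilarities(Runtime.getRuntime().availableProcessors(), 1,
                    new FileSimilarItemsWriter(resultFile));
            pathToAbsolutePath = resultFile.getAbsolutePath();
            System.out.println("Computed " + numSimilarities + " similarities and saved them to " + pathToAbsolutePath);
        } catch (IOException e) {
            System.out.println("Error while writing pre computed similarities to file");
        }
        return pathToAbsolutePath;
    }

    // Times a single recommend() call for user 1, asking for 10 items.
    private static void recommend(ItemBasedRecommender recommender) throws TasteException {
        long start = System.nanoTime();
        List<RecommendedItem> recommendations = recommender.recommend(1, 10);
        long end = System.nanoTime();
        System.out.println("Created recommendations in " + getCalculationTimeInMilliseconds(start, end) + " ms. Recommendations:" + recommendations);
    }

    private static double getCalculationTimeInMilliseconds(long start, long end) {
        double calculationTime = (end - start);
        return (calculationTime / 1_000_000);
    }

    // Maps one "itemA,itemB,similarity" line of the precomputed file to an ItemItemSimilarity.
    private static Function<String, GenericItemSimilarity.ItemItemSimilarity> mapToItemItemSimilarity = (line) -> {
        String[] row = line.split(",");
        return new GenericItemSimilarity.ItemItemSimilarity(
                Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
    };

}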
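The line-mapping function in the class above assumes every line of the precomputed file has exactly three comma-separated columns. As a rough sketch only, a more defensive parser could look like the following. The class SimilarityLineSketch and its Row type are hypothetical stand-ins so the example runs without Mahout, and the three-column layout is inferred from the posted code rather than from Mahout's documentation.

```java
import java.util.Optional;

// Sketch only: SimilarityLineSketch and Row are hypothetical stand-ins, not Mahout classes.
final class SimilarityLineSketch {

    /** Plain holder for one parsed "itemA,itemB,similarity" line. */
    static final class Row {
        final long itemA;
        final long itemB;
        final double similarity;

        Row(long itemA, long itemB, double similarity) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.similarity = similarity;
        }
    }

    /** Returns the parsed row, or an empty Optional if the line is blank or malformed. */
    static Optional<Row> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] cols = line.trim().split(",");
        if (cols.length != 3) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Row(
                    Long.parseLong(cols[0].trim()),
                    Long.parseLong(cols[1].trim()),
                    Double.parseDouble(cols[2].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1,2,0.5").isPresent());   // well-formed line
        System.out.println(parse("not a row").isPresent()); // malformed line
    }
}
```

In the real program, each parsed row would be turned into a GenericItemSimilarity.ItemItemSimilarity, and skipped lines could be counted to spot a corrupt file.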

And that's the output log:

3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv

63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...

1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines

1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209

1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users

1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches

10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches

10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.

10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.

11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches

11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.

11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches

11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.

11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches

11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.

11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.

11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches

11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.

11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches

11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.

Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv

Created recommendations in 1683.613 ms. Recommendations:[RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]

Created recommendations in 985.679 ms. Recommendations:[RecommendedItem[item:3530, value:5.0], RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]

Again, almost the same results. What I also don't understand is why I am getting different recommended items.
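To put the two timings in perspective, here is a small arithmetic sketch. The class and method names are made up for illustration; the numbers are the ones reported in this thread (the standalone log above and the web-service measurement from the original message).

```java
// Sketch only: SpeedupSketch and its method names are illustrative, not part of Mahout.
final class SpeedupSketch {

    /** Fraction of time saved: 1 - (after / before). */
    static double timeSaved(double beforeMs, double afterMs) {
        return 1.0 - afterMs / beforeMs;
    }

    /** Speedup factor: before / after. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    static void report(String label, double beforeMs, double afterMs) {
        System.out.printf("%s: %.1f%% less time, %.2fx speedup%n",
                label, 100.0 * timeSaved(beforeMs, afterMs), speedup(beforeMs, afterMs));
    }

    public static void main(String[] args) {
        // Standalone run: on-the-fly similarity vs. precomputed similarity.
        report("standalone", 1683.613, 985.679);
        // Web-service run from the original message.
        report("web service", 1222.9, 589.5);
    }
}
```

The standalone run saves roughly 41% of the time and the web-service run roughly 52%, so both measurements show a speedup of about a factor of two or less.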

That really frustrates me…

You can find the Java file in the attachment.

RecommenderTest.java

Greetings from Germany,

Najum

On 17.04.2014 at 11:44, Sebastian Schelter <s...@apache.org> wrote:

Yes, just to make sure the problem is in the Mahout code and not in the surrounding environment.

On 04/17/2014 11:43 AM, Najum Ali wrote:
@Sebastian
What do you mean by a standalone recommender? A simple offline Java main program?

On 17.04.2014 at 11:41, Sebastian Schelter <s...@apache.org> wrote:

Could you take the output of the precomputation, feed it into a standalone recommender and test it there?

On 04/17/2014 11:37 AM, Najum Ali wrote:
@sebastian

Are you sure that the precomputation is done only once and not in every request?
Yes, a @Bean-annotated object in Spring is a singleton by default.
I also just verified it using a System.out.println().
Here is my log:

System.out.println("----> precomputation done!") is called before returning the
GenericItemSimilarity.

The first two recommendations are item-based, using Pearson similarity.
The third and fourth are also item-based, using the precomputed similarity.
The last one is the user-based recommender using Pearson similarity.

Look at the huge time difference!

On 17.04.2014 at 11:23, Sebastian Schelter <s...@apache.org> wrote:

Najum,

this is really strange, feeding an ItemBased Recommender with precomputed
similarities should give you superfast recommendations.

Are you sure that the precomputation is done only once and not in every request?

--sebastian
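The "only once" question can also be checked outside of Spring by wrapping the expensive step in a memoizing supplier and counting how often it actually runs. The following is a minimal sketch under that assumption. OncePerStartupSketch and memoize are hypothetical names, and the string "similarities" merely stands in for the real precomputed ItemSimilarity.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch only: OncePerStartupSketch and memoize are hypothetical names.
final class OncePerStartupSketch {

    /** Wraps a supplier so the delegate runs at most once; later calls return the cached value. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        // Stand-in for the expensive similarity precomputation.
        Supplier<String> precomputed = memoize(() -> {
            runs.incrementAndGet();
            return "similarities";
        });
        // Simulate three incoming requests that all need the precomputed data.
        for (int request = 0; request < 3; request++) {
            precomputed.get();
        }
        System.out.println("precomputation ran " + runs.get() + " time(s)");
    }
}
```

If the counter goes above one per server start in the real application, the precomputation is being repeated per request.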

On 04/17/2014 11:17 AM, Najum Ali wrote:
Hi guys,

I have created a precomputed item-item similarity collection for a
GenericItemBasedRecommender.
Using the 1M MovieLens data, my item-based recommender is only 40-50% faster
than without precomputation (about 589.5 ms instead of 1222.9 ms).
The user-based recommender, however, is really fast: about 24.2 ms. How can
this happen?

Here are more details on my implementation:

CSV file: 1M preferences, 6040 users, 3706 items

I'm using screenshots for my implementation because of the better syntax
highlighting.
My recommender runs inside a web server (Jetty) using Spring 4 and Java 8. I
receive recommendations from a web service (JSON).

For the DataModel, I'm using FileDataModel.

The code below creates a precomputed ItemSimilarity when I start the
web server and the property isItemPreComputationEnabled is set to true:

For time measurement I'm using AOP. I measure the whole time from entering my
controller to sending the response, based on the difference between two
System.nanoTime() calls. The user-based recommender is measured the same way.
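A single System.nanoTime() difference around one call also includes JIT warm-up and any lazy initialization that happens on first use, so it can overstate the steady-state cost. As a rough sketch (not the AOP setup described above), a harness that discards a few warm-up runs and averages the rest could look like this. TimingSketch, averageMillis and the dummy workload are illustrative names only.

```java
import java.util.function.Supplier;

// Sketch only: TimingSketch, averageMillis and the dummy workload are illustrative names.
final class TimingSketch {

    /**
     * Runs the task warmupRuns times untimed, then measuredRuns times timed,
     * and returns the mean duration of the timed runs in milliseconds.
     */
    static <T> double averageMillis(Supplier<T> task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.get();
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the real test this would call recommender.recommend(userId, howMany).
        Supplier<Long> dummy = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        };
        System.out.println("mean ms per call: " + averageMillis(dummy, 5, 20));
    }
}
```

Because Mahout's recommend() throws the checked TasteException, the real call would need to be wrapped in a lambda that catches it and rethrows an unchecked exception.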

I have tried caching the recommender and the similarity, with no big
difference. I also tried using CandidateItemsStrategy and
MostSimilarItemsCandidateItemsStrategy, but got no performance boost either.

public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity) throws TasteException {
    final int numberOfUsers = dataModel.getNumUsers();
    final int numberOfItems = dataModel.getNumItems();
    CandidateItemsStrategy candidateItemsStrategy =
            new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
    MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
            new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
    return model -> new GenericItemBasedRecommender(model, similarity, candidateItemsStrategy, mostSimilarStrategy);
}

I don't know why item-based takes so much longer than user-based. User-based
is fast as hell. I even tried datasets with 100k and 10 million preferences
(MovieLens). Every time, user-based is much faster, for any similarity.

I hope someone can help me understand this. Maybe I'm doing something wrong.

Thanks!! :))
